James Gosling leaves Oracle

April 10, 2010 by karaznie · Leave a Comment 

This just hit me this morning: James Gosling has left Oracle. Although not an unexpected move, frankly, it’s quite significant. With all the uncertainty about Oracle’s commitment to Java (especially openness, agility and simply a good, balanced attitude), this might stir Java’s troubled waters even more. I love Gosling’s attitude and his sense of simplicity. Anyway, I hope Gosling continues his excellent work, and I’m looking forward to some great new bits from the father of Java! I have bookmarked and subscribed to James’s new blog feed… Thanks for all the fish, James :) . And… what’s your opinion?


Why Asynchronous Servlets matter, part I

January 25, 2010 by karaznie · 1 Comment 

Most of us in the Java world are well aware of the new spec bundle released last December under the JEE 6 moniker. Along with well-recognized and much-discussed features like JPA 2.0, EJB 3.1 or CDI, the JCP sneaked in a really innovative bit: JSR 315, aka the Java Servlet 3.0 Specification. To be honest, historically the Servlet API wasn’t the most exciting thing on the planet. Sure, it’s important, it’s a solid pillar of JEE, but let’s face it: most of us probably can’t spot any difference between subsequent Servlet specs, and probably haven’t even scratched the surface of the Servlet API directly for years, being used to the convenient abstractions introduced by the tons of frameworks we rely on nowadays. The Servlet API happened to be a kind of Stable Layer of JEE.

This time things seem to be different, though. The real innovation sits at the very core of JSR 315: the notion of asynchronous servlets. Unfortunately, as a few times before, the JCP has put great marketing effort into making this piece as esoteric as it could. At least from my perspective. Documentation is sparse at best, examples are hard to understand, and we, the community, are not doing much better in this matter. Admittedly, there are pretty good examples and introductions around the web, but none of them draws the big picture.

So this is the first instalment of a trilogy. I want to cast some light on this topic and put all the pieces of the puzzle together, so that the whole asynchronous servlet concept is easy to grasp: what it is, what it is not, what’s good, bad and ugly. In the second part I’ll show some real-life examples of asynchronous servlets (and clients) and introduce higher-level frameworks and techniques: cometd and Bayeux. In the last part I will show what I’m particularly interested in: how this whole idea fits into the web services world. But…

First things first

Before we grasp the idea of asynchronous servlets, we have to understand the basics: how HTTP servers actually work. I won’t dive into the nuts and bolts here. I’ll briefly describe some nuances of HTTP protocol handling and how modern HTTP servers work, just enough to understand what’s so innovative about asynchronous HTTP processing. If you are interested in a more in-depth analysis, I highly recommend the great article by Xinyu Liu.

Thread-per-connection

Nowadays most HTTP engines use some variation of the so-called thread-per-connection pattern. Basically, when an HTTP connection arrives, the HTTP server picks a single thread out of a pool to handle it until the response is committed (meaning all response data is written and the response output stream is closed); then the thread is released back to the pool. This approach scales pretty well, but it has some drawbacks. One significant aspect of the HTTP 1.1 protocol is the keep-alive option. Client and server may (and usually do) negotiate to keep the connection open so that it spans multiple HTTP requests and responses. In the traditional approach this means that a thread can potentially be allocated for a significant amount of time, even if it’s just idle. Fortunately, this problem can be mitigated, as the next section shows.
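Before moving on, the thread-pinning effect can be made concrete with a tiny model. This is only a sketch, not how any real container is implemented: the class and method names are made up, there is no networking, and a "connection" is just a pool task that waits until it is told to close, like an idle keep-alive connection would.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class ThreadPerConnectionSketch {

    /** Returns how many of the open connections actually hold a worker thread. */
    static int connectionsWithThread(int poolSize, int connections) {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        CountDownLatch closeAll = new CountDownLatch(1);
        CountDownLatch poolBusy = new CountDownLatch(Math.min(poolSize, connections));
        AtomicInteger holdingThread = new AtomicInteger();
        try {
            for (int i = 0; i < connections; i++) {
                pool.submit(() -> {
                    holdingThread.incrementAndGet();
                    poolBusy.countDown();
                    closeAll.await();   // idle keep-alive connection: the thread just waits
                    return null;
                });
            }
            poolBusy.await();           // every worker is now pinned to one connection
            return holdingThread.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            closeAll.countDown();       // "close" the connections so the workers can finish
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(connectionsWithThread(2, 5) + " of 5 connections hold a thread");
    }
}
```

With a pool of two threads and five idle connections, only two connections are being served; the other three wait in the queue even though nobody is doing any real work.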

Asynchronous IO and Thread-per-request

With Web 2.0 and AJAX, web applications come with a far richer user experience and usually need hundreds of requests to render and refresh their content. This raises the performance bar significantly. Fortunately, most HTTP server vendors have undertaken a tremendous amount of research, and they reach stunning numbers of concurrent connections handled simultaneously. With the help of asynchronous IO it is no longer necessary to hold a thread until the connection is closed. As soon as request processing is finished, the processing thread can be reused, while the IO itself is handled asynchronously. This effectively means that even if client and server keep the connection alive for a significant amount of time, we don’t need any thread associated with it while it is idle. Fortunately, this is done completely transparently from the Servlet API perspective.
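To give a feeling for what asynchronous IO means here, below is a minimal sketch using plain java.nio, not the internals of any particular servlet container. A single event-loop thread keeps a number of silent connections open and still answers the one connection that actually sends something. The class name, the echo behaviour and the loopback address are assumptions made for the illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

class SelectorSketch {

    /** Single event loop: accepts connections and echoes back whatever arrives. */
    static void serve(Selector selector) {
        ByteBuffer buffer = ByteBuffer.allocate(1024);
        try {
            while (true) {
                selector.select();                      // blocks until some channel is ready
                Iterator<SelectionKey> keys = selector.selectedKeys().iterator();
                while (keys.hasNext()) {
                    SelectionKey key = keys.next();
                    keys.remove();
                    if (key.isAcceptable()) {
                        SocketChannel client = ((ServerSocketChannel) key.channel()).accept();
                        if (client != null) {
                            client.configureBlocking(false);
                            client.register(selector, SelectionKey.OP_READ);
                        }
                    } else if (key.isReadable()) {
                        SocketChannel client = (SocketChannel) key.channel();
                        buffer.clear();
                        if (client.read(buffer) < 0) {  // peer closed the connection
                            client.close();
                            continue;
                        }
                        buffer.flip();
                        while (buffer.hasRemaining()) {
                            client.write(buffer);       // good enough for tiny messages
                        }
                    }
                }
            }
        } catch (IOException | RuntimeException e) {
            // selector closed or a connection failed: the sketch simply stops serving
        }
    }

    /** Opens some silent connections, then sends one message and returns its echo. */
    static String echoWithIdleClients(int idleClients, String message) {
        List<SocketChannel> idle = new ArrayList<>();
        try (Selector selector = Selector.open();
             ServerSocketChannel server = ServerSocketChannel.open()) {
            server.bind(new InetSocketAddress("127.0.0.1", 0));  // port 0: any free port
            server.configureBlocking(false);
            server.register(selector, SelectionKey.OP_ACCEPT);
            InetSocketAddress address = (InetSocketAddress) server.getLocalAddress();

            Thread loop = new Thread(() -> serve(selector), "io-loop");
            loop.setDaemon(true);
            loop.start();

            for (int i = 0; i < idleClients; i++) {
                idle.add(SocketChannel.open(address));           // open but silent
            }
            try (SocketChannel active = SocketChannel.open(address)) {
                byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
                active.write(ByteBuffer.wrap(bytes));
                ByteBuffer reply = ByteBuffer.allocate(bytes.length);
                while (reply.hasRemaining() && active.read(reply) >= 0) {
                    // keep reading until the whole echo is back
                }
                return new String(reply.array(), 0, reply.position(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } finally {
            for (SocketChannel channel : idle) {
                try { channel.close(); } catch (IOException ignored) { }
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(echoWithIdleClients(20, "ping"));
    }
}
```

The twenty idle connections cost the server no threads at all: the only server-side thread is the event loop, and it is busy only when some channel has data ready.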

The need of asynchronous servlets

So why do we actually need asynchronous servlets? Well, Web 2.0 redefined how web applications work, but even more importantly, it redefined web architecture in terms of rich user experience, high interactivity and low-latency messaging. Asynchronous servlets address the low-latency, asynchronous messaging aspect. This aspect has quite broad use cases, from real-time messaging services (like Google Wave) to enterprise-level services such as stock trading.

Quote Service

To make this whole story more concrete, I will illustrate the above idea with a simple Stock Quote Service and Stock Quote Widget. The Stock Quote AJAX Widget will present the real-time share price provided by the Stock Quote Service. We will use the Observer, or more broadly publish/subscribe, pattern: every client subscribes to a given channel and watches stock prices. Of course, we cannot guarantee how frequently we get information from our backing stock exchange system. It’s asynchronous. Since we don’t have a real connection to a real-time stock quote service, we will mock it with a simple MockStockQuoteService:

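package org.restfusion.jsr315.stockquote;

import java.util.Observable;
import java.util.Random;

public class MockStockQuoteService extends Observable implements StockQuote, Runnable {

    private final Random random = new Random();

    // Called periodically by a scheduler: marks the quote as changed and notifies all observers.
    public void run() {
        setChanged();
        notifyObservers();
    }

    public String getSymbol() {
        return "GOOG";
    }

    // Random mock price; nextDouble() already returns a value in [0.0, 1.0).
    public Double getValue() {
        return random.nextDouble();
    }
}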

For the sake of brevity, the Observable itself exposes the stock quote data through this simple interface:

package org.restfusion.jsr315.stockquote;

interface StockQuote {
    public String getSymbol();
    public Double getValue();
}

And for testing purposes it will just return some random share price for “GOOG”. Nothing unusual so far.

Our first, naive approach would be like this: write a simple synchronous servlet and poll it every second from within our widget using some flavour of AJAX toolkit. Our servlet would observe MockStockQuoteService and return the new information if something has changed, or say that nothing has. This is actually a pretty simple and common approach. Here’s how it’s gonna work:

It’s clear that only a portion of the responses would carry interesting information, and the rest would effectively just say ‘nothing changed’. As you can see, only the second request/response pair (in red) carries any significant value. So far so good. Now let’s imagine that one day someone finds our widget useful and puts it on iGoogle. We have to be prepared to face something of Google scale, right? Let’s be modest and say 10000 clients subscribe to our service. Now we have to handle (potentially) 10000 requests per second, ergo in the worst case we have to scale our server to handle up to 10000 threads. Pretty impressive number, isn’t it? What’s worse, most of those threads will do virtually nothing except say ‘nothing changed’. This is where asynchronous servlets shine.
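The arithmetic behind this can be sketched in a few lines. The numbers are assumptions for the illustration: 10000 clients polling once per second, and a new quote every five seconds (the interval the mock service is scheduled with later in this post). The class and method names are hypothetical.

```java
class PollingLoadSketch {

    /** Requests per second generated by clients that each poll at a fixed interval. */
    static long requestsPerSecond(long clients, double pollIntervalSeconds) {
        return Math.round(clients / pollIntervalSeconds);
    }

    /** Fraction of polls that can carry a new quote when quotes change at a fixed interval. */
    static double usefulFraction(double pollIntervalSeconds, double updateIntervalSeconds) {
        return Math.min(1.0, pollIntervalSeconds / updateIntervalSeconds);
    }

    public static void main(String[] args) {
        long rps = requestsPerSecond(10_000, 1.0);
        double useful = usefulFraction(1.0, 5.0);
        System.out.printf("requests/s: %d, carrying a new quote: %.0f%%, wasted/s: %d%n",
                rps, useful * 100, Math.round(rps * (1 - useful)));
    }
}
```

Under these assumptions only one poll in five can carry a new price, so roughly 8000 of the 10000 requests per second exist just to say ‘nothing changed’.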

Long polling

The trouble with our initial approach comes from the fact that we use a synchronous approach to solve an asynchronous problem. A stock quote service is asynchronous by nature. We don’t ask the stock exchange for the stock price; usually they push that information to us. It would be nice to follow the same approach at the Servlet API level: listen for events rather than poll. So our second approach is long polling. Here is how it works:

Compare this with the previous one. Can you spot the difference? Right! Now every response carries some valuable information. Note that we establish an HTTP connection and wait for the response until some interesting information is available (in this case a new stock price). We still poll, but at a much lower rate, and each poll waits for valuable information. When the information arrives we consume it and poll once more. Potentially ad infinitum. Here is a sample Stock Quote Widget:

<html>
  <head>
    <script type="text/javascript" src="jquery-1.3.2.min.js"> </script>
    <script type="text/javascript">
      $(document).ready(longPoll);

      function longPoll() {
        $.get("/jsr315/longPollingStock", {}, function ( data, status ) {
          $("#quoteWidget").html(data);
          longPoll();
        } );
      }
    </script>
  </head>
  <body>
    <div id="quoteWidget"></div>
  </body>
</html>

jQuery crash course: the longPoll function simply sends a GET request to our servlet and waits for completion (a new stock quote). Once the response arrives, it updates our widget (the quoteWidget div) and initiates the next poll.

Now, finally, let’s look at how our server-side stuff supports this approach.

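import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import javax.servlet.AsyncContext;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(urlPatterns = "/longPollingStock", asyncSupported = true)
public class LongPollingStockServlet extends HttpServlet {

    static MockStockQuoteService stockQuoteService = new MockStockQuoteService();
    static ScheduledThreadPoolExecutor worker = (ScheduledThreadPoolExecutor) Executors.newScheduledThreadPool(1);

    static {
        // publish a new mock quote every five seconds
        worker.scheduleAtFixedRate(stockQuoteService, 0, 5, TimeUnit.SECONDS);
    }

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
        // put the request into asynchronous mode and return; the observer answers later
        AsyncContext ctx = req.startAsync(req, resp);
        stockQuoteService.addObserver(new QuoteObserver(ctx));
    }
}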

Here are two things to note:

asyncSupported = true: this flag tells the servlet container that this servlet supports asynchronous processing.

AsyncContext ctx = req.startAsync(req, resp): this is where we actually initiate asynchronous processing, by calling startAsync (which HttpServletRequest inherits from ServletRequest). As the servlet documentation says:

Calling this method will cause committal of the associated response to be delayed until AsyncContext#complete is called on the returned AsyncContext, or the asynchronous operation has timed out.

And that’s the whole thing. We delay response committal until we process this request at some point in the future, asynchronously. If we didn’t do the above, our HTTP response would effectively be committed right after the doGet method finishes. This time we postpone request processing until our QuoteObserver gets notified asynchronously by MockStockQuoteService. In the meantime we finish doGet and give the processing thread back to the container. We register the QuoteObserver with MockStockQuoteService, which notifies all its observers every five seconds, as you might have deduced from the ScheduledThreadPoolExecutor. More on this later; for now let’s assume it’s just a black box that notifies us asynchronously. Just don’t look into that box right now :) .

Now, let’s look at where something useful actually happens:
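If the split between “doGet returns” and “the response gets completed” feels abstract, here is a small model of it in plain Java. It deliberately does not use the Servlet API: a CompletableFuture stands in for the AsyncContext, handle() stands in for doGet, and publish() stands in for the quote service notifying its observers. All of these names are made up for the sketch.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CopyOnWriteArrayList;

class DeferredCompletionSketch {

    // responses that have been started but not completed yet
    private final List<CompletableFuture<String>> pending = new CopyOnWriteArrayList<>();

    /** Stand-in for doGet: parks the response and returns at once. */
    CompletableFuture<String> handle() {
        CompletableFuture<String> response = new CompletableFuture<>();
        pending.add(response);
        return response;
    }

    /** Stand-in for the quote service: completes every parked response, on the caller's thread. */
    void publish(String quote) {
        for (CompletableFuture<String> response : pending) {
            pending.remove(response);   // one quote per poll, like removing the observer
            response.complete(quote);   // plays the role of ctx.complete()
        }
    }

    public static void main(String[] args) throws Exception {
        DeferredCompletionSketch service = new DeferredCompletionSketch();
        CompletableFuture<String> response = service.handle();
        System.out.println("done right after handle(): " + response.isDone());

        Thread feed = new Thread(() -> service.publish("GOOG 0.42"), "quote-feed");
        feed.start();
        System.out.println("completed later with: " + response.get());
    }
}
```

The thread that called handle() is free immediately; the answer is produced later by whichever thread happens to deliver the quote.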

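import java.io.IOException;
import java.io.Writer;
import java.util.Observable;
import java.util.Observer;
import javax.servlet.AsyncContext;

public class QuoteObserver implements Observer {

    private final AsyncContext ctx;

    public QuoteObserver(AsyncContext ctx) {
        this.ctx = ctx;
    }

    public void update(Observable observable, Object arg) {
        StockQuote quote = (StockQuote) observable;
        try {
            ctx.getResponse().setContentType("text/html");
            Writer writer = ctx.getResponse().getWriter();
            writer.write("<span>Symbol: " + quote.getSymbol() + ", Price " + quote.getValue() + "</span>");
        } catch (IOException e) {
            /* ignore: the client has probably gone away */
        } finally {
            // one quote per poll: unsubscribe and commit the response even if the write failed
            observable.deleteObserver(this);
            ctx.complete();
        }
    }
}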

Here we are doing pretty useful stuff. Once we get notified, we tell our subscriber that we just got a new stock price, and we cause response committal by calling ctx.complete() (we also remove this observer from the observers pool, because we don’t know whether the client will choose to poll again). Note that we do this processing in a completely different context. doGet probably returned a few seconds earlier, and our previous processing thread is, hopefully, serving the next request. Now…

Putting it all together

Finally, when you deploy this application and open your Stock Quote Widget, you will see something like this (and the price part will be refreshed every five seconds).

Don’t look into the box

Let’s step back a little and take a look at ScheduledThreadPoolExecutor. You may wonder: why use another thread just to handle the request asynchronously, when we said that all this stuff is about limiting thread usage? Isn’t it counter-intuitive to spawn another Thread (or even an Executor) just to handle requests asynchronously? Well, that’s a pretty valid question. It’s just for the sake of brevity. In the next instalment of this series I will show how we can integrate this technique with the available asynchronous options, like JMS and asynchronous methods (new in Spring 3.0 and EJB 3.1). For now… just don’t look into the box.

To sum this part up: this approach seems more reasonable. It significantly reduces polling frequency, and responses carry only important data. It still has some important holes, though. First of all, subscribers are not durable: we can miss some information between subsequent polls. This is probably not a problem in our simple case, but it might be in a plethora of other scenarios. Fortunately, there is a technique that addresses this, described next.
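Before moving on, here is a toy model of the “missed information” hole. It is a simplification under stated assumptions, not a measurement: quotes are published once per tick, the client receives one quote per poll, and after each response it needs gapTicks ticks before its next poll is registered on the server. The names are hypothetical.

```java
class MissedUpdateSketch {

    /** Counts the quotes a long-polling client receives out of `published` quotes. */
    static int received(int published, int gapTicks) {
        int received = 0;
        int subscribedFrom = 0;                      // first tick at which the client is listening again
        for (int tick = 0; tick < published; tick++) {
            if (tick >= subscribedFrom) {
                received++;                          // this quote completes the pending poll
                subscribedFrom = tick + 1 + gapTicks; // client is deaf until it has polled again
            }
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println("no gap:    " + received(10, 0) + " of 10 quotes");
        System.out.println("1-tick gap: " + received(10, 1) + " of 10 quotes");
    }
}
```

With quotes every five seconds and a re-poll that takes milliseconds, the gap is effectively zero and nothing is lost; as soon as quotes arrive faster than the client can re-subscribe, every quote published inside the gap is silently dropped.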

HTTP streaming

The idea behind HTTP streaming is actually pretty simple. The client initiates a connection and keeps it open, possibly ad infinitum. Subsequent messages from the servlet to the client are sent over this previously established channel. Conceptually, we never commit the response. Here is how it works:

This seems to be a better approach than the previous one: we eliminate polling and, once subscribed, we won’t miss any messages. So what would our servlet look like? Exactly like before:

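import java.io.IOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import javax.servlet.AsyncContext;
import javax.servlet.ServletException;
import javax.servlet.annotation.WebServlet;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

@WebServlet(urlPatterns = "/streamingQuoteService", asyncSupported = true)
public class StreamingStockServlet extends HttpServlet {

    static MockStockQuoteService stockQuoteService = new MockStockQuoteService();
    static ScheduledThreadPoolExecutor worker = (ScheduledThreadPoolExecutor) Executors.newScheduledThreadPool(1);

    static {
        // publish a new mock quote every five seconds
        worker.scheduleAtFixedRate(stockQuoteService, 0, 5, TimeUnit.SECONDS);
    }

    @Override
    protected void doGet(HttpServletRequest req, HttpServletResponse resp) throws ServletException, IOException {
        AsyncContext ctx = req.startAsync(req, resp);
        stockQuoteService.addObserver(new StreamingQuoteObserver(ctx));
    }
}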

And here comes StreamingQuoteObserver:

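import java.io.IOException;
import java.io.Writer;
import java.util.Observable;
import java.util.Observer;
import javax.servlet.AsyncContext;

public class StreamingQuoteObserver implements Observer {

    private final AsyncContext ctx;

    public StreamingQuoteObserver(AsyncContext ctx) {
        this.ctx = ctx;
    }

    public void update(Observable observable, Object arg) {
        StockQuote quote = (StockQuote) observable;
        try {
            ctx.getResponse().setContentType("text/html");
            Writer writer = ctx.getResponse().getWriter();
            writer.write("<span>Symbol: " + quote.getSymbol() + ", Price " + quote.getValue() + "</span>");
            writer.flush(); // push the fragment to the client now; the response stays open
        } catch (IOException e) {
            // the client has probably disconnected: stop streaming to it
            observable.deleteObserver(this);
            ctx.complete();
        }
    }
}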

Note that we keep the connection open all the time (we never call ctx.complete() after writing a quote). On the client side we will get a stream of events:

<span>Symbol: GOOG, Price 0.8843558760137272</span>
<span>Symbol: GOOG, Price 0.30434186388689377</span>
<span>Symbol: GOOG, Price 0.673062794397491</span>
...

Now it’s the client’s responsibility to parse the incoming stream of events and update our Stock Quote Widget. Unfortunately, there is no simple, reliable, standard way to do such a thing in modern browsers. Basically, most browsers limit access to partial responses. Note that in the previous example this function

$.get("/jsr315/longPollingStock", {}, function ( data, status ) {
  $("#quoteWidget").html(data);
  longPoll();
} )

gets called only when the response is committed. That’s why this technique is somewhat limited right now. Of course, we can still use it with other clients, including Java ones. Here is how it looks with curl:

$ curl http://localhost:8080/jsr315/streamingQuoteService
<span>Symbol: GOOG, Price 0.8843558760137272</span>
<span>Symbol: GOOG, Price 0.30434186388689377</span>
<span>Symbol: GOOG, Price 0.673062794397491</span>
...
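Since Java clients were mentioned, here is a rough sketch of what the parsing side of such a client could look like. It is an assumption-laden illustration, not part of the original example: the class name and the regular expression are mine, the pattern simply mirrors the span the servlet writes, and the demo reads from a StringReader instead of a live connection. Against the real service you would pass a Reader wrapped around the HTTP response body.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.function.Consumer;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StreamingClientSketch {

    // mirrors the fragment written by the servlet: <span>Symbol: X, Price Y</span>
    private static final Pattern QUOTE =
            Pattern.compile("<span>Symbol: (\\w+), Price ([0-9.eE+-]+)</span>");

    /** Reads the stream chunk by chunk and reports each complete quote as "SYMBOL=price". */
    static void consume(Reader in, Consumer<String> onQuote) {
        StringBuilder pending = new StringBuilder();
        char[] chunk = new char[256];
        try {
            int n;
            while ((n = in.read(chunk)) != -1) {
                pending.append(chunk, 0, n);
                Matcher m = QUOTE.matcher(pending);
                int consumed = 0;
                while (m.find()) {
                    onQuote.accept(m.group(1) + "=" + m.group(2));
                    consumed = m.end();
                }
                pending.delete(0, consumed);   // keep only the incomplete tail for the next chunk
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String sample = "<span>Symbol: GOOG, Price 0.8843558760137272</span>"
                + "<span>Symbol: GOOG, Price 0.30434186388689377</span>"
                + "<span>Symbol: GOOG, Pri";      // last fragment has not fully arrived yet
        consume(new StringReader(sample), System.out::println);
    }
}
```

The point of the buffer is that fragments do not have to arrive aligned with reads: a quote is reported as soon as its closing tag shows up, and a half-received fragment just waits for the next chunk.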

The Good, The Bad and the Ugly

Let’s try to sum all of this up. The asynchronous servlets spec definitely opens a whole new era of possibilities. Historically, a few servlet container vendors built their own, incompatible implementations. Now it’s standardized, and that probably puts JEE far ahead of the competition in this matter. That’s good.

The bad thing is that this whole new servlet processing model comes at a cost.

Here also comes the ugly bit. It’s not necessarily associated with the Servlet 3.0 spec, but rather with the architecture part.

You have probably noted before that HTTP streaming is rather hard to achieve.

It’s also somewhat hard to proxy your asynchronous application through intermediaries, which is really problematic, since most companies I know front their application servers with a bunch of intermediaries (proxies, load balancers, application-level firewalls etc.).

There are also some esoteric limitations in the HTTP spec, followed by some clients, which limit this approach as well.

But problems aside, I believe asynchronous servlets open new, broad horizons, although the average developer will most likely not touch the asynchronous API directly. In my opinion it is a solid foundation for higher-level frameworks, which will be the subject of the next instalment of this saga. Enjoy and stay tuned!

Further reading

If you are interested in this topic, I strongly recommend following the blog of Greg Wilkins (of Jetty fame) and cometd.org. They are a valuable source of information, on which this entry is heavily based.
