$URL: svn+ssh://christianc@svn.forge.objectweb.org/svnroot/barracudamvc/Barracuda2/trunk/WEB-INF/src_docs/architecture/comp/overview.html $ - $Revision: 125 $

Event Model Performance

Note: These are some early developer notes on event model performance testing, preserved here for the interminably curious. Bottom line: the event model scales extremely well; converting URLs to event objects and dispatching them to interested listeners is simply not a performance issue (using XMLC/DOM will have a much greater impact on performance and scalability). If you want to know more, just read on...

This document provides a brief overview of performance within the Event Handler 7 example. This should all still apply to the Barracuda PR1 release (although I haven't actually gone back and re-run the tests).

Event Handler 7

In Event Handler 7, most of the performance testing notes from EventHandler6 (see below) still apply. I will take a few moments to talk about what changed.

The only real difference in EventHandler7 is that all HttpRequestEvents now implement Polymorphic, so when an event is dispatched, we end up creating a parent event (through Class.newInstance()) for every event in the parent chain. In other words, if you have an event that is 10 levels removed from HttpRequestEvent, there will be 10 additional events generated every time that event is dispatched. I was concerned that this might impair throughput, so I created a test (under org.barracudamvc.examples.ex3) that would allow me to dispatch events at different levels of an event hierarchy, in order to see how much the depth impacted throughput.
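The parent-chain behavior described above can be sketched as follows. This is a hypothetical, simplified illustration (the class names BaseEvent, LevelOne, LevelTwo and the parentChain() method are inventions for this sketch, not Barracuda's actual API): for each class between the dispatched event and the base event type, one additional parent event is instantiated via reflection.

```java
import java.util.ArrayList;
import java.util.List;

// Stand-ins for the real event hierarchy (HttpRequestEvent and friends).
class BaseEvent {}
class LevelOne extends BaseEvent {}
class LevelTwo extends LevelOne {}

public class PolymorphicDispatchSketch {
    // Walk up the class hierarchy, creating one parent event per level,
    // much as a Polymorphic dispatch would via Class.newInstance().
    static List<BaseEvent> parentChain(BaseEvent event) throws Exception {
        List<BaseEvent> parents = new ArrayList<>();
        Class<?> cl = event.getClass().getSuperclass();
        while (cl != null && BaseEvent.class.isAssignableFrom(cl)) {
            parents.add((BaseEvent) cl.getDeclaredConstructor().newInstance());
            cl = cl.getSuperclass();
        }
        return parents;
    }

    public static void main(String[] args) throws Exception {
        // An event 2 levels removed from BaseEvent generates 2 extra events.
        System.out.println(parentChain(new LevelTwo()).size()); // prints 2
    }
}
```

So an event 10 levels removed would generate 10 extra instantiations per dispatch, which is exactly the overhead the test above was designed to measure.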

In a nutshell, I was unable to determine any real difference, but this may have been due to my test methodology -- I was testing across a local network against a low-powered Linux box which was also acting as a web server to the outside world. Now, I don't believe that this site was experiencing any significant traffic, but I did observe significant variance in my results.

In general, when testing with 20 concurrent HTTP requests for the same event, I was able to handle between 150-250 requests per second, regardless of whether I fired an event that was 1 level removed from HttpRequestEvent or 10 levels removed. I did notice a slight slowdown (perhaps more so than in EventHandler6 testing) when I was NOT using event pooling; typically throughput dropped to 100-150 requests/second without event pooling. Note, however, that the deviation in these figures is significant (so take them as ballpark numbers).

We really need more testing here to get a better feel for just what levels of throughput the framework itself can actually sustain. Even if we could only support 50 requests per second, that figure would still amount to 180,000 requests per hour, or about 4.3 million requests per day (which is still pretty dang good). In addition, we should also note that the framework is still as fundamentally scalable as it was in the previous iteration: hitting it with 500 simultaneous event dispatching requests worked flawlessly every time.

If someone would like to do some work in the area of stress testing, I'd be more than happy to work with you to set things up.

Event Handler 6

At this point, I have done some preliminary stress testing and verified that the basic framework appears to hold up under significant load.

We did have a synchronization problem in the DefaultEventPool that was causing deadlock when we ran out of available threads. Basically, because the internal lock() method was synchronizing on the lock object, nothing else could release events until the code timed out. The solution was to modify lock() to throw a NoAvailableEventsException as soon as it sees there aren't any events available. The outer checkoutEvent() method can then sleep for a specified interval and retry. After a specified number of retries, it will simply propagate the exception on up, and the ApplicationGateway will just manually create the event using reflection. This eliminates the deadlock, because the lock object is released between locking attempts, which allows other threads to check events back in during this time.
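The fail-fast-plus-retry strategy described above can be sketched like this. The names mirror the text (checkoutEvent, NoAvailableEventsException), but this is an assumed, simplified reconstruction, not the actual DefaultEventPool code; the retry count and sleep interval here are arbitrary.

```java
import java.util.ArrayDeque;
import java.util.Deque;

class NoAvailableEventsException extends Exception {}

public class EventPoolSketch {
    private final Deque<Object> available = new ArrayDeque<>();
    private final int maxRetries = 3;           // illustrative value
    private final long retryIntervalMillis = 10; // illustrative value

    // lock() fails fast instead of blocking while holding the monitor,
    // so the deadlock described above cannot occur.
    private synchronized Object lock() throws NoAvailableEventsException {
        if (available.isEmpty()) throw new NoAvailableEventsException();
        return available.pop();
    }

    public synchronized void checkinEvent(Object ev) {
        available.push(ev);
    }

    // Sleep-and-retry loop; the monitor is NOT held during the sleep, so
    // other threads can check events back in between attempts. After the
    // retries are exhausted, the exception propagates and the caller (the
    // ApplicationGateway in the text) falls back to creating the event
    // via reflection.
    public Object checkoutEvent()
            throws NoAvailableEventsException, InterruptedException {
        for (int attempt = 0; ; attempt++) {
            try {
                return lock();
            } catch (NoAvailableEventsException e) {
                if (attempt >= maxRetries) throw e;
                Thread.sleep(retryIntervalMillis);
            }
        }
    }
}
```

The key design point is that the retry loop lives outside the synchronized method, so checking in and checking out can interleave freely during the backoff.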

Interestingly, the event pooling mechanism works quite effectively in terms of reducing overhead. When testing with 20 threads (each starting 20 millisecs after the previous one), the sample case was able to handle all 20 requests with only 3 actual event instances. This demonstrates that event pooling can be used effectively to reduce the amount of resources needed to service large numbers of requests.

At the same time, the actual performance improvements that came from the Event Pooling were not as significant as I thought. In a very non-scientific experiment, I created a simple event handler and then (from another box) created 500 threads to issue concurrent HTTP requests for that event. With event pooling turned on, I was able to handle all 500 requests in an average of about 1000 millisecs. With event pooling turned off, it took about 1400 millisecs. In both of these cases, I was creating a new instance of the event handler for each request.

If we really wanted to get a true feel for how much the event pooling actually helps, we could create a standalone test to compare instantiation through EventPool vs. instantiation through reflection. The actual difference may in fact be moot, however.

One of the unknown factors here is really the amount of time spent setting up and tearing down the HTTP connection. Ultimately, this will be one of the larger bottlenecks, as will the time it takes to query or update a database and of course the amount of available network bandwidth. In short, even if there is a significant performance gap between pooling vs. reflection, the actual difference may be inconsequential in a real world app if there are significant costs associated with bandwidth, HTTP, and database communication.

At this point, we really shouldn't read too much into the performance testing I've done, other than to assert a few very obvious points:

  1. The event model framework appears fundamentally thread-safe - In short, it appears to stand up under significant load. I tested up to 500 simultaneous requests, and the framework was able to serve them all. I suspect that the majority of sites will not need anywhere near this level of throughput. Regardless of what figure we decide we need to hit, the fact that we can support a large number of concurrent requests indicates that the fundamental architecture is sound.
     
  2. Initial throughput seems acceptable - From my perspective, the initial throughput results seem fairly acceptable. If we were only able to serve 50-100 requests per second, I'd say we were off to a rough start. Of course, we really need to test these figures in the context of a presentation framework in order to arrive at a true baseline. One thing we might consider as an early objective is to port the Pet Store app to various presentation frameworks in order to accurately measure throughput across various web-app approaches.

    (Do we have a line in the sand saying "Any web-app framework must be able to serve at least X requests per second"? For instance, even the ability to handle a paltry 250 requests per second would translate into 15,000 requests per minute, or 900,000 requests per hour. That's a pretty hefty figure.)
     
  3. There is a slight performance benefit to EventPooling - I think we should hold off judgment on this until we have a real-world app to benchmark. In short, while it works and that's great, I'm really beginning to suspect that the cost of instantiating one event through reflection, as opposed to caching it in an event pool, is probably inconsequential in the larger scheme of things. Now, we certainly don't want to be creating ALL events through reflection, but early indications suggest that there may not be a whole lot of benefit to the pooling mechanism.
     
  4. The framework is very tuneable - Given the inability to know for certain where the performance bottlenecks will lie, one of the good things about the current implementation of the event model is that it doesn't force you to implement things one way or another. For instance, you can use event pooling or elect not to. You can implement listener factories that provide a new listener instance per request, or reuse a common synchronized instance. In short, the architecture itself does not make any of these decisions for you...it defers them to the implementation, and where possible makes them configurable. This buys us a lot in terms of future tuneability.
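The listener-factory flexibility mentioned in point 4 can be sketched as two interchangeable strategies. The interface and class names here are illustrative inventions, not Barracuda's actual API: one factory hands out a fresh listener per dispatch (no shared state), the other reuses a single shared instance (cheaper, but the listener must be thread-safe).

```java
// Hypothetical sketch of the two listener-factory strategies.
interface Listener {
    String handle(String event);
}

interface ListenerFactory {
    Listener getInstance();
}

class EchoListener implements Listener {
    public String handle(String event) { return "handled:" + event; }
}

// Strategy 1: new instance per dispatch -- no shared state, no contention.
class FreshInstanceFactory implements ListenerFactory {
    public Listener getInstance() { return new EchoListener(); }
}

// Strategy 2: one shared instance -- less garbage, but the listener
// itself must tolerate concurrent use (e.g. be synchronized).
class SharedInstanceFactory implements ListenerFactory {
    private final Listener shared = new EchoListener();
    public Listener getInstance() { return shared; }
}

public class ListenerFactorySketch {
    public static void main(String[] args) {
        ListenerFactory fresh = new FreshInstanceFactory();
        ListenerFactory shared = new SharedInstanceFactory();
        System.out.println(fresh.getInstance() != fresh.getInstance());   // true
        System.out.println(shared.getInstance() == shared.getInstance()); // true
    }
}
```

Because the dispatcher only ever sees the factory interface, swapping one strategy for the other is a configuration choice rather than an architectural one, which is the tuneability point being made above.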

 



$Date: 2006-01-02 15:59:13 -0500 (Mon, 02 Jan 2006) $