
The Truth About Code Generation

Code generation done right can be a very effective and highly useful tool in your toolbox.  Done wrong, it can be a maintenance nightmare.  This article reflects on different types of code generation, explains when to use each, and points out some pitfalls to avoid.

WHAT CODE GENERATION ISN’T:  A SILVER BULLET

Before we explore what code generation is and how to use it effectively, we must first understand what it isn’t:  A silver bullet.

No amount of code generation will save a doomed project.  If you’ve got inadequate staff, bad requirements (or no requirements), poor project sponsorship, or any number of the classic mistakes, code generation will not help you.  You’ve got bigger problems.

Moreover, you shouldn’t expect miracle productivity gains by using a code generator.  Fred Brooks and Steve McConnell (in The Mythical Man Month and Rapid Development, respectively) argue persuasively that actual coding and construction of software is, or should be, a minority part of the schedule.  Even if coding accounts for 50% of the schedule (which it doesn’t) and you can effectively generate half of the project’s code (which you can’t), the best you can hope to achieve is a 25% reduction in effort.

In reality, boilerplate code (the kind that is best generated) has been on a long, gradual decline thanks to advances in technology and better abstractions.  We’re left more and more to focus on the differences in our software (the essence) and less with the mundane minutiae of simple coding tasks (the accidental).

This is what Fred Brooks argues in No Silver Bullet.  There is no single tool that can produce an order of magnitude gain in productivity or quality because the accidental complexity of software (the act of constructing software itself) gets continuously easier, leaving you to focus on the truly hard problem (the essence):  What does your software do, how does it do it, and how do we test it sufficiently to know that it does?

No silver bullet, indeed.

WHAT CODE GENERATION IS

A code generator is a tool that takes metadata as its input, merges the metadata with templates through a template engine, and produces a series of source code files for its output.  The tool can be simple or elaborate, and you can generate any kind of code that you want.  You simply need to write the control program and templates for whatever you want to generate.
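
To make that concrete, here’s a minimal sketch of such a generator in plain Java.  A real generator would use a template engine like Velocity or FreeMarker; the toy template format and class names here are illustrative only.

```java
import java.io.FileWriter;
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;

/** A toy code generator: merge metadata into a template, write a source file. */
public class GetterSetterGenerator {

    // A real generator would use an engine like Velocity or FreeMarker.
    private static final String TEMPLATE =
        "public class ${className} {\n"
      + "    private ${type} ${field};\n\n"
      + "    public ${type} get${Prop}() { return ${field}; }\n"
      + "    public void set${Prop}(${type} v) { this.${field} = v; }\n"
      + "}\n";

    /** Replace each ${key} placeholder with its metadata value. */
    public static String merge(Map<String, String> metadata) {
        String source = TEMPLATE;
        for (Map.Entry<String, String> e : metadata.entrySet()) {
            source = source.replace("${" + e.getKey() + "}", e.getValue());
        }
        return source;
    }

    public static void main(String[] args) throws IOException {
        Map<String, String> meta = new HashMap<String, String>();
        meta.put("className", "Customer");
        meta.put("type", "String");
        meta.put("field", "name");
        meta.put("Prop", "Name");

        FileWriter out = new FileWriter(meta.get("className") + ".java");
        out.write(merge(meta));
        out.close();
    }
}
```

The control program is trivial; the value is all in the metadata and the templates.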

Code generation done well can save you some time in the long run (you have to invest effort in creating your generator) and increase quality because you know all generated code will be identical.  Any bugs you find in the code will be corrected once in the template.

One argument against code generation is that a data-driven subroutine can produce the same result as code generation.  I agree with this argument because the generator is itself a data-driven program.  Runtime reflection and good abstractions can produce the same results as code generation.  I would argue, though, that such code is more complicated than the code created by the generator.  The generator might be as complex as the data-driven subroutine, but the code produced by the generator should be simple by design.  It would be trivially easy to attach a debugger and step through the generated code to find a bug.  I like debuggability.

Active vs. Passive

Generators come in two flavors:  Active and Passive.  Both are useful, but you must plan and design your project accordingly.

An active code generator maintains the code for the life of the project. Many active generators are invoked during the build process.  XDoclet is a good example of an active code generator.  I’ve used XDoclet to generate my webapp’s struts-config.xml file, and the generator was invoked by Ant during the build.  Another popular use of XDoclet is generating the boilerplate code and configurations for Enterprise Java Beans (EJBs).

Code generated by an active generator may or may not be checked into source control.  When the code is generated during the build and packaged into the final artifact, it probably would not be in source control.  On the other hand, the output from an active code generator can be checked into source control, letting you remove the generation step from the build process.  This isn’t to say the code is then maintained by hand!  On the contrary, the generator can be invoked frequently during a project.  The purpose of the active generator is to maintain the generated code.

A passive code generator creates code that you expect to maintain by hand afterwards.  Consider a wizard that asks you some questions before creating your basic class for you.  Likewise, many IDEs have useful generation snippets, such as generating all your getters/setters from your class’s instance variables.  Both of these examples are simple yet extremely useful.  I would be continually frustrated if I had to write all my getters/setters by hand.

Passive code generators needn’t stop at simple IDE-level functionality.  Maven archetypes, for example, can create an entire project setup for you.  They create all your directories and a starting pom.xml.  Depending on the archetype, this could be quite complex.

Similarly, you can create entire skeletal projects with functionality from a passive code generator.  One good example would be AppFuse, which creates your project structure, layout, build scripts, and can optionally create some basic functionality like user authentication.

IT’S JUST A TOOL

Always remember that code generation is a tool in your toolbox, nothing more.  More accurately, it’s a tool and die.

Every manufacturer has highly skilled workers creating dies, molds, and machine tools to create the parts they need.  Expert furniture makers don’t hand carve each and every table leg they require.  They make a jig and create exact copies of the table leg.  Each leg may be lovingly hand-checked for quality and assembled into the final table, but each leg certainly isn’t carved individually.

In the software world, there will be times when you need expert programmers writing templates and fewer junior engineers cranking out grunt code.  The experts make the tools and dies of our software world.

YOUR RESPONSIBILITY

If code generation is just a tool, then responsibility falls to the developer to understand when and how to use it.  It becomes the developer’s responsibility to create a design that does not require hand modification of any actively generated code. The design should be robust enough with plenty of hooks to allow for modification when needed.

One possible solution is to use active generation for base classes while using subclasses throughout the code.  The subclass could contain all the application-specific code needed, override base functionality as required, and leave the developer with a domain that could be easily regenerated while preserving all hand-written code.  Another design consideration is to model your application as a framework, somewhat like Spring.  Spring makes extensive use of the Template Method pattern and provides plenty of documented hooks for you to override when needed.
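
A sketch of that base-class approach (the class names are hypothetical, not from any real generator):

```java
// CustomerBase.java -- GENERATED FILE: regenerated on every build, never edited by hand.
public abstract class CustomerBase {
    private String name;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    // A documented hook: subclasses may override to add application rules.
    public boolean isValid() { return name != null; }
}

// Customer.java -- hand-written subclass, untouched by the generator.
public class Customer extends CustomerBase {
    @Override
    public boolean isValid() {
        // Application-specific logic survives regeneration of the base class.
        return super.isValid() && getName().trim().length() > 0;
    }
}
```

The rest of the code refers only to Customer, so the base class can be regenerated at will.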

CONCLUSION

Code generation done well can increase quality and decrease costs in a project.  Time savings are compounded, too, when you find yourself implementing similar code across projects.  Each successive new project can benefit from the templates made in the last project.

Consistency across all generated code yields an easier learning curve because developers learn one standard way for basic functionality, leaving them to focus on the custom pieces of an application. Put another way, place as much functionality into the “accidental” realm as you can so that your developers can focus on the “essence.”  Generated code is easily understood and allows for better debuggability than runtime abstractions that produce the same effect.

There are very specific design considerations to be mindful of, particularly the need for a design to be robust enough to ensure hand-modification of actively generated code is not required.

Combine good active code generation with a library of common components and you will find yourself covering a large percentage of an application’s accidental complexity, leaving you more time to focus on the essence.

Code generation is a good tool for your toolbox.  An expert developer will understand when and how to use it effectively.

Failure rates are cumulative

There used to be only two guarantees in life:  death and taxes.  In today’s complex IT environment, we can add a third:  Your production software systems will fail.  This is an absolute guarantee because the math is stacked against you.

Why?  Because failure rates are cumulative.  Simply put, five integrated systems, each with 99.9% uptime, yield an overall uptime of 0.999^5 ≈ 99.5%, unless you somehow figure out how to get them all to fail at the same time.  Since you can’t do that, you will see about 5 failures for every 1,000 runs of your program/service/job/etc.
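
If you want to check the arithmetic yourself (the 99.9% and the five systems are just the numbers from above):

```java
public class FailureMath {
    public static void main(String[] args) {
        double uptimePerSystem = 0.999;   // 99.9% uptime each
        int systems = 5;
        double overall = Math.pow(uptimePerSystem, systems);
        System.out.printf("Overall uptime: %.4f%n", overall);                      // 0.9950
        System.out.printf("Failures per 1000 runs: %.1f%n", (1 - overall) * 1000); // ~5.0
    }
}
```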

That’s the math.  Better plan for failure.

HOW TO: Use mini-batching to improve grid performance

We achieved a 3.5X increase in throughput by implementing “mini-batching” in our grid-enabled jobs.

We have a parent BatchService that creates child Services where each individual Service is a unit of work.  A Service implementation might perform some calculation for a single employee of a large employer group.  When the individual Services are very fast and the cost of bussing them around the network is greater than the cost of processing the Service, then adding more consumers makes the BatchService run slower!  It is slower because these fine grained units of work require more queue locks, more network traffic, and more handling calls when the child Service is returned back to the parent BatchService for accumulation.

The secret, then, is to give each consumer enough work to make the overhead of bussing negligible.  That is, give each consumer a “mini-batch” of Services to run instead of sending just one Service to a consumer.
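
A sketch of the idea (the Service type and the batch size are placeholders; the right size is whatever your benchmarks say):

```java
import java.util.ArrayList;
import java.util.List;

public class MiniBatcher {
    /** Split the child Services into mini-batches; each batch rides the bus as ONE message. */
    public static <T> List<List<T>> partition(List<T> services, int batchSize) {
        List<List<T>> batches = new ArrayList<List<T>>();
        for (int i = 0; i < services.size(); i += batchSize) {
            batches.add(new ArrayList<T>(
                services.subList(i, Math.min(i + batchSize, services.size()))));
        }
        return batches;
    }
}
```

Each consumer then loops through its mini-batch locally, paying the queue-lock and network costs once per batch instead of once per Service.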

Here’s a graph of some of our benchmarks:

[Graph: throughput by batch size]

Some of the data surprised us.  For example, we expected 3 big batches to run fairly slowly across 11 consumers because there would be 8 consumers sitting idle, but we were not expecting 11 batches to run more slowly than 43 batches.  We thought dividing the work equally across consumers in the exact number of batches would be the lowest point on the graph.  We were wrong.  We expected the U-shape, but we thought the trough would be at a different batch size.

Our test system can only support up to 11 consumers, so we haven’t yet tested batch sizes with more than 11, but the graph implies that we’ll have a deeper trough when we add consumers and tweak the batch size.  There should be, in theory, a point where we can’t process jobs any faster without killing the database.  I’ve warned our DBAs that we’re looking to hit that point.

If you’re doing any kind of grid computing (by way of Terracotta’s Master-Worker project, GridGain, or rolling your own), check out the effects mini-batching can have on your throughput.  You might be surprised by your benchmarking metrics!

More proof that you can’t keep a good idea down?

In this blog article, Michael Nygard discusses a talk he attended where a technical architect discussed an SOA framework at FIDUCIA IT AG, a company in the financial services industry. Nygard describes an architecture that echoes many of the features I implicitly spoke of in my first blog article about my big integration project / message bus.

You may be asking yourself right now, why does he keep talking about this particular project? Briefly: it’s been a very fun project, it’s ongoing, it consumes most of my daily brain cycles, we’re still growing it (it’s a brand new infrastructure for us), and it encompasses a whole lot of ideas that I thought were good and that are now being validated by other projects I read about online.

So, what other unsung features did we build in that I’ll now sing about?

Asynchronous Messaging

You’ll notice the Spooler component in the original broad sketch of our architecture. The high-level description I gave the Spooler touched on callbacks. Asynchronous messaging was left unsaid, but it is implied by having a mechanism for callbacks.

The description also labeled my Spooler an endpoint, which may be a web service endpoint. We use web services only because the Enterprise Service Bus (ESB) orchestrating work on our bus is .NET-based while our project is all Java. That said, we post Plain Ol’ XML (POX) over HTTP, which is deserialized quickly to a Java POJO. Our entire messaging system works on POJOs, not XML.

The outside world may use SOAP (or XML-RPC or flat files or whatever) when communicating with my company, but internally our ESB talks POX with the bus.  Mediation and transformation (from SOAP to POX) is part of the functionality of an ESB.  Consumers, internal to our bus, would directly access queues instead of using web services.

Pure POJOs, but distributed

It’s extremely productive and useful to work with a pure POJO model, and it’s even more productive and useful when the state of those POJOs is automagically kept in sync across the cluster regardless of what node is working on it. This is where Terracotta Server shines.

We pass POJOs around through all the queues. Consumers — which can exist anywhere on the network — process the Service/Job/Message (all interchangeable terms, as far as I am concerned — they are all units of work). Our messages are stateful, meaning they enter our bus empty except for parameters in instance variables, get routed around to various and sundry consumers across the network, and get posted back (the callback) full of data to the ESB.

Why do we need distributed POJOs? Well, we found it to be highly useful. For example, we offer a REST API to abort a pending message (such as http://ourendpoint/message/abort/abcdefg-the-guid-wxyz). The easiest way we found to tell the entire bus to disregard this message was to flip the bit on the message itself. The endpoint is running under Terracotta Server, all of the queues live in TC, and our consumers are likewise plugged in. If you stick all your messages in a Map (or series of maps if you’re worried about hashing, locking, and high volumes) where the GUID is the key and the value is the message, then the endpoint or any consumer can quickly obtain the reference to the message itself and alter its state. We can also write programs that hook into TC temporarily to inspect or modify the state of the system. Persistent memory is cool like that. It exists outside the runtime duration of the ephemeral program.
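
Here’s roughly what that looks like.  With Terracotta the map below would be declared a clustered root so every JVM sees the same instances; the class and method names are illustrative, not our actual code.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Registry of in-flight messages, keyed by GUID.  With TC, a clustered root. */
public class MessageRegistry {
    private final ConcurrentMap<String, Message> inFlight =
        new ConcurrentHashMap<String, Message>();

    public void register(Message m) { inFlight.put(m.getGuid(), m); }

    /** Called by the REST endpoint, e.g. /message/abort/{guid}. */
    public boolean abort(String guid) {
        Message m = inFlight.get(guid);
        if (m == null) return false;
        m.setAborted(true);   // consumers check this flag before doing any work
        return true;
    }
}

class Message {
    private final String guid;
    private volatile boolean aborted;

    Message(String guid) { this.guid = guid; }
    String getGuid() { return guid; }
    void setAborted(boolean aborted) { this.aborted = aborted; }
    boolean isAborted() { return aborted; }
}
```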

The endpoint likewise has REST APIs for returning the state of the bus, queue sizes, current activity, and other metrics.  All of this data is collected from the POJOs themselves, because the endpoint has access to the very object instances that are running all over the network.  It just so happens this architecture works wonderfully inside a single JVM, too, without TC, for easier development and debugging.

Load balancing and routers

Straight from Michael Nygard’s article:

Third, they’ve build a multi-layered middle tier. Incoming requests first hit a pair of “Central Process Servers” which inspect the request. Requests are dispatched to individual “portals” based on their customer ID.

In other words, they have endpoints behind load balancers (we use Pound), and “dispatched” is another word for “routed.”  We have content-based routers (a common and useful Enterprise Integration Pattern for messaging systems) that route messages/services/jobs of specific types to certain queues.  Our consumers are not homogeneous.  We’ve configured different applications (the integration aspects of our project) to listen on different queues.  This saved us from having to port applications off the servers where they were previously deployed.  These apps are several years old.  Porting would have taken time and money.  Allowing messages to flow to them where they already exist was a big win for us.
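
A content-based router is simple to sketch.  This is a minimal version under my own assumed names, not our production router:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.BlockingQueue;

/** Anything with a type we can route on; illustrative, not our actual interface. */
interface Routable {
    String getType();
}

/** Content-based router: inspect the message, pick the queue whose consumers handle it. */
class ContentBasedRouter {
    private final Map<String, BlockingQueue<Routable>> routes =
        new HashMap<String, BlockingQueue<Routable>>();

    void addRoute(String messageType, BlockingQueue<Routable> queue) {
        routes.put(messageType, queue);
    }

    void route(Routable message) throws InterruptedException {
        BlockingQueue<Routable> queue = routes.get(message.getType());
        if (queue == null) {
            throw new IllegalArgumentException("no route for " + message.getType());
        }
        queue.put(message); // blocks when the bounded queue is full -- free throttling
    }
}
```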

More to come

I’ve got the outline for my white paper complete, where I bulleted the features above as well as those in my previous blog article. There are other features I haven’t covered yet. Overall, I think it will be an interesting paper to read.

Still, I’m a little jealous that FIDUCIA IT AG has scaled out to 1,000 nodes in their system.  I can’t say how many nodes we’re up to, but I can say I’m looking forward to the massive scalability that our new architecture will give us.

You can’t keep a good idea down

Our message bus project was more than just replacing JMS with a POJO messaging system. It’s a whole piece of infrastructure designed to make it easy for different folks to do their jobs.

How did we do this and why do the next couple of paragraphs sound like I’m bragging? Because many of the features we implemented were recently announced in a new open source project (more on that later). Bear with me as I go through some of the features we implemented, knowing that I’ll tie them to the features of a recently announced (and exciting) open source middleware product.

Configuration Management and Network Deployments …

We deploy applications to our bus over the network by way of a simple little bootstrap loader.  You’ll note the Java class I used in my blog article uses a URLClassLoader.  My example used a file URL (“file://”), but there’s nothing stopping those URLs from beginning with “http://…”

This lets our Config Mgt. team deploy applications to a single place on the network. As nodes on the network come up, they’ll download the code they need to run.

via Bootstrapping

While we’re on the subject of bootstrapping, there’s nothing stopping a smart developer from bootstrapping different applications into different classloaders. Again using the Java class from my blog article, you’ll notice the code finds the “main” class relative to the classloader. Who says you need just one classloader? Who says you can only run “main” methods? Stick all your classloaders in a Map and use the appropriate classloader to resolve an interface like, say, Service (as in Service Oriented Architecture) with an “execute” method. Suddenly, you can have applications that are invoked by a Service. We used this very technique to integrate legacy, stand-alone applications into an SOA.
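
A sketch of that classloader-map trick.  The Service interface and names are illustrative; the point is that the interface lives in the parent classloader while each app lives in its own child loader:

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.HashMap;
import java.util.Map;

public class AppContainer {

    /** Loaded once by the parent classloader so every app shares the same contract. */
    public interface Service {
        void execute(Object message) throws Exception;
    }

    private final Map<String, ClassLoader> loaders = new HashMap<String, ClassLoader>();

    /** One isolated classloader per app; the URLs can just as easily be http:// as file://. */
    public void deploy(String appName, URL[] classpath) {
        loaders.put(appName, new URLClassLoader(classpath, Service.class.getClassLoader()));
    }

    /** Resolve the app's Service implementation in ITS classloader and invoke it. */
    public void invoke(String appName, String serviceClass, Object message) throws Exception {
        Service service = (Service) loaders.get(appName)
                                           .loadClass(serviceClass)
                                           .newInstance();
        service.execute(message);
    }
}
```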

Take bootstrapping and isolated classloading one step further and you’ll soon realize you can load multiple versions of the same application side-by-side in the same container. One could be your release branch version, the other could be your trunk code. Same infrastructure and container. We did that, too.

Lastly, what happens if you dump a classloader containing an application and replace it with a new one containing a new version of the application? Well, you just hot swapped your app. You updated the application without restarting your container.

Focus on Developer Productivity

Developers?  We’ve got them covered, too.  We went for a simple pure Java POJO model.  No app servers, databases, or anything else required.  A developer can run the entire message bus — the entire infrastructure — in a single JVM, which means inside your IDE.  Unit tests are a snap because all Services are POJO classes.  Did I mention it takes one whole second for someone to start a local instance of our message bus in a single JVM?  I’m a developer first, architect second.  I like to think about how my decisions affect other developers.  If I make it easy, they’ll love me.  If I don’t make it easy, well, … I don’t think I’ve done my job well.

Utility Computing w/o Virtualization

It’s a lot easier to get efficient use of hardware once you can load all your applications into a single container/framework. If you bake in queuing and routing (a message bus), then you can implement the Competing Consumers pattern for parallelism in processing. Also, if all message consumers are running all applications (thanks, classloading!), then your consumers can listen on all queues to process all available work. This is utility computing without a VM. Our project lets us use all available CPU cycles as long as there is work in the queues.

There’s also the Master/Worker pattern to process batch jobs across a grid of consumers. Grid computing is one aspect of our project, but a minor one. I’m more interested in the gains we achieve through utility computing and the integration of several legacy applications to form our SOA.

The Open Source Version

Here are some features from the open source project, tell me if they sound familiar:

  • You can install, uninstall, start, and stop different modules of your application dynamically without restarting the container.
  • Your application can have more than one version of a particular module running at the same time.
  • OSGi provides very good infrastructure for developing service-oriented applications.

http://www.javaworld.com/javaworld/jw-03-2008/jw-03-osgi1.html

SpringSource recently announced a new “application platform” with some of the following features and benefits:

  • Real time application and server updates
  • Better resource utilization
  • Side by side resource versioning
  • Faster iterative development
  • Small server footprint
  • More manageable applications

http://www.springsource.com/web/guest/products/suite/applicationplatform

http://www.theserverside.com/news/thread.tss?thread_id=49243

Now, if only the SpringSource Application Platform could add queuing and routing to their project, we might consider porting to it. In the meantime, I’m happy to see other projects validating the ideas we pitched here at our company.

I’m excited, too, to announce that I’ve received the blessing of Management and PR to write a white paper about our project. It will cover all the above features as well as a slew of others, such as service orchestration and monitoring, asynchronous callbacks, a few other key Enterprise Integration Patterns, and it will explain how we used Terracotta Server to tie it all together. Stay tuned! I’m going to write blog articles to coincide with the sections of the paper.

Scalability & High Availability with Terracotta Server

Our message bus will be deployed to production this month. We’re currently sailing through QA. Whatever bugs we’ve found have been in the business logic of the messages themselves (and assorted processing classes). Our infrastructure — the message bus backed by Terracotta — is strong.

SCALABILITY

People are asking questions about scalability. Quite frankly, I’m not worried about it.

Scalability is a function of architecture.  If you get it right, you can scale easily with new hardware.  We got it right.  I can say that with confidence because we’ve load tested the hell out of it.  We put 1.3 million real world messages through our bus in a weekend.  That may or may not be high throughput for you and your business, but I guarantee you it is for ours.

The messages we put through our bus take a fair amount of processing power. That means they take more time to produce their result than they do to route through our bus. How does that affect our server load? Terracotta sat idle most of the time. The box hosting TC is the beefiest one in our cluster. Two dual-core hyperthreaded procs, which look like 8 CPUs in htop. We figured we would need the most powerful server to host the brains of the bus. Turns out we were wrong, so we put some message consumers on the TC box, widening our cluster for greater throughput. Now the box is hard at work, but only because we put four message consumers on it.

When we slam our bus with simple messages (e.g., messages that add 1+1), we see TC hard at work.  The CPUs light up and the bus is running as fast as it can.  1+1 doesn’t carry much overhead.  It’s the perfect test to stress the interlocking components of our bus.  You can’t get any faster than 1+1 messages.  But when we switched to real world messages, our consumers took all the time, their CPUs hit the ceiling, and our bus was largely idle.  The whole bus, not just TC.  We’ve got consumers that perform logging and callbacks and other sundry functions.  All of these are mostly idle when our message consumers process real world workloads.

We’ve got our test farm on 4 physical nodes, each running between 4 and 8 Java processes (our various consumers) for a total of 24 separate JVMs.  All of these JVMs are consumers of queues; half of them consume our main request queue that performs all the real work.  The other half are web service endpoints, batch processors, loggers, callback consumers, etc., and each is redundant on different physical nodes.  Because our message processing carries greater overhead than bussing, I know we can add dozens more consumers for greater throughput without unduly taxing Terracotta.  If we hit a ceiling, we can very easily create another cluster and load balance between them.  That’s how Google scales.  They’ve got thousands of clusters in a data center.  This is perfectly acceptable for our requirements.  It may or may not be suitable for yours.

You might be thinking that dozens of nodes isn’t a massive cluster, but our database would beg to differ. Once we launch our messaging system and start processing with it, we’ll begin to adversely impact our database. Scaling out that tier (more cheaply than buying new RAC nodes) is coming next. I hope we can scale our database as cheaply and easily as our message bus. That’ll enable us to grow our bus to hundreds of processors.

Like I said, I’m not worried about scaling our bus.

HIGH AVAILABILITY

I might not be worried about scalability, but I am worried about high availability. My company is currently migrating to two new data centers. One will be used for our production servers while the other is slated for User Acceptance Test and Disaster Recovery. That’s right, an entire data center for failover. High availability is very important for our business and any business bound by Service Level Agreements.

Terracotta Server has an Active-Passive over Network solution for high availability. There is also a shared disk solution, but the network option fits our needs well. Our two data centers are connected by a big fat pipe, and Terracotta Server can have N number of passive servers. That means we can have a redundant server in our production data center and another one across the wire in our DR data center. We’ve also got a SAN that replicates disks between data centers. We might go with the shared disk solution if we find it performs better.

Overall, though, it is more important for our business to get back online quickly than it is to perform at the nth degree of efficiency. Messaging, after all, doesn’t guarantee when your stuff gets run, just that it eventually runs. And if everything is asynchronous, then performance, too, is a secondary consideration to high availability.

CONCLUSION

If there’s one lesson to be learned through this blog article, it’s that one size does not fit all.  Not all requirements are created equal.  Our message bus is the right solution for our needs.  Your mileage may vary.  Some factors may outweigh others.  For example, having a tight and tiny message bus that any developer can run in their IDE without a server (even without TC) is a great feature.  Terracotta’s no-API approach is what lets us do that.  You might have very different requirements than we do and find yourself with a very different solution.

InfoQ writes about my use of Terracotta Server as a message bus

Check out this article on InfoQ about using Terracotta Server as a message bus!

Some wheels need reinventing

Reinventing a square wheel is a common anti-pattern. The idea is a) we don’t need to reinvent the wheel because b) we’re likely to recreate it poorly compared to what is already available. But if we never reinvent any wheels, then we never progress beyond what we have. The real question, then, is when does it make sense to recreate a wheel? Some wheels need to be recreated.

I recently reinvented a wheel.  A big one.  The wheel is “Enterprise Messaging,” which must be complex because it has “enterprise” right in the name!  I’d be a fool to reinvent that wheel, right?  Maybe.  Maybe not.  We fit our “enterprise messaging system” into 92kb:

[Image: the entire enterprise messaging framework in a 92kb jar]

Some won’t consider 92kb to be “enterprisey” enough, but that’s ok with me. I know we were able to put 1.3 million real-world messages through our bus over a weekend. That’s enterprisey.

Jonas Bonér wrote an article about building a POJO datagrid using Terracotta Server, and I replied on his blog saying we did something similar by using Terracotta Server as a message bus. Another reader asked why I did this instead of using JMS.

I think there are several benefits to this reinvented wheel:

TINY!

92kb contains the entire server framework. We have another jar containing common interfaces we share with client applications that weighs in at 18kb.

It works!

A single “consumer” in our framework is bootstrapped into an isolated classloader, which enables our framework to load applications (the various apps we need to integrate) into their own isolated classloaders. One consumer can process a message for any integrated application.

This is utility computing without expensive VMWare license fees.

We’re consolidating servers instead of giving each application dedicated hardware.  The servers were mostly idle, anyway, which is why enterprises are looking to utility computing and virtualization to make more efficient use of those spare CPU cycles.  In our framework, hardware becomes generic processing power without the need for virtualizing servers.  Scaling out the server farm benefits all applications equally, whereas the prior deployments required separate capital expenditures for each new server.

Pure POJO

Our framework runs inside an IDE without any server infrastructure at all. No ActiveMQ, no MySQL, and no Terracotta Server. Developers can stand up their own message bus in their IDE, post messages to it, and debug their message code right in the framework itself.

We introduce Terracotta Server as a configuration detail in a test environment. Configuration Management handles this part, leaving developers to focus on the business logic.

So, I might not be writing my own servlet container anytime soon (not when Tomcat and Jetty are open source and high quality), but I think it made a lot of sense to reinvent the “enterprise messaging” wheel. Terracotta Server allows me, in effect, to treat my network as one big JVM. My simple POJO model went wide as a configuration detail. That makes my bus (and TC) remarkably transparent.

Terracotta Server as a Message Bus

Terracotta is excellent software to glue messaging components together. This article is a high-level view of how we used TC to create our own messaging backbone.

Just a few weeks ago I made two predictions for 2008, but both centered around Terracotta. Since that time, I’ve gone deeper into the server and used it to write a message bus for a non-trivial integration project.

I’m impressed.

Our first implementation used a MySQL database for a single queue. JTA transactions running “select for update” statements against InnoDB worked just fine, actually, but there were other clunky things about that implementation. All roads looked like they led to queuing and routing. In a nutshell: enterprise messaging with multiple queues, not just batch jobs on a single queue.

Our second implementation (I believe strongly in prototyping, a la Fred Brooks’ “Plan to throw one away”) used JMS.  Early in our design process, we talked about implementing our own messaging system using TC.  We managed to talk ourselves out of it because a) no one else that we know of has done it and b) ActiveMQ is also open source, mature, and Camel looked very cool insofar as it gives you a small domain-specific language for routing rules between queues.  The Camel project claims to have implemented all the patterns in EIP.

Well, we managed to deadlock ActiveMQ with multiple clients running with Spring’s JmsTemplate.  Our request queue would just stop.  We’d get an error saying our message couldn’t be found, and the queue would simply stall.  We couldn’t restart it without bouncing ActiveMQ.  New clients all blocked on the queue.  ActiveMQ did not survive our load test well.  When we inquired, we were told about a known problem between Spring and ActiveMQ and that we should use the latest snapshot.

DISCLAIMER: I understand the preceding paragraph is entirely FUD unless I provide tangible evidence otherwise. We’ve since moved on from that implementation and removed all the JmsTemplates from our Spring apps. I won’t be providing screenshots or sample code to deadlock their server. To be fair, we did not choose to try again with another FOSS JMS queue, like JBoss. Our configuration of ActiveMQ and our Spring JmsTemplate clients may have been wrong. Feel free to take my criticism above with the proverbial grain of salt.

Happily, my team understands good design and the value of clean interfaces. All JMS-related code was hidden by handler/listener interfaces. Our consumer logic did not know where the messages (our own domain objects) came from. Implementations of the handlers and listeners were injected by Spring. As a result, it took just 90 minutes to swap in a crude but effective queueing and routing system using Terracotta. We’ve since cleaned it up, made it robust, added functionality for business visibility, and load tested the hell out of it. It all works beautifully.
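
The shape of those interfaces, roughly.  The names here are illustrative, not our actual API; the point is that business logic sees only a handler, and the transport behind the listener is an injected detail:

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

/** All the business logic sees: a message arrives.  No JMS, no Terracotta. */
interface MessageHandler<T> {
    void onMessage(T message) throws Exception;
}

/** One transport implementation: a listener fed by a (possibly TC-clustered) BlockingQueue. */
class QueueListener<T> implements Runnable {
    private final BlockingQueue<T> queue;
    private final MessageHandler<T> handler;
    private volatile boolean running = true;

    QueueListener(BlockingQueue<T> queue, MessageHandler<T> handler) {
        this.queue = queue;
        this.handler = handler;
    }

    public void run() {
        while (running) {
            try {
                T message = queue.poll(1, TimeUnit.SECONDS); // wait briefly for work
                if (message != null) {
                    handler.onMessage(message);              // hand off to pure POJO logic
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return;
            } catch (Exception e) {
                // production code logs this and routes the message to an error queue
            }
        }
    }

    void shutdown() { running = false; }
}
```

Spring injects whichever listener implementation fits the environment, which is why swapping JMS for Terracotta-backed queues never touched the handlers.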

Here are the main ingredients you need to roll your own message bus with Terracotta:

  1. Knowledge of Java 5’s Concurrent API for queueing
     Java’s Concurrent API expertly handles nearly all of your threading issues.  Bounded LinkedBlockingQueues (also ArrayBlockingQueues) will neatly throttle your entire system for you.  Consumers live in their own threads (and through the magic of Terracotta they can live in their own JVMs!) and can safely remove the next item from the queue, optionally waiting for a period of time for something to become available.  Producers can add messages to a BlockingQueue in a thread-safe way, also optionally waiting for space to become available.

  2. Knowledge of Java threading for consumers and producers
     You’ll need to be able to start and stop your own threads in order to create producers and consumers.

  3. Daemon Runners
     Daemon Runners (my term for them, a better one may already exist) are long-running POJO Java processes that you can cleanly shut down later.  Browsing Tomcat’s source code taught me a neat trick for hooking into a running JVM:  write a main program which spawns a thread that runs your actual application, then have the main thread open a ServerSocket and await a connection.  When a token such as “stop” comes through, main stops its child thread and your application can exit gracefully.  Anything else over the socket can be ignored, which lets your ServerSocket go right back to listening.  We implemented a “gc” command, among others, to provide simple but effective hooks into our running processes anywhere on the network.  You just need the IP and port.  You can optionally put IP checks into your daemon runner to validate that the IP sending the token is a trusted one.  Our runners only accept tokens from 127.0.0.1.  SSH lets us run scripts from across the network.  (There’s a sketch of a Daemon Runner after this list.)

  4. Named classloaders
     Named classloaders are a TC trick needed to run multiple stand-alone Spring applications yet have them share the same clustered data.  TC ties applications together using specific names for classloaders.  Modules they’ve built already know how to cluster Tomcat Spring applications, for example, because the classloaders are the same every time.  In standalone apps, you’re not guaranteed that the system classloader even has a name, let alone the same name across JVMs.  See this post on TC’s forums to make a named classloader.  It wasn’t hard.  There may be another way to cluster standalone Spring apps, but the named classloader did the trick for us.  You will need to bootstrap your application to make this work.  You should probably be doing this anyway.

  5. Spooler
     A Spooler lets your messaging system accept messages long after the rest of the queues get throttled by a bounded BlockingQueue.  Your Spooler is an endpoint (maybe a web service endpoint) that will put everything it receives into an unbounded queue:  your spool.  A spool consumer will read from the spool and forward to the next queue.  Because the next queue is bounded, you’ve achieved throttling.  You may have other components in your messaging system that require spooling.  For example, we’ve got a consumer that performs callbacks and posts the results of the message to the callback URL.  What happens if the callback endpoint is down?  We don’t want our throttled message system to stop processing messages, so we spooled messages going into the callback queue.

  6. Consumer interface
     You’ll need to create a class or two around queue consumption.  Our first crude implementation simply injected the queue itself into the listening thread.  The listening thread blocks/waits on the blocking queue (hence the name!) until something is available.  We’ve refined that a bit so that we now have listener classes that monitor the queues and pass the messages to consumer classes.  The business logic is pure POJO Java logic, which is easily unit testable.  This is, in essence, an event-driven system where your POJO class accepts events (messages) but doesn’t know or care where they came from.  You want to decouple the business logic from the plumbing.

  7. Terracotta Server — messaging backbone & glue
     Last but not least, you need some queues, you need multi-JVM consumers, you need persistent data (a message store) that won’t get wiped out by a catastrophic failure, you need business visibility to track the health and status of all queues and consumers, and you need to glue them all together.  Terracotta Server handles these requirements very well.
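
Here’s what that Daemon Runner trick might look like as a minimal sketch.  MyApplication, the port number, and the tokens are illustrative, not our production code:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class DaemonRunner {

    /** Stand-in for the real application; anything Runnable with a shutdown hook works. */
    static class MyApplication implements Runnable {
        private volatile boolean running = true;
        public void run() {
            while (running) {
                // ... consume queues, do work ...
                try { Thread.sleep(500); } catch (InterruptedException e) { return; }
            }
        }
        void shutdown() { running = false; }
    }

    public static void main(String[] args) throws IOException {
        MyApplication app = new MyApplication();
        new Thread(app, "app-main").start();

        // Bind to loopback only: tokens are accepted solely from 127.0.0.1.
        ServerSocket control = new ServerSocket(9999, 1, InetAddress.getByName("127.0.0.1"));
        while (true) {
            Socket socket = control.accept();
            BufferedReader in =
                new BufferedReader(new InputStreamReader(socket.getInputStream()));
            String token = in.readLine();
            socket.close();
            if ("stop".equals(token)) {
                app.shutdown();   // main exits; the JVM ends when the child thread does
                break;
            } else if ("gc".equals(token)) {
                System.gc();      // a simple hook into the running JVM
            }                     // anything else: ignore and go right back to listening
        }
    }
}
```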

TC really came through for us. We were curious about some of its behavior in a clustered environment. We made some assumptions about its behavior based on what would be ideal for minimizing network chatter and limiting heap size. TC nailed every single one of our assumptions.

We made the following assumptions and were happy to find out that all held up under load testing:

  1. L1 clients that were write-only wouldn’t ever need to have the entire clustered/shared dataset faulted to its heap. If you’re not going to read it, you don’t need it locally.
  2. Clustered ConcurrentMaps have their keys faulted to all L1 clients, but values are retrieved lazily.
  3. Reading from a BlockingQueue would fault just one object to the L1 client, instead of faulting in the entire queue, because the single object is retrieved in a TC transaction.
  4. TC and our unbounded spools wouldn’t run out of memory because TC pages object graphs to disk. Our unbounded L1 clients would work within an acceptable memory band.
  5. We can add/remove consumers to any point in our messaging system without affecting the entire system.

We’ve got our canaries in the coal mine, so we see what the entire system is doing in real time. We’re happy to see that our memory bands are predictable and that we’re entirely CPU bound. This is excellent for horizontal scalability. We can simply throw more processors at any part of our system to scale out. It doesn’t look like Terracotta server will be a bottleneck because the messages we’re processing take significantly more time to crunch than it takes to route through our queues. We have enough juice on our TC box to handle dozens more consumers across the network, which would give us significant throughput gains. We can revisit this when we have the need for hundreds of consumers. I’ll assume TC server will scale up with us, but if it can’t for any reason, it is perfectly acceptable to have more than one messaging cluster. That’s how Google scales. There are lots and lots of clusters in Google’s datacenters. Bridging between two messaging systems is a solved problem. That’s what messaging is, after all, a connection between disparate systems.

What did we gain?

Initially, we had MySQL.  Then we added ActiveMQ, which was backed by MySQL.  We saw how TC server would be beneficial if only to cluster POJOs that gather runtime data, so we had TC server in the mix.  That’s three different servers in our system, all of which needed high availability and routine backups.  All were configured in Spring, making our dependency injection a maze to follow.

When we switched to a TC message bus, we got rid of 2/3 of the infrastructure and most of the Spring configurations. We now have just one piece of infrastructure to maintain in a highly available way.

But I’m a guy that really likes simple. TC lets us make an entirely POJO system that runs beautifully in IntelliJ. A single “container” type main program can run all our components in a single JVM simply by loading all our various Spring configs. Developers can run the entire messaging system on their desktop, in their IDE, and run their code against it. They can post messages to an endpoint listening on 127.0.0.1 and debug their message code inside the messaging system itself.

We replace our container main with Terracotta in our integration and test environments.  TC seamlessly and invisibly wires together all the components of the system, irrespective of where they live on the network.  The POJO model goes wide with Terracotta server.  It’s elegant, simple, and Just Works™.

Canaries in the coal mine

Application logging is only as useful as your plan to actually use the logs. Without a plan to mine the data, collect metrics, and plot graphs, your logs are useless. It’s snowcrash in a console window. It’s gigs of spam in a file.

This reminds me of the Philosophy and Zen of Unix:

Rule of Silence: When a program has nothing surprising to say, it should say nothing.

But how do you know your program is running? You’ve got several options available to you, all good, and you should probably implement them all.

Canaries in the coal mine

In the good ol’ days, miners had a crude but effective way to test a mine shaft for adequate levels of oxygen: if the bird died, miners got out of the shaft.

Your program needs a canary in the coal mine. You need a way to smoke test your application when it first boots up and while it’s running. It either works or it doesn’t. The bird is dead, dying, or singing.

What kind of canary?  One that tests some discrete bit of functionality of your application.  You can use a simple site monitoring program with basic tests baked into a server page.  You can run a load testing tool like JMeter to script what an end user would do.  Just run a test of 1.  In the messaging application I’m currently building, we send periodic test messages to the queues.  The messages aren’t fancy, just tiny XML messages posted from a Python client requesting 2+2.

But 2+2 is important. It’s like your first Hello, World! program in a new language. Getting 2+2 running means you’ve successfully setup your environment, you understand the basics of compilation, packaging, deployment, and configuration management. You’ve also got your first benchmark of how quickly a simple message can pass through your system.

You want to log the data from the canaries in the coal mine. Ping your canaries every ten minutes. Keep those results and metrics. Create a plan to report on them, which puts you on the path of Statistical Process Control.
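
A minimal canary pinger might look like this.  The endpoint URL and XML are placeholders (our real canary posts from a Python client); the point is one parseable log line per ping:

```java
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class Canary {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(new Runnable() {
            public void run() {
                long start = System.currentTimeMillis();
                try {
                    // Hypothetical endpoint; ours accepts tiny POX messages like this one.
                    URL url = new URL("http://ourendpoint/message");
                    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                    conn.setRequestMethod("POST");
                    conn.setDoOutput(true);
                    OutputStream out = conn.getOutputStream();
                    out.write("<add><a>2</a><b>2</b></add>".getBytes("UTF-8"));
                    out.close();
                    int status = conn.getResponseCode();
                    // One parseable line per ping -- this is the data you mine and graph.
                    System.out.println("canary status=" + status
                        + " millis=" + (System.currentTimeMillis() - start));
                } catch (Exception e) {
                    System.out.println("canary FAILED: " + e); // the bird stopped singing
                }
            }
        }, 0, 10, TimeUnit.MINUTES);
    }
}
```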

Statistical Process Control

You can spend a decade trying to attain CMM Level 5 accreditation or gaining your black belt in Six Sigma and probably still never completely grok the enormity of Statistical Process Control. You can, however, start improving your technical operations by using meaningful statistics to smooth out your Configuration Management practices.

So what is Statistical Process Control? Quoting Wikipedia:

Statistical Process Control (SPC) is an effective method of monitoring a process through the use of control charts.  Much of its power lies in the ability to monitor both the process centre and its variation about that centre.  By collecting data from samples at various points within the process, variations in the process that may affect the quality of the end product or service can be detected and corrected, thus reducing waste as well as the likelihood that problems will be passed on to the customer.  With its emphasis on early detection and prevention of problems, SPC has a distinct advantage over quality methods, such as inspection, that apply resources to detecting and correcting problems in the end product or service.

In addition to reducing waste, SPC can lead to a reduction in the time required to produce the product or service from end to end. This is partially due to a diminished likelihood that the final product will have to be reworked, but it may also result from using SPC data to identify bottlenecks, wait times, and other sources of delays within the process. Process cycle time reductions coupled with improvements in yield have made SPC a valuable tool from both a cost reduction and a customer satisfaction standpoint.

In layman’s terms, you aggregate the data from your canaries into a graph. You watch the graph every day to eventually find your “center”, the normal singing voice of your canary. Your data tells you he sings at X decibels when healthy, and your graphs show you when there’s not enough oxygen in the mineshaft.

This entire process should be automated! If it’s not, you won’t do it. Your Ops center and CM folks should have at least one box set aside for automation and monitoring. Maybe it’s your build box. Put all your scripts there. Create cron jobs or Windows scheduled tasks to constantly parse your log files for data. Use Log4J’s JMS or JDBC Appender if you don’t want to parse text files. Get all your data in one place, mine it, and graph it.

Test Driven Deployment

Everyone knows about Test Driven Development, where you write your test code before you write your business logic.  It forces you to actually design your code by making you interact with the class/object early in the process.  Many preach TDD; some actually practice it.

I’m not personally aware of many architects or organizations that practice what I call Test Driven Deployment. This is the habit of understanding what your canaries are before you write your application. You will change how you architect, design, and deploy your software if you understand up front what data you want to capture and how you’ll access it. It forces you to design your solution before you try to implement it, just like Test Driven Development.

Pipe [debug] and [info] level logging to /dev/null

Divide your logging output into discrete files with meaningful names. Canaries and metrics can go to one set of files. Application errors and contextual information to help diagnose bugs should go to another file. Debugging output goes right to the black hole.

Make your log files useful. Practice Test Driven Deployment. Bring a canary with you down into the coal mine, and listen when he stops singing.
