21st Feb 2008 by Mark Turansky

Some wheels need reinventing

Reinventing a square wheel is a common anti-pattern. The idea is a) we don’t need to reinvent the wheel because b) we’re likely to recreate it poorly compared to what is already available. But if we never reinvent any wheels, then we never progress beyond what we have. The real question, then, is when does it make sense to recreate a wheel? Some wheels need to be recreated.

I recently reinvented a wheel. A big one. The wheel is “Enterprise Messaging,” which must be complex because it has “enterprise” right in the name! I’d be a fool to reinvent that wheel, right? Maybe. Maybe not. We fit our “enterprise messaging system” into 92kb:

[Image: enterprise_messaging_in_92kb.jpg]

Some won’t consider 92kb to be “enterprisey” enough, but that’s ok with me. I know we were able to put 1.3 million real-world messages through our bus over a weekend. That’s enterprisey.

Jonas Bonér wrote an article about building a POJO datagrid using Terracotta Server, and I replied on his blog saying we did something similar by using Terracotta Server as a message bus. Another reader asked why I did this instead of using JMS.

I think there are several benefits to this reinvented wheel:

TINY!

92kb contains the entire server framework. We have another jar containing common interfaces we share with client applications that weighs in at 18kb.

It works!

A single “consumer” in our framework is bootstrapped into an isolated classloader, which enables our framework to load applications (the various apps we need to integrate) into their own isolated classloaders. One consumer can process a message for any integrated application.
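The classloader arrangement can be sketched roughly as follows. This is an illustration only, not our framework's code: the `IsolatedApps` class, its method names, and the use of the thread context class loader during dispatch are all assumptions made for the example.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** Hypothetical sketch: one class loader per integrated application. */
class IsolatedApps {
    private final Map<String, ClassLoader> loaders = new ConcurrentHashMap<>();

    /** Registers an application; classpath would point at that application's jars. */
    void register(String appName, URL[] classpath) {
        // Assumption: the parent loader holds the shared interfaces, so every
        // application sees the same message types but not each other's classes.
        ClassLoader parent = IsolatedApps.class.getClassLoader();
        loaders.put(appName, new URLClassLoader(classpath, parent));
    }

    ClassLoader loaderFor(String appName) {
        ClassLoader loader = loaders.get(appName);
        if (loader == null) {
            throw new IllegalArgumentException("Unknown application: " + appName);
        }
        return loader;
    }

    /** Runs a handler with the application's loader as the thread context class loader. */
    void dispatch(String appName, Object message, Consumer<Object> handler) {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loaderFor(appName));
        try {
            handler.accept(message);
        } finally {
            // Always restore, so the next message starts from a clean state.
            current.setContextClassLoader(previous);
        }
    }
}
```

Because each application gets its own `URLClassLoader`, two applications can ship conflicting versions of the same library, and one consumer can still process messages for either of them.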

This is utility computing without expensive VMWare license fees.

We’re consolidating servers instead of giving each application dedicated hardware. The servers were mostly idle, anyway, which is why enterprises are looking to utility computing and virtualization to make more efficient use of those spare CPU cycles. In our framework, hardware becomes generic processing power without the need for virtualizing servers. Scaling out the server farm benefits all applications equally, whereas the prior deployments required separate capital expenditures for each new server.

Pure POJO

Our framework runs inside an IDE without any server infrastructure at all. No ActiveMQ, no MySQL, and no Terracotta Server. Developers can stand up their own message bus in their IDE, post messages to it, and debug their message code right in the framework itself.

We introduce Terracotta Server as a configuration detail in a test environment. Configuration Management handles this part, leaving developers to focus on the business logic.
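To give a feel for what “pure POJO” means here, this is a minimal sketch of a bus built only from JDK collections. The names `BusMessage` and `PojoBus` are hypothetical and the real framework's interfaces are not shown in this post. The sketch also says nothing about how the queue would be shared through Terracotta; that part lives in configuration, not in code like this.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Hypothetical message type: the target application plus an opaque payload. */
class BusMessage {
    final String application;
    final String payload;

    BusMessage(String application, String payload) {
        this.application = application;
        this.payload = payload;
    }
}

/** Minimal in-process bus; plain Java objects, no server needed to run it. */
class PojoBus {
    private final BlockingQueue<BusMessage> queue = new LinkedBlockingQueue<>();

    /** Adds a message to the tail of the queue. */
    void post(BusMessage message) {
        queue.add(message);
    }

    /** Returns the oldest pending message, or null if the bus is empty. */
    BusMessage poll() {
        return queue.poll();
    }

    int pending() {
        return queue.size();
    }
}
```

A developer could post a `BusMessage` from a unit test or a `main` method, set a breakpoint in the consuming code, and step through it without starting any external process.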

So, I might not be writing my own servlet container anytime soon (not when Tomcat and Jetty are open source and high quality), but I think it made a lot of sense to reinvent the “enterprise messaging” wheel. Terracotta Server allows me, in effect, to treat my network as one big JVM. My simple POJO model went wide as a configuration detail. That makes my bus (and TC) remarkably transparent.


9 Responses to “Some wheels need reinventing”

  1. Dan Says:

    It would be illuminating to get some more details on what you built. I’m guessing you can’t post the source, but perhaps a discussion of what “enterprise messaging” features you required, and what features common to JMS/ActiveMQ/whatever were not necessary and could be ignored for your application. Even better would be a high-level class or responsibility diagram.

  2. Jonas Bonér Says:

    Great post.
    You are really capturing the essence of, and the “feel” for, Terracotta here; it stays out of the way as long as it has to, and bringing it in is only a configuration detail.

  3. John Says:

    Sounds really interesting; I’m right with Dan — would love to see what you’ve achieved in the internal middleware space. Must take another look at TC.

    As for JMS/ActiveMQ/etc., a compelling reason to use a middleware server is to connect to applications which are out of your control or written in (possibly unknown) alien technologies. Which is why something AMQP-based like Apache Qpid is interesting. AMQP is about an effective protocol, not an over-complex-broker-for-my-app. With AMQP you can send a message from Python to Erlang to C#, never mind JMS or ActiveMQ! That’s when brokers are useful.

    Well done on the less is more front.

  4. ARI ZILKA Says:

    Mark,

    Again, it’s great to see your enthusiasm. Let’s try to talk some time soon. I want to help all these people understand what you have achieved. Like Dan says, there has to be a lower detail version of sharing what you have done than just open sourcing the source. For example, we could collaborate on a whitepaper that documents the architecture concepts, the interfaces, and the capabilities w/o the use case or the implementation.

    As an example, I am working with the kind folks at www.getabby.com who have implemented SQL on top of Amazon SDB and while they have not shared anything yet, we plan to get together and make something generically available to the market (documents and source) without revealing a thing about getabby.com and their usage of this technology.

    Very smart and lucid arguments as to why consider reinventing the wheel, BTW!

    Kewl,

    –Ari

  5. Enrique Says:

    Have you seen the Cajo library? It allows you to discover published services in other computers of the network, invoke methods… It even has a class that is able to provide a GUI that is deployed to a client web browser as an applet or a webstart app. All in a 54 Kb jar. I have tried only a few quite trivial examples (broadcasting custom events to other clients in the network) so I don’t really know whether it is efficient or not. But surely it is hard to compete with its size, and it is open source.

  6. Adam Malter Says:

    For a messaging bus to be ‘Enterprisey’ for me, it has to have the following features:

    1. Guaranteed message delivery
    2. No single point of failure
    3. Some sort of true transactional semantics (i.e. Deliver all these messages or none)
    4. High message count throughput
    5. Large message size support
    6. Management/Statistics collection

    If you can accomplish all this in 92k, I am 100% on board. Trust me, we are tired of paying MQSeries licensing fees, and worrying about the reliability of ActiveMQ is not good for my blood pressure :-)

    Like the post-relational solutions that are now looking to replace RDBMS systems (see http://www.vldb2007.org/program/slides/s1150-stonebraker.pdf), the enterprise message bus could really use an out-of-the-box rethink. I applaud any efforts towards this and still find Terracotta an interesting product. But where does it fit into a production environment? Last time I looked it seemed to only offer hot/cold multi-node capability. How do you scale things up and out, etc.?

    Treating the network as a single VM is great, but hell, I worry about the data fidelity inside our single VM nodes as is. Additionally, with our messaging bus I worry about network outages, VM bugs, memory limitations, disk space issues, the “oops, I tripped over the power cable” factor, etc. With something like JMS (and specifically a JMS provider you can trust, but hopefully as more implementations mature, that will count everyone) I can be certain within a very high level of confidence that when I PUT, the message will come out the other end. Be it sooner, or later. Maybe my thinking in all of this is just screwed so tightly to our current business needs and current toolchain. I am willing to admit that. Well, that’s why I am here, looking to you guys to think outside of all that; just don’t forget about the basic needs in all the excitement.

    Really though, I love following this stuff and please keep up the interesting work..

  7. Mark Turansky Says:

    @Adam,

    You (again, like last time) make excellent points.

    You’re working in finance, which makes your requirements somewhat different than mine. For example, we don’t have a need for JTA and the like, which you mention in a previous post on my blog and again in your list above (#3).

    TC provides persistent storage of messages in their disk cache. We would probably need to build in a little more functionality to assure that messages are not only delivered but also processed successfully through our system. TC clusters maps, so putting a message into some kind of holding bay until taken out later (upon successful completion) seems like a ridiculously easy feature to implement.

    We’ve got multiple of everything. All consumers run in their own JVMs across the network. TC is the glue that binds them together. We’ve got multiple of each type of consumer.

    TC has a network high-availability config option. This may be new from when you last looked at it.

    We can put a lot of messages through our system, and processing those messages takes longer than the overhead of bussing. But we’re in a different business than you are. Our processing is CPU intensive, so we’re limited by hardware. We can and will scale out our server farm horizontally to increase our throughput, but we’re also not writing a trading application. Our messages take several seconds to several minutes to process. We just need an easy way to distribute them all across the network for parallel processing. That’s what we built.

    My very first blog article is titled “Horses for courses.” I think you and I are running different races, so our choice of tools will, naturally, be different.

    I hope one day a fully open source message system will satisfy your reliability requirements. Until then, it sounds like IBM’s got the right product for you.

  8. Adam Malter Says:

    @Mark

    The reason I follow your blog (and Terracotta, and Gridgain, and ActiveMQ, and Hadoop, and … ) is that IBM may have the right product for us, but I cannot wait for the day when we can kick them, and all other providers that want to dictate scaling strategy with byzantine licensing and muck up development and debugging with closed source drivers.

    I come here not to point out that you’re falling short of some arbitrary standard, but to push us all ahead!

    Anyway, thanks for the interesting posts as usual Mark. I follow perhaps 30 Java blogs in my reader, but always seem to be commenting here. :-)

  9. Dan (Team Mark) Says:

    @Adam

    Number 6 in your enterprise-y features list, management/statistics collection, is something we’re still tinkering with. Turns out grid management of our nodes across multiple physical machines is not a trivial problem to solve elegantly… :)

    So far we’ve tried a few different implementations: Plain Java sockets, JMX, and a Terracotta based communications mechanism. Each has an important drawback which leaves a bad taste in my mouth. We’ve so far stuck with the TC based system because it doesn’t introduce a new moving part into our implementation and is sort of “the simplest thing which could possibly work.”

    Maybe Mark or I will elaborate on them sometime. It’s an interesting and important topic. The bus system isn’t of much use unless you can easily deploy and manage nodes in the cluster… I am curious to understand how other systems tackle the problem.
