Archive for the 'Architecture' Category
07th Mar 2008
Scalability & High Availability with Terracotta Server
Our message bus will be deployed to production this month. We’re currently sailing through QA. Whatever bugs we’ve found have been in the business logic of the messages themselves (and assorted processing classes). Our infrastructure — the message bus backed by Terracotta — is strong.
SCALABILITY
People are asking questions about scalability. Quite frankly, I’m not worried about it.
Scalability is a function of architecture. If you get it right, you can scale easily with new hardware. We got it right. I can say that with confidence because we’ve load tested the hell out of it. We put 1.3 million real world messages through our bus in a weekend. That may or may not be high throughput for you and your business, but I guarantee you it is for our’s.
The messages we put through our bus take a fair amount of processing power. That means they take more time to produce their result than they do to route through our bus. How does that affect our server load? Terracotta sat idle most of the time. The box hosting TC is the beefiest one in our cluster. Two dual-core hyperthreaded procs, which look like 8 CPUs in htop. We figured we would need the most powerful server to host the brains of the bus. Turns out we were wrong, so we put some message consumers on the TC box, widening our cluster for greater throughput. Now the box is hard at work, but only because we put four message consumers on it.
When we slam our bus with simple messages (e.g, messages that add 1+1), we see TC hard at work. The CPUs light up and the bus is running as fast as it can. 1+1 doesn’t carry much overhead. It’s the perfect test to stress the interlocking components of our bus. You can’t get any faster than 1+1 messages. But when we switched to real world messages, our consumers took all the time, their CPUs hit the ceiling, and our bus was largely idle. The whole bus, not just TC. We’ve got consumers that perform logging and callbacks and other sundry functions. All of these are mostly idle when our message consumers process real world workloads.
We’ve got our test farm on 4 physical nodes, each running between 4 and 8 Java processes (our various consumers) for a total of 24 separate JVMs. All of these JVMs are consumers of queues, half of them are consumers of our main request queue that performs all the real work. The other half are web service endpoints, batch processors, loggers, callback consumers, etc. and each are redundant on different phsyical nodes. Because our message processing carries greater overhead than bussing, I know we can add dozens more consumers for greater throughput without unduly taxing Terracotta. If we hit a ceiling, we can very easily create another cluster and load balance between them. That’s how Google scales. They’ve got thousands of clusters in a data center. This is perfectly acceptable for our requirements. It may or may not be suitable for your’s.
You might be thinking that dozens of nodes isn’t a massive cluster, but our database would beg to differ. Once we launch our messaging system and start processing with it, we’ll begin to adversely impact our database. Scaling out that tier (more cheaply than buying new RAC nodes) is coming next. I hope we can scale our database as cheaply and easily as our message bus. That’ll enable us to grow our bus to hundreds of processors.
Like I said, I’m not worried about scaling our bus.
HIGH AVAILABILITY
I might not be worried about scalability, but I am worried about high availability. My company is currently migrating to two new data centers. One will be used for our production servers while the other is slated for User Acceptance Test and Disaster Recovery. That’s right, an entire data center for failover. High availability is very important for our business and any business bound by Service Level Agreements.
Terracotta Server has an Active-Passive over Network solution for high availability. There is also a shared disk solution, but the network option fits our needs well. Our two data centers are connected by a big fat pipe, and Terracotta Server can have N number of passive servers. That means we can have a redundant server in our production data center and another one across the wire in our DR data center. We’ve also got a SAN that replicates disks between data centers. We might go with the shared disk solution if we find it performs better.
Overall, though, it is more important for our business to get back online quickly than it is to perform at the nth degree of efficiency. Messaging, after all, doesn’t guarantee when your stuff gets run, just that it eventually runs. And if everything is asynchronous, then performance, too, is a secondary consideration to high availability.
CONCLUSION
If there’s one lesson to be learned through this blog article, it’s that one size does not fit all. Not all requirements are created equal. Our message bus is the right solution for our needs. Your mileage may vary. Some factors may outweigh others. For example, having a tight and tiny message bus that any developer can run in their IDE without a server (even without TC) is a great feature. No APIs lets us do that with Terracotta. You might have very different requirements than we do and find yourself with a very different solution.
Our message bus will be deployed to production this month. We’re currently sailing through QA. Whatever bugs we’ve found have been in the business logic of the messages themselves (and assorted processing classes). Our infrastructure — the message bus backed by Terracotta — is strong.
SCALABILITY
People are asking questions about scalability. Quite frankly, I’m not worried about it.
Scalability is a function of architecture. If you get it right, you can scale easily with new hardware. We got it right. I can say that with confidence because we’ve load tested the hell out of it. We put 1.3 million real world messages through our bus in a weekend. That may or may not be high throughput for you and your business, but I guarantee you it is for our’s.
The messages we put through our bus take a fair amount of processing power. That means they take more time to produce their result than they do to route through our bus. How does that affect our server load? Terracotta sat idle most of the time. The box hosting TC is the beefiest one in our cluster. Two dual-core hyperthreaded procs, which look like 8 CPUs in htop. We figured we would need the most powerful server to host the brains of the bus. Turns out we were wrong, so we put some message consumers on the TC box, widening our cluster for greater throughput. Now the box is hard at work, but only because we put four message consumers on it.
When we slam our bus with simple messages (e.g, messages that add 1+1), we see TC hard at work. The CPUs light up and the bus is running as fast as it can. 1+1 doesn’t carry much overhead. It’s the perfect test to stress the interlocking components of our bus. You can’t get any faster than 1+1 messages. But when we switched to real world messages, our consumers took all the time, their CPUs hit the ceiling, and our bus was largely idle. The whole bus, not just TC. We’ve got consumers that perform logging and callbacks and other sundry functions. All of these are mostly idle when our message consumers process real world workloads.
We’ve got our test farm on 4 physical nodes, each running between 4 and 8 Java processes (our various consumers) for a total of 24 separate JVMs. All of these JVMs are consumers of queues, half of them are consumers of our main request queue that performs all the real work. The other half are web service endpoints, batch processors, loggers, callback consumers, etc. and each are redundant on different phsyical nodes. Because our message processing carries greater overhead than bussing, I know we can add dozens more consumers for greater throughput without unduly taxing Terracotta. If we hit a ceiling, we can very easily create another cluster and load balance between them. That’s how Google scales. They’ve got thousands of clusters in a data center. This is perfectly acceptable for our requirements. It may or may not be suitable for your’s.
You might be thinking that dozens of nodes isn’t a massive cluster, but our database would beg to differ. Once we launch our messaging system and start processing with it, we’ll begin to adversely impact our database. Scaling out that tier (more cheaply than buying new RAC nodes) is coming next. I hope we can scale our database as cheaply and easily as our message bus. That’ll enable us to grow our bus to hundreds of processors.
Like I said, I’m not worried about scaling our bus.
HIGH AVAILABILITY
I might not be worried about scalability, but I am worried about high availability. My company is currently migrating to two new data centers. One will be used for our production servers while the other is slated for User Acceptance Test and Disaster Recovery. That’s right, an entire data center for failover. High availability is very important for our business and any business bound by Service Level Agreements.
Terracotta Server has an Active-Passive over Network solution for high availability. There is also a shared disk solution, but the network option fits our needs well. Our two data centers are connected by a big fat pipe, and Terracotta Server can have N number of passive servers. That means we can have a redundant server in our production data center and another one across the wire in our DR data center. We’ve also got a SAN that replicates disks between data centers. We might go with the shared disk solution if we find it performs better.
Overall, though, it is more important for our business to get back online quickly than it is to perform at the nth degree of efficiency. Messaging, after all, doesn’t guarantee when your stuff gets run, just that it eventually runs. And if everything is asynchronous, then performance, too, is a secondary consideration to high availability.
CONCLUSION
If there’s one lesson to be learned through this blog article, it’s that one size does not fit all. Not all requirements are created equal. Our message bus is the right solution for our needs. Your mileage may vary. Some factors may outweigh others. For example, having a tight and tiny message bus that any developer can run in their IDE without a server (even without TC) is a great feature. No APIs lets us do that with Terracotta. You might have very different requirements than we do and find yourself with a very different solution.
Posted by Mark Turansky under
Architecture, Engineering, Technology, Terracotta
3 Comments »
06th Mar 2008
InfoQ writes about my use of Terracotta Server as a message bus
Check out this article on InfoQ about using Terracotta Server as a message bus!
Check out this article on InfoQ about using Terracotta Server as a message bus!
Posted by Mark Turansky under
Architecture, Terracotta
No Comments »
21st Feb 2008
Some wheels need reinventing
Reinventing a square wheel is a common anti-pattern. The idea is a) we don’t need to reinvent the wheel because b) we’re likely to recreate it poorly compared to what is already available. But if we never reinvent any wheels, then we never progress beyond what we have. The real question, then, is when does it make sense to recreate a wheel? Some wheels need to be recreated.
I recently reinvented a wheel. A big one. The wheel is “Enterprise Messaging,” which much be complex because it has “enterprise” right in the name! I’d be a fool to reinvent that wheel, right? Maybe. Maybe not. We fit our “enterprise messaging system” into 92kb:

Some won’t consider 92kb to be “enterprisey” enough, but that’s ok with me. I know we were able to put 1.3 million real-world messages through our bus over a weekend. That’s enterprisey.
Jonas Bonér wrote an article about building a POJO datagrid using Terracotta Server, and I replied on his blog saying we did something similar by using Terracotta Server as a message bus. Another reader asked why I did this instead of using JMS.
I think there are several benefits to this reinvented wheel:
TINY!
92kb contains the entire server framework. We have another jar containing common interfaces we share with client applications that weighs in at 18kb.
It works!
A single “consumer” in our framework is bootstrapped into an isolated classloader, which enables our framework to load applications (the various apps we need to integrate) into their own isolated classloaders. One consumer can process a message for any integrated application.
This is utility computing without expensive VMWare license fees.
We’re consolidating servers instead of giving each application dedicated hardware. The servers were mostly idle, anyway, which is why enterprises are looking to utility computing and virtualization to make more efficient use those spare CPU cycles. In our framework, hardware becomes generic processing power without the need for virtualizing servers. Scaling out the server farm benefits all applications equally, whereas the prior deployments required separate capital expenditures for each new server.
Pure POJO
Our framework runs inside an IDE without any server infrastructure at all. No ActiveMQ, no MySQL, and no Terracotta Server. Developers can stand up their own message bus in their IDE, post messages to it, and debug their message code right in the framework itself.
We introduce Terracotta Server as a configuration detail in a test environment. Configuration Management handles this part, leaving developers to focus on the business logic.
So, I might not be writing my own servlet container anytime soon (not when Tomcat and Jetty are open source and high quality), but I think it made a lot of sense to reinvent the “enterprise messaging” wheel. Terracotta Server allows me, in effect, to treat my network as one big JVM. My simple POJO model went wide as a configuration detail. That makes my bus (and TC) remarkably transparent.
Reinventing a square wheel is a common anti-pattern. The idea is a) we don’t need to reinvent the wheel because b) we’re likely to recreate it poorly compared to what is already available. But if we never reinvent any wheels, then we never progress beyond what we have. The real question, then, is when does it make sense to recreate a wheel? Some wheels need to be recreated.
I recently reinvented a wheel. A big one. The wheel is “Enterprise Messaging,” which much be complex because it has “enterprise” right in the name! I’d be a fool to reinvent that wheel, right? Maybe. Maybe not. We fit our “enterprise messaging system” into 92kb:

Some won’t consider 92kb to be “enterprisey” enough, but that’s ok with me. I know we were able to put 1.3 million real-world messages through our bus over a weekend. That’s enterprisey.
Jonas Bonér wrote an article about building a POJO datagrid using Terracotta Server, and I replied on his blog saying we did something similar by using Terracotta Server as a message bus. Another reader asked why I did this instead of using JMS.
I think there are several benefits to this reinvented wheel:
TINY!
92kb contains the entire server framework. We have another jar containing common interfaces we share with client applications that weighs in at 18kb.
It works!
A single “consumer” in our framework is bootstrapped into an isolated classloader, which enables our framework to load applications (the various apps we need to integrate) into their own isolated classloaders. One consumer can process a message for any integrated application.
This is utility computing without expensive VMWare license fees.
We’re consolidating servers instead of giving each application dedicated hardware. The servers were mostly idle, anyway, which is why enterprises are looking to utility computing and virtualization to make more efficient use those spare CPU cycles. In our framework, hardware becomes generic processing power without the need for virtualizing servers. Scaling out the server farm benefits all applications equally, whereas the prior deployments required separate capital expenditures for each new server.
Pure POJO
Our framework runs inside an IDE without any server infrastructure at all. No ActiveMQ, no MySQL, and no Terracotta Server. Developers can stand up their own message bus in their IDE, post messages to it, and debug their message code right in the framework itself.
We introduce Terracotta Server as a configuration detail in a test environment. Configuration Management handles this part, leaving developers to focus on the business logic.
So, I might not be writing my own servlet container anytime soon (not when Tomcat and Jetty are open source and high quality), but I think it made a lot of sense to reinvent the “enterprise messaging” wheel. Terracotta Server allows me, in effect, to treat my network as one big JVM. My simple POJO model went wide as a configuration detail. That makes my bus (and TC) remarkably transparent.
Posted by Mark Turansky under
Architecture, Engineering, Technology, Terracotta
9 Comments »


