Archive for the 'Engineering' Category

17th Jun 2008

How to incur 3X costs for 1X worth of functionality

A software development lifecycle that does not include design review early in the process is doomed to poor estimates, cost overruns, and a wildly inaccurate schedule.

Why? Let me tell you what just happened to me.

I picked up a task for a project manager because I had some time free and his resources were completely booked. It was a simple feature with a two day estimate and it was already scheduled for release without having gone through design review. Since it was scheduled, it had a code cutoff date. That was last Friday.

The feature was pretty easy to implement. I needed to add a column to a database table, add support for it in our system, create some services (as in SOA) to change this field, and include the field in our web UI. That’s it. One database column with support for it across our system. Not a hard task.

I implemented the feature within the original estimate, I checked my code into our version control system, signed off on the feature, and asked our Database Engineers (DBEs) to include the new column in our test environment. As far as I know, this was the first time a DBE had a chance to review the feature. They put my change on hold while they suggested moving the field to a different table.

The DBE has a good argument for the field being on the other database table. He may be right. The original requirements may have been good but not good enough. But the problem is this review happened after the entire implementation was said and done.

Changing where the column exists represents a 3X cost of the original feature. The first 1X was the original implementation. Should we choose to move the column, I have to undo the original work and then do it all over again for a different table. Even if undoing the original work isn’t a full X of cost, it is still work I have to do that was not part of the original estimate. Redoing all the work on the new table is a full X of additional cost. We’re at least 2X above the estimate.

A 30 minute design review with the appropriate people would have kept the cost to 1X and given us the right solution the first time. Instead, we’ve got a potentially sub-optimal 1X solution or a 3X correct solution. And this was a simple feature. Larger features with more complex requirements would incur significantly higher cost overruns if not properly designed up front.

Design reviews must be an early part of the process, not an afterthought. It is the only way to avoid 3X overruns.

Posted by Posted by Mark Turansky under Filed under Business, Engineering Comments 1 Comment »

03rd Jun 2008

HOW TO: Use JDBC Batching for 7-8X throughput gains

Using the batched statement capability of your JDBC driver can give you 7-8X throughput gains. Not only is batching significantly faster, it’ll save database CPU cycles and be easier on the network, too.

The graph below shows elapsed time (in milliseconds) by batch size. For each data point, 1K rows were inserted into a simple table in MySQL. The benchmarking code I used can be found here.

jdbc_batching_gains.png

Why is batching so much faster?

First, depending on how much PreparedStatement caching your driver is doing, your database may be spending a lot of time parsing and compiling statements. After the statement is parsed and compiled, bind variables are applied. In our example, the data base will parse and compile the statement once as opposed to 1,000 times. This reduces the work your database performs and saves CPU.

Second, all bind variables are passed to the database in a single network call instead of 1,000 separate out-of-process, across-the-network calls. This helps reduce network traffic.

Third, depending on the internal architecture of your code, single statements may return the connection to a pool after every use. Multiply that by 1,000 and run a profiler and you’ll see yourself calling take/put methods a lot. Many pools also verify the connection on check-in and check-out. “select 1 from dual” is a common check for a pool to use. Your 1,000 uses of a connection may also be incurring the cost of 2,000 “select 1 from dual” statements!

When should you use batching?

Batching is particularly useful in importing scenarios where you need to get lots of data into your application quickly, but it can be used even when executing a few similar statements. Check out the example source code provided to see if batching is right for you. Fiddle with the numbers to see the gains for batching just 10 similar statements. It may not be 8X big, but trumpeting 25% gains to management is still a win for you and your team.

Use JDBC Batching!

JDBC batching can give you dramatic throughput gains while simultaneously being less abusive to your hardware. Overall, if you have the opportunity to use batch inserts and updates, you should seize that opportunity. Look at your application’s internal architecture to see if batching is right for you.

Posted by Posted by Mark Turansky under Filed under Code Hints, Engineering, HOW TO Comments 3 Comments »

06th May 2008

More proof that you can’t keep a good idea down?

In this blog article, Michael Nygard discusses a talk he attended where a technical architect discussed an SOA framework at FIDUCIA IT AG, a company in the financial services industry. Nygard describes an architecture that echoes many of the features I implicitly spoke of in my first blog article about my big integration project / message bus.

You may be asking yourself right now, why does he keep talking about this particular project? Briefly: it’s been a very fun project, it’s ongoing, it consumes most of my daily brain cycles, we’re still growing it (it’s a brand new infrastructure for us), and it encompasses a whole lot of ideas that I thought were good and that are now being validated by other projects I read about online.

So, what other unsung features did we build in that I’ll now sing about?

Asynchronous Messaging

You’ll notice the Spooler component in the original broad sketch of our architecture. The high-level description I gave the Spooler touched on callbacks. Asynchronous messaging was left unsaid, but it is implied by having a mechanism for callbacks.

The description also labeled my Spooler an endpoint, which may be a web service endpoint. We use web services only because the Enterprise Service Bus (ESB) orchestrating work on our bus is .NET-based while our project is all Java. That said, we post Plain Ol’ XML (POX) over HTTP, which is deserialized quickly to a Java POJO. Our entire messaging system works on POJOs, not XML.

The outside world may use SOAP (or XML-RPC or flat files or whatever) when communicating with my company, but internally our ESB talks POX with the bus. Mediation and transformation (from SOAP –> POX) is part of the functionality of an ESB. Consumers, internally to our bus, would directly access queues instead of using web services.

Pure POJOs, but distributed

It’s extremely productive and useful to work with a pure POJO model, and it’s even more productive and useful when the state of those POJOs is automagically kept in sync across the cluster regardless of what node is working on it. This is where Terracotta Server shines.

We pass POJOs around through all the queues. Consumers — which can exist anywhere on the network — process the Service/Job/Message (all interchangeable terms, as far as I am concerned — they are all units of work). Our messages are stateful, meaning they enter our bus empty except for parameters in instance variables, get routed around to various and sundry consumers across the network, and get posted back (the callback) full of data to the ESB.

Why do we need distributed POJOs? Well, we found it to be highly useful. For example, we offer a REST API to abort a pending message (such as http://ourendpoint/message/abort/abcdefg-the-guid-wxyz). The easiest way we found to tell the entire bus to disregard this message was to flip the bit on the message itself. The endpoint is running under Terracotta Server, all of the queues live in TC, and our consumers are likewise plugged in. If you stick all your messages in a Map (or series of maps if you’re worried about hashing, locking, and high volumes) where the GUID is the key and the value is the message, then the endpoint or any consumer can quickly obtain the reference to the message itself and alter its state. We can also write programs that hook into TC temporarily to inspect or modify the state of the system. Persistent memory is cool like that. It exists outside the runtime duration of the ephemeral program.

The endpoint likewise has REST APIs for returning the state of the bus, queues sizes, current activity, and other metrics. All of this data is collected from the POJOs themselves, because the endpoint has access to the very object instances that are running all over the network. It just so happens this architecture works wonderfully inside a single JVM, too, without TC, for easier development and debugging.

Load balancing and routers

Straight from Michael Nygard’s article:

Third, they’ve build a multi-layered middle tier. Incoming requests first hit a pair of “Central Process Servers” which inspect the request. Requests are dispatched to individual “portals” based on their customer ID.

In other words, they have endpoints behind load balancers (we use Pound) and “dispatched” is another word for “routed.” We have content based routers (a common and useful Enterprise integration Pattern for messaging systems) that route messages/services/jobs of specific types to certain queues. Our consumers are not homogenous. We’ve configured different applications (the integration aspects of our project) to listen on different queues. This saved us from having to port applications off the servers where they were previously deployed. These apps are several years old. Porting would have taken time and money. Allowing messages to flow to them where they already exist was a big win for us.

More to come

I’ve got the outline for my white paper complete, where I bulleted the features above as well as those in my previous blog article. There are other features I haven’t covered yet. Overall, I think it will be an interesting paper to read.

Still, I’m a little jealous, though, that FIDUCIA IT AG has scaled out to 1,000 nodes in their system. I can’t say how many nodes we’re up to, but I can say I’m looking forward to the massive scalability that our new architecture will give us.

Posted by Posted by Mark Turansky under Filed under Architecture, Engineering, Technology, Terracotta Comments No Comments »