I’ve been telling my friends this for years, so it’s good that I’ve finally got a credible website with evidence proving the theory. I will also try to help my friends if they ask “Have you seen my ________[fill in the blank]?” I’d ask in reply, “Where’s the last place you put it?” It’s always the last place you look!
Archive for June, 2008
A software development lifecycle that does not include design review early in the process is doomed to poor estimates, cost overruns, and a wildly inaccurate schedule.
Why? Let me tell you what just happened to me.
I picked up a task for a project manager because I had some time free and his resources were completely booked. It was a simple feature with a two day estimate and it was already scheduled for release without having gone through design review. Since it was scheduled, it had a code cutoff date. That was last Friday.
The feature was pretty easy to implement. I needed to add a column to a database table, add support for it in our system, create some services (as in SOA) to change this field, and include the field in our web UI. That’s it. One database column with support for it across our system. Not a hard task.
I implemented the feature within the original estimate, I checked my code into our version control system, signed off on the feature, and asked our Database Engineers (DBEs) to include the new column in our test environment. As far as I know, this was the first time a DBE had a chance to review the feature. They put my change on hold while they suggested moving the field to a different table.
The DBE has a good argument for the field being on the other database table. He may be right. The original requirements may have been good but not good enough. But the problem is this review happened after the entire implementation was said and done.
Changing where the column exists represents a 3X cost of the original feature. The first 1X was the original implementation. Should we choose to move the column, I have to undo the original work and then do it all over again for a different table. Even if undoing the original work isn’t a full X of cost, it is still work I have to do that was not part of the original estimate. Redoing all the work on the new table is a full X of additional cost. We’re at least 2X above the estimate.
A 30 minute design review with the appropriate people would have kept the cost to 1X and given us the right solution the first time. Instead, we’ve got a potentially sub-optimal 1X solution or a 3X correct solution. And this was a simple feature. Larger features with more complex requirements would incur significantly higher cost overruns if not properly designed up front.
Design reviews must be an early part of the process, not an afterthought. It is the only way to avoid 3X overruns.
Using the batched statement capability of your JDBC driver can give you 7-8X throughput gains. Not only is batching significantly faster, it’ll save database CPU cycles and be easier on the network, too.
The graph below shows elapsed time (in milliseconds) by batch size. For each data point, 1K rows were inserted into a simple table in MySQL. The benchmarking code I used can be found here.
Why is batching so much faster?
First, depending on how much PreparedStatement caching your driver is doing, your database may be spending a lot of time parsing and compiling statements. After the statement is parsed and compiled, bind variables are applied. In our example, the data base will parse and compile the statement once as opposed to 1,000 times. This reduces the work your database performs and saves CPU.
Second, all bind variables are passed to the database in a single network call instead of 1,000 separate out-of-process, across-the-network calls. This helps reduce network traffic.
Third, depending on the internal architecture of your code, single statements may return the connection to a pool after every use. Multiply that by 1,000 and run a profiler and you’ll see yourself calling take/put methods a lot. Many pools also verify the connection on check-in and check-out. “select 1 from dual” is a common check for a pool to use. Your 1,000 uses of a connection may also be incurring the cost of 2,000 “select 1 from dual” statements!
When should you use batching?
Batching is particularly useful in importing scenarios where you need to get lots of data into your application quickly, but it can be used even when executing a few similar statements. Check out the example source code provided to see if batching is right for you. Fiddle with the numbers to see the gains for batching just 10 similar statements. It may not be 8X big, but trumpeting 25% gains to management is still a win for you and your team.
Use JDBC Batching!
JDBC batching can give you dramatic throughput gains while simultaneously being less abusive to your hardware. Overall, if you have the opportunity to use batch inserts and updates, you should seize that opportunity. Look at your application’s internal architecture to see if batching is right for you.