
Tech predictions for 2008

Fancying myself as a wise prognosticator is fun, so I’ll lay down a guess or two about technology in 2008.

OPEN TERRACOTTA

First, I am overwhelmed with ideas and potential uses for Open Terracotta as well as mystified and amazed by its ease of use.

I am a curmudgeon for most technologies. I’ve got a healthy skepticism for anything recommended by vendors. It almost always seems like a lot more than I need. I’m more than a little lazy. I want quick and easy, fast and simple. I’ve learned over the course of my career that simple isn’t easy. It takes good, smart work to make things simple.

For example, years ago, I avoided EJBs (and thus heavyweight app servers) just because they seemed like a really hard way of writing otherwise simple classes. I wanted to write simple classes and deploy to a simple container, so I rolled my own ORM and deployed all my apps on Tomcat. Today? Vindication! Rod Johnson and the folks who developed Spring deserve every bit of credit they get. They were equally frustrated with the vendor-driven solution and created a fantastic framework for POJO-based programming. Likewise, Hibernate — which Spring embraces beautifully — was a “roll your own” ORM that grew into one of the best pieces in Java’s open source community. Add Java 5’s Generics to Spring and Hibernate3 and you’ve got all the tools you need to create a reusable framework for writing ultra-small, easily configured POJO DAOs and transactional Facades.
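To make that concrete, here’s a minimal sketch of the kind of generic DAO this stack enables. The class and method names are my own illustration, not from any particular framework; only the Hibernate calls are real API.

```java
import java.io.Serializable;

import org.hibernate.Session;
import org.hibernate.SessionFactory;

// A generic base DAO: each concrete DAO shrinks to a few lines because
// the CRUD plumbing lives here, typed via Java 5 Generics.
public abstract class GenericDao<T, ID extends Serializable> {

    private final Class<T> type;
    private final SessionFactory sessionFactory;

    protected GenericDao(Class<T> type, SessionFactory sessionFactory) {
        this.type = type;
        this.sessionFactory = sessionFactory;
    }

    protected Session session() {
        // In a Spring app, the SessionFactory is injected and transactions
        // are demarcated declaratively on a Facade.
        return sessionFactory.getCurrentSession();
    }

    @SuppressWarnings("unchecked")
    public T findById(ID id) {
        return (T) session().get(type, id);
    }

    public void save(T entity) {
        session().saveOrUpdate(entity);
    }

    public void delete(T entity) {
        session().delete(entity);
    }
}
```

A concrete DAO then needs almost nothing (User being a hypothetical domain entity):

```java
public class UserDao extends GenericDao<User, Long> {
    public UserDao(SessionFactory sessionFactory) {
        super(User.class, sessionFactory);
    }
}
```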

Back to Terracotta… Open Terracotta is clustering software. It is one of the finest uses of AOP I’ve seen. It invisibly and magically clusters your Java classes via configuration. You write your programs in simple POJO style, then declare what Terracotta should cluster. Developers can run small, simple unit tests for their work and let Configuration Managers handle the clustering.

What really got me was how easy it was to configure. The guys at Terracotta provide several excellent examples with the Terracotta distribution. Simplistic examples only go so far, of course, so they’ve also provided detailed error messages to guide you as you’re learning what goes into that magical config file.
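For a taste of that config, here’s a rough sketch of a DSO clustering declaration. I’m writing the element names from memory of the tc-config.xml format, and the class and field names are made up, so treat this as illustrative rather than copy-paste ready:

```xml
<!-- Sketch of a tc-config.xml: share one field across the cluster. -->
<tc:tc-config xmlns:tc="http://www.terracotta.org/config">
  <application>
    <dso>
      <!-- Which classes Terracotta may instrument (AOP under the hood). -->
      <instrumented-classes>
        <include>
          <class-expression>com.example..*</class-expression>
        </include>
      </instrumented-classes>
      <!-- A "root": this field becomes one shared object graph cluster-wide. -->
      <roots>
        <root>
          <field-name>com.example.UserCache.cache</field-name>
        </root>
      </roots>
      <!-- Auto-locking: synchronized blocks become cluster-wide locks. -->
      <locks>
        <autolock>
          <method-expression>* com.example..*(..)</method-expression>
          <lock-level>write</lock-level>
        </autolock>
      </locks>
    </dso>
  </application>
</tc:tc-config>
```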


Look! Useful error messages!
[Screenshot: a Terracotta error message]

Terracotta is Free Open Source Software (FOSS), which is the best kind of software to facilitate widespread usage. It’s been open for a little over a year now. I predict good things in ‘08 for this software.

GRID COMPUTING

More cores on each chip and ever-cheaper computers mean we’ll have yet more computing power tomorrow than we did yesterday. My laptop has a dual-core 2.4GHz CPU, 4GB of RAM, and a 90GB drive. Install your favorite Linux distro and you’ve got a monster server compared to what was available 10 years ago. This type of machine is cheap, too, compared to a server from 10 years ago. That means you can buy a whole lot more of them.

What do we do with all this computer power? How do we harness it?

With enabling software like Terracotta, clustering becomes easy. You’ve still got to design your software to take advantage of parallelism, but the act of running programs in parallel is no longer difficult. This is what Fred Brooks means when he talks about essential versus accidental complexity.

Distributing code and running massively parallel programs used to be difficult. It required complex architectures and expensive application servers. This is accidental complexity. Advances in software development — like Terracotta, GridGain, Spring, and other FOSS programs — dramatically reduce, if not eliminate, the accidental complexity of distributing your programs to a cluster of machines. The essential complexity is writing your program and designing it for parallelism from the start. What your software does will always be the hard part of writing software, which is why there really isn’t a Silver Bullet. Enabling technologies like Terracotta, however, make it easier to move bits around. We’ll see more uses of “the grid” in 2008.
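As a tiny illustration of that split: in the sketch below, deciding how to partition the work is the essential part; the thread-pool plumbing is the accidental part that frameworks keep shrinking. A minimal example with plain java.util.concurrent, illustrative numbers only:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {
    public static void main(String[] args) throws Exception {
        final int[] data = new int[1000000];
        Arrays.fill(data, 1); // stand-in for real work

        // Essential decision: how to split the work.
        int chunks = Runtime.getRuntime().availableProcessors();
        int chunkSize = data.length / chunks;

        // Accidental plumbing: threads, scheduling, result collection.
        ExecutorService pool = Executors.newFixedThreadPool(chunks);
        List<Future<Long>> parts = new ArrayList<Future<Long>>();
        for (int i = 0; i < chunks; i++) {
            final int from = i * chunkSize;
            final int to = (i == chunks - 1) ? data.length : from + chunkSize;
            parts.add(pool.submit(new Callable<Long>() {
                public Long call() {
                    long sum = 0;
                    for (int j = from; j < to; j++) {
                        sum += data[j];
                    }
                    return sum;
                }
            }));
        }

        long total = 0;
        for (Future<Long> part : parts) {
            total += part.get();
        }
        pool.shutdown();
        System.out.println("total = " + total);
    }
}
```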

SUMMARY

Grids have been coming for a long time, and a lot of work has been put into them, such that it’s constantly getting easier to write for and deploy to a grid. Watch how Open Terracotta enables architects to design a grid in ways that we haven’t even imagined yet.

I think 2008 is going to be a very good year for architects and developers.

Performance != Scalability

Performance and scalability are not the same.

It is insufficient to run through a QA environment and say, “Everything seems fast enough!” You may load-test the same QA environment, conclude that performance is good, and assume you’ll be able to scale. You may be wrong! Performance and scalability are two very different things. You can often get more scalability out of applications that perform well — they aren’t mutually exclusive — but you don’t actually need one to get the other.

Performance is often about algorithms, which may be fast but not scalable. Building a select box in a server page might be very, very fast when you only have 10 items to put in it, but it’ll be a dog if you try 1,000. This happens all the time. In fact, it just happened where I work. The simplest solution for the search functionality at hand was to put all users of a specific type into a select box. It worked great for months while we didn’t have many users of that type. But the data grew… and one day it became a major performance issue for a major customer of ours. Oops! Joel Spolsky writes about Schlemiel the Painter’s Algorithm, an excellent anecdote about algorithms that seem fast but can’t scale.
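Here’s the classic Java rendition of Schlemiel in exactly this select-box setting. The naive version re-copies the whole string on every append, so it’s O(n²); it looks instant with 10 users and crawls with 10,000:

```java
public class SelectBoxBuilder {

    // O(n^2): each += copies everything accumulated so far.
    static String slowOptions(String[] users) {
        String html = "";
        for (String user : users) {
            html += "<option>" + user + "</option>";
        }
        return html;
    }

    // O(n): the builder appends in place, no re-copying.
    static String fastOptions(String[] users) {
        StringBuilder html = new StringBuilder();
        for (String user : users) {
            html.append("<option>").append(user).append("</option>");
        }
        return html.toString();
    }
}
```

And even the fast version still ships thousands of options to the browser, which is its own scaling problem. Sometimes the fix is a different UI, like an autocomplete search, not a faster loop.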

Scalability doesn’t require good performance. It helps, but it’s not a prerequisite. Consider a page or process that is slow, say 10 seconds response time. If that page consistently takes 10 seconds on every box you throw at it, then you can say it is scalable, if inefficient. You’ve achieved linear growth with a known response time for average performance. You have a known, measured metric in hand with which to estimate hardware requirements to support some future concurrent usage. If, say, one node sustains 50 concurrent users at that 10-second response time, then supporting 500 concurrent users should take roughly ten nodes.

Scaling is also a measure of how your application behaves when spread across dozens or even hundreds of nodes. Does your application use a “shared nothing” architecture? Are the nodes chatty, multicasting to the entire cluster? If each node talks to every other node in the cluster to perform some kind of update in real time, it won’t scale: you’ll eventually hit a number of nodes where you spend more time keeping the other nodes in sync than doing real work. This is the same reason most OSes don’t keep time at finer than roughly 10ms granularity: they would spend more time keeping time than doing real work.
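To put rough numbers on the chatty case: with all-to-all updates, a single change generates on the order of n(n-1) messages in an n-node cluster, which is about 90 messages per update at 10 nodes but about 9,900 at 100 nodes. Coordination cost grows quadratically while capacity grows only linearly.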

I’ll close with an excellent visual of performance vs. scalability, given by John Engates, CTO of Rackspace. He recently gave a talk at a MySQL conference about scaling web applications, and I think these two pictures are worth more than their two thousand words:

PERFORMANCE
[Photo: a Ferrari]

SCALABILITY
[Photo: a multi-lane highway]

Saving database cycles

I once read a great quote:

Oracle may scale technologically, but it doesn’t scale financially

We run Oracle in our shop, and we’ve got a big database. Hundreds of tables. Millions and millions of rows of data. Extensive and detailed audit records of every database transaction. We recently deployed Oracle RAC to help our data layer scale, and it works! We can separate web traffic from internal operations. We can target specific RAC nodes with heavy-hitting reporting queries while leaving other nodes available to crunch jobs.

There’s only one problem… new nodes cost a lot of $$$$$.

So, what do you do when you see your fancy clustered database running at 50% of capacity? Beg the CFO to budget for a few more nodes next year? What if you get hit with a sudden burst of web traffic or increased demand for reports or you have to crunch umpteen thousand jobs by next weekend?

Really, the question is, how do you scale your database simply, easily, and cheaply?

Some would say to dump Oracle for a cheaper database. Google famously runs MySQL for AdWords. If it’s good enough for a multi-billion dollar business, it’s probably good enough for my needs. While I might agree with that assessment, I can also say unequivocally that our business cannot migrate off of Oracle in the near term. That would require a rewrite of major pieces of our business.

So, the question still… what do you do when you see your fancy clustered database running at 50% of capacity?

I’ve got a theory. Oracle CPU cycles are expensive. CPU cycles on commodity x86 hardware running commodity software (Linux, HTTPD, and Tomcat) are cheap. If we can move cycles from Oracle to commodity boxen, we’ll be using our database less, the average cost per CPU cycle will drop, and we can scale further for less money.

An Oracle RAC cluster is a chatty cluster: each node is kept in sync by constant updates from the other nodes. This is in contrast to the master/slave configuration of other databases, where there is a lag between updates as a slave waits for log-file shipments.

Chatty cluster, eh? Real-time updates? Hmmm… I believe I can recreate this at the application level using commodity hardware and software. I believe we can hold lots and lots and lots of data in memory in an application cluster, thereby reducing the number of queries we run against Oracle. I believe the application cluster can request data from another node in the cluster faster than it can get it from Oracle. After all, both are a network call, but one (the db) might also require disk I/O, while the app cluster has the data in memory.
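A minimal sketch of the idea, with all names hypothetical: a read-through cache backed by a map the cluster shares. With Terracotta, a plain ConcurrentHashMap declared as a root would do, so Oracle gets hit only on a miss:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class UserCache {

    // Stand-ins so the sketch compiles; in real code these are your
    // domain class and a DAO backed by Oracle.
    public static class User { /* fields elided */ }
    public interface UserDao { User findById(Long id); }

    // Declared as a Terracotta root, this map (and its contents) would be
    // one shared object graph across every node in the app cluster.
    private final ConcurrentMap<Long, User> cache =
            new ConcurrentHashMap<Long, User>();

    private final UserDao dao;

    public UserCache(UserDao dao) {
        this.dao = dao;
    }

    public User findUser(Long id) {
        User user = cache.get(id);   // memory or a peer node: no Oracle hit
        if (user == null) {
            user = dao.findById(id); // lazy read: Oracle only on a miss
            if (user != null) {
                cache.putIfAbsent(id, user);
            }
        }
        return user;
    }
}
```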

I also believe we can test this whole theory and let the numbers guide our decision. We can run benchmarks against Oracle, scaling the number of clients connected to it. We’ll learn that X app servers bring a single RAC node to 100% capacity. If we test this with linear growth of app servers, then we can do some simple math: we need Y RAC nodes to handle X app servers.
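With illustrative numbers: if 8 app servers saturate one RAC node, then 32 app servers call for 4 RAC nodes. Y is just X divided by the per-node saturation point, rounded up.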

And then we benchmark the commodity application cluster. In theory, Oracle should now be used only for lazy reads and writes. Y should be smaller than in our previous test, and X should be able to grow to a very large number. The larger X is, the more memory you have in the cluster and the less you rely on lazy reads from Oracle, keeping Y small.

That’s the theory. OK, it’s not much of a theory. It’s also known as caching. In this case, it’s distributed caching. The concept is simple, but designing and deploying an elegant solution that scales is harder. Retrofitting a five-year-old, organically grown enterprise application with a database that’s pushing 300GB to use distributed caching while remaining 100% backwards compatible is harder still. Caching is a good idea, but good ideas aren’t worth much by themselves. The plan to implement the idea and the skills required to deliver it are worth a lot more.

I will be laying out the architecture and benchmark results in future posts.

