Archive for January, 2008

The Accidental, The Essential, and Corporate Earnings

Fred Brooks famously wrote about accidental vs. essential complexity in his seminal essay No Silver Bullet. If you’ve not read it, I think you owe it to yourself to borrow someone’s copy of The Mythical Man-Month and check it out. I believe No Silver Bullet has a corollary in business and corporate quarterly earnings.

Briefly, Brooks asserts that accidental complexity is the “how” part of programming, such as how we manage memory, parse text, or use editors to write code. Garbage collected languages help manage memory, excellent libraries have emerged to help us parse text (like RegEx), and IDEs get better every single year with support for code completion, IntelliSense, refactoring tools, and even dynamic languages!

Each of these improvements, while welcome, only addresses the construction process of building software. None of the examples mentioned helps attack the “essential” complexity of solving the problem at hand. No amount of tooling will help organizations better understand the problem, increase quality, improve poor designs, or otherwise reduce the complexity of the problem itself. Software construction (the act of actually writing software) is a small fraction of the cost and schedule, while requirements analysis, specification, design, and thorough testing for software fidelity represent the large majority of cost and time when building a product. No tools exist to slay this werewolf. Writing software is hard.

I’d like to posit that business governance is no different. Within a business, there are accidental and essential problems to solve. Both affect the bottom line, but the former will not provide an order of magnitude growth and the latter will always be a hard problem to solve. Wall Street likes to see companies constantly growing to justify any premium or P/E ratio given to the stock.

Accidental complexities in a business are the expenses incurred by normal operations. Payroll is typically an employer’s largest expense. Facilities and maintenance are another substantial portion of expenses, as is the cost of goods sold for companies that manufacture products.

Organizations can increase quarterly earnings by tackling each of these accidental complexities. A business can find the optimal headcount to keep payroll costs as low as possible. Companies can tap into new green building techniques that put gardens on rooftops, which help keep buildings cooler and thus reduce energy costs (a win/win that also helps the environment). A business can find a new paper supplier to reduce the cost of stationery or switch to new technologies to remove the paper trail entirely. Programs like Six Sigma help a business achieve higher quality on manufactured goods, reduce costs, and operate at higher levels of efficiency. Highly effective COOs are worth their weight in gold, but there is a fundamental limitation to each of these examples. All of them tackle accidental complexity, and none of them address the essential problem of growing the business itself.

No single expense reduction and no single improvement to operations will grow a business by an order of magnitude; that is what makes them accidental complexity. Quarterly earnings improvements provide a temporary boost to the company’s stock price, but Wall Street quickly forgets them as it looks to the next quarter.

Some companies achieve astronomical top line growth for periods of time. Their stocks may double, triple, or even quadruple in value in a year or two. These are exceptional growth rates for exceptional companies. They are not the norm, nor do they represent an order of magnitude in growth. Eventually, all growth companies slow down as the market changes from a growth industry to a mature one. Just take a look at Microsoft. They post excellent numbers every quarter with consistently growing earnings that are predicted, telegraphed, and boring to analysts. It’s a big, mature company in a huge, mature industry. Today, Microsoft struggles in every market they enter that’s not part of their desktop/office business.

The hard problem of growing a business remains. There is no silver bullet.

The Perils of Joel Spolsky

The Perils of Java Schools? Joel Spolsky, of Joel on Software fame, continues to ding Java whenever the opportunity arises, most recently (again) in an article on his blog about “Java schools” and undergraduate programs. I think he still holds MSFT stock in his portfolio, which may explain the constant FUD coming from his bully pulpit.

I like Joel. I enjoy his articles and insight. Normally, I agree with him or learn something, but sometimes….!

Joel’s original article bemoaned the state of the Computer Science curriculum in today’s universities, but what gets me (and he may really just be trolling to promote his products) is that he keeps attacking Java as a language and platform, even though his former employer is following the exact same roadmap to claim Java’s market share. This does not change his choice of platform, naturally. He’s MSFT all the way; his concession to *nix was writing his own language, Wasabi, to crank out PHP code from a VBScript-looking language. They must be bored over there at Fog Creek.

The Physicist, the Architect, and NASCAR

Physicists can tell you with great accuracy why things fall, why bodies of water ebb and flow, what makes up stardust, and why some building materials, in myriad configurations, can bear more weight than others. There are formulas to calculate these things to the nth degree.

Architects will say they need a floor to support X people and design their building accordingly. Structural engineers make it happen.

I’d argue that one is science while the other is applied science. One is research and learning, while the other is a pragmatic use of today’s knowledge.

Architects do not need advanced degrees in physics to design a building. They need to get the job done on time and on budget. Most businesses don’t need computer scientists who understand relational algebra at a deep level. They just need pragmatic application developers who understand that relational algebra exists and that it underlies how modern databases are built. Knowing about relational theory at a superficial level is sufficient to take advantage of it by applying the right level of normalization on a database design. Folks who build NASCAR cars don’t need to be able to design next generation engines, they just need to be able to put horsepower under the hood.

Today’s curriculum

Colleges today are changing their curriculum to match the demands of the business world. There is a schism forming between computer science and application development (for lack of a better term). We see this at a local university, through the eyes of a professor who consults for our company. One is science and research; the other is applied science to achieve business goals. The two don’t necessarily align, but we only have a single “computer science” degree that most closely matches what businesses need today.

I don’t know if Joel would support learning software engineering best practices in a computer science curriculum. His excellent “Joel Test” for software organizations may not have any place in today’s comp sci classrooms, but it’s still an excellent test. How do we teach people the merits of version control systems, build and smoke tests, and planning and scheduling? How do we teach good design with an eye towards maintainability? How do we teach young programmers that the project is not done when the coding is finished and that non-functional requirements will lengthen the schedule considerably? How do we write code that simultaneously meets often-conflicting requirements and schedule pressure?

Does pedigree matter?

My daughter and I watched Underdog this past weekend. She held a huge bowl of popcorn in her lap while enjoying a canine protagonist that looks suspiciously like our beagle. We popped the popcorn in our microwave. Percy Spencer was a self-taught engineer working for Raytheon when he discovered that microwave radiation could cook food. He didn’t get a comp sci degree from Yale. He didn’t even go to a Java school. He learned on his own. A childhood friend of mine is blowing up buildings in Connecticut as the Director of Technology for an NBC owned television station. He’s building a next generation studio and control center for NBC. Before that, he was the technical manager of the Today Show. He went to school to learn how to operate a camera. You know, a Java school for TV.

What most of us do everyday in the trenches isn’t rocket science. It isn’t even computer science. It’s application development where ingenuity, passion, and creativity drive the best of us, not pieces of paper.

Terracotta Server as a Message Bus

Terracotta is excellent software to glue messaging components together. This article is a high-level view of how we used TC to create our own messaging backbone.

Just a few weeks ago I made two predictions for 2008, both of which centered on Terracotta. Since that time, I’ve gone deeper into the server and used it to write a message bus for a non-trivial integration project.

I’m impressed.

Our first implementation used a MySQL database for a single queue. JTA transactions running “select for update” statements against InnoDB worked just fine, actually, but there were other clunky things about that implementation. All roads looked like they led to queuing and routing. In a nutshell: enterprise messaging with multiple queues, not just batch jobs on a single queue.

Our second implementation (I believe strongly in prototyping, à la Fred Brooks’s “Plan to throw one away”) used JMS. Early in our design process, we talked about implementing our own messaging system using TC. We talked ourselves out of it because a) no one else that we knew of had done it and b) ActiveMQ is open source and mature, and Camel looked very cool because it gives you a small domain-specific language for routing rules between queues. The Camel project claims to have implemented all the patterns in Enterprise Integration Patterns (EIP).

Well, we managed to deadlock ActiveMQ with multiple clients running with Spring’s JmsTemplate. Our request queue would just stop. We’d get an error saying our message couldn’t be found and the queue would simply stall. We couldn’t restart it without bouncing ActiveMQ. New clients all blocked on the queue. ActiveMQ did not survive our load test well. When we inquired, we were told about a known problem between Spring and ActiveMQ and that we should use the latest snapshot.

DISCLAIMER: I understand the preceding paragraph is entirely FUD unless I provide tangible evidence otherwise. We’ve since moved on from that implementation and removed all the JmsTemplates from our Spring apps. I won’t be providing screenshots or sample code to deadlock their server. To be fair, we did not choose to try again with another FOSS JMS queue, like JBoss. Our configuration of ActiveMQ and our Spring JmsTemplate clients may have been wrong. Feel free to take my criticism above with the proverbial grain of salt.

Happily, my team understands good design and the value of clean interfaces. All JMS-related code was hidden by handler/listener interfaces. Our consumer logic did not know where the messages (our own domain objects) came from. Implementations of the handlers and listeners were injected by Spring. As a result, it took just 90 minutes to swap in a crude but effective queueing and routing system using Terracotta. We’ve since cleaned it up, made it robust, added functionality for business visibility, and load tested the hell out of it. It all works beautifully.
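The post doesn't show its actual handler and listener interfaces, so here is a minimal sketch of the idea, assuming a plain `java.util.concurrent.BlockingQueue` as the transport. `MessageHandler` and `QueueListener` are hypothetical names: business logic implements the handler and never learns where messages come from, while the listener owns all the plumbing. Swapping transports means replacing only the listener side.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Business logic implements this; it never sees JMS, Terracotta, or a queue.
// (Hypothetical interface: the post does not show its real ones.)
interface MessageHandler<T> {
    void handle(T message);
}

// Plumbing: waits on a BlockingQueue and passes each message to the handler.
// Changing transports means replacing this class, not the handlers.
class QueueListener<T> implements Runnable {
    private final BlockingQueue<T> queue;
    private final MessageHandler<T> handler;
    private volatile boolean running = true;

    QueueListener(BlockingQueue<T> queue, MessageHandler<T> handler) {
        this.queue = queue;
        this.handler = handler;
    }

    void stop() {
        running = false;
    }

    public void run() {
        while (running) {
            try {
                // wait up to 100 ms for a message so that stop() is noticed promptly
                T message = queue.poll(100, TimeUnit.MILLISECONDS);
                if (message != null) {
                    handler.handle(message);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();   // preserve the interrupt and exit
                return;
            }
        }
    }
}
```

In a Spring application the handler implementation would be injected into the listener; the handler itself stays a POJO you can unit test by calling `handle()` directly.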

Here are the main ingredients you need to roll your own message bus with Terracotta:

  1. Knowledge of Java 5’s Concurrent API for queueing
     Java’s Concurrent API expertly handles nearly all of your threading issues. Bounded LinkedBlockingQueues (and ArrayBlockingQueues) will neatly throttle your entire system for you. Consumers live in their own threads (and, through the magic of Terracotta, they can live in their own JVMs!) and can safely remove the next item from the queue, optionally waiting for a period of time for something to become available. Producers can add messages to a BlockingQueue in a thread-safe way, also optionally waiting for space to become available.

  2. Knowledge of Java threading for consumers and producers
     You’ll need to be able to start and stop your own threads in order to create producers and consumers.

  3. Daemon Runners
     Daemon Runners (my term for them; a better one may already exist) are long-running POJO Java processes that you can cleanly shut down later. Browsing Tomcat’s source code taught me a neat trick for hooking into a running JVM. Write a main program which spawns a thread that runs your actual application. Have the main thread open a ServerSocket and await a connection. When a token such as “stop” comes through, main stops its child thread and your application can exit gracefully. Anything else over the socket can be ignored, which lets your ServerSocket go right back to listening. We implemented a “gc” command, among others, to provide simple but effective hooks into our running processes anywhere on the network. You just need the IP and port. You can optionally put IP checks into your daemon runner to validate that the IP sending the token is a trusted one. Our runners only accept tokens from 127.0.0.1. SSH lets us run scripts from across the network.

  4. Named classloaders
     Named classloaders are a TC trick needed to run multiple standalone Spring applications yet have them share the same clustered data. TC ties applications together using specific names for classloaders. Modules they’ve built already know how to cluster Tomcat Spring applications, for example, because the classloaders are the same every time. In standalone apps, you’re not guaranteed that the system classloader even has a name, let alone the same name across JVMs. See this post on TC’s forums to make a named classloader. It wasn’t hard. There may be another way to cluster standalone Spring apps, but the named classloader did the trick for us. You will need to bootstrap your application to make this work. You should probably be doing this anyway.

  5. Spooler
     A Spooler lets your messaging system accept messages long after the rest of the queues get throttled by a bounded BlockingQueue. Your Spooler is an endpoint (maybe a web service endpoint) that puts everything it receives into an unbounded queue: your spool. A spool consumer reads from the spool and forwards to the next queue. Because the next queue is bounded, you’ve achieved throttling. You may have other components in your messaging system that require spooling. For example, we’ve got a consumer that performs callbacks and posts the results of the message to the callback URL. What happens if the callback endpoint is down? We don’t want our throttled message system to stop processing messages, so we spooled messages going into the callback queue.

  6. Consumer interface
     You’ll need to create a class or two around queue consumption. Our first crude implementation simply injected the queue itself into the listening thread. The listening thread blocks/waits on the blocking queue (hence the name!) until something is available. We’ve refined that a bit so that we now have listener classes that monitor the queues and pass the messages to consumer classes. The business logic is pure POJO Java logic, which is easily unit testable. This is, in essence, an event-driven system where your POJO class accepts events (messages) but doesn’t know or care where they came from. You want to decouple the business logic from the plumbing.

  7. Terracotta Server: messaging backbone & glue
     Last but not least, you need some queues, you need multi-JVM consumers, you need persistent data (a message store) that won’t get wiped out by a catastrophic failure, you need business visibility to track the health and status of all queues and consumers, and you need to glue them all together. Terracotta Server handles these requirements very well.
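The queueing, threading, and Spooler ingredients above fit together roughly as follows. This is a sketch, assuming ordinary in-JVM `LinkedBlockingQueue`s stand in for the Terracotta-clustered ones; `Spooler` and its methods are hypothetical names. The endpoint side adds to an unbounded spool and never blocks, while a forwarding thread calls `put()` on a bounded queue, which blocks whenever that queue is full. That blocking `put()` is the throttle.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Sketch only: in-JVM queues stand in for Terracotta-clustered ones,
// and Spooler is a hypothetical name.
class Spooler<T> {
    // unbounded: the endpoint can always accept, even when downstream is throttled
    private final BlockingQueue<T> spool = new LinkedBlockingQueue<T>();
    // bounded: consumers read from here, and its capacity throttles the system
    private final BlockingQueue<T> work;

    Spooler(int workCapacity) {
        this.work = new LinkedBlockingQueue<T>(workCapacity);
    }

    // Called by the endpoint (e.g. a web service); never blocks.
    void accept(T message) {
        spool.add(message);
    }

    BlockingQueue<T> workQueue() {
        return work;
    }

    // Spool depth, handy for business-visibility dashboards.
    int spooled() {
        return spool.size();
    }

    // The spool consumer: take from the spool, put into the bounded queue.
    // put() blocks while the bounded queue is full; that is the throttle.
    Thread startForwarder() {
        Thread t = new Thread(new Runnable() {
            public void run() {
                try {
                    while (true) {
                        T next = spool.take();
                        work.put(next);
                    }
                } catch (InterruptedException e) {
                    // stop quietly; a message taken but not yet put is lost here
                }
            }
        }, "spool-forwarder");
        t.setDaemon(true);
        t.start();
        return t;
    }
}
```

One caveat: if the forwarder is interrupted while blocked in `put()`, the message it already took from the spool is dropped, so a production version would need to handle that case.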
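The Daemon Runner ingredient above can be sketched like this, assuming a single-line text protocol. The `stop` and `gc` tokens come from the description; the `DaemonRunner` class, binding to the loopback address, and stopping the child thread via `interrupt()` are assumptions, so the application thread has to respond to interruption for the exit to be graceful.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

// Hypothetical daemon runner: the application runs in a child thread while
// the calling thread listens on a socket for control tokens.
class DaemonRunner {
    private final Runnable app;
    private final ServerSocket server;

    DaemonRunner(Runnable app, int port) throws IOException {
        this.app = app;
        // bind to loopback so only local processes can connect (port 0 = any free port)
        this.server = new ServerSocket(port, 50, InetAddress.getLoopbackAddress());
    }

    int port() {
        return server.getLocalPort();
    }

    void run() throws Exception {
        Thread child = new Thread(app, "application");
        child.start();
        try {
            while (true) {
                Socket s = server.accept();
                try {
                    // only trust tokens from 127.0.0.1 (redundant with the bind, but cheap)
                    if (!s.getInetAddress().isLoopbackAddress()) continue;
                    BufferedReader in = new BufferedReader(
                            new InputStreamReader(s.getInputStream(), "UTF-8"));
                    String token = in.readLine();
                    if ("stop".equals(token)) break;
                    if ("gc".equals(token)) System.gc();
                    // anything else is ignored; go back to listening
                } finally {
                    s.close();
                }
            }
        } finally {
            server.close();
        }
        child.interrupt();   // ask the application to finish
        child.join();        // then exit gracefully
    }
}
```

For example, `new DaemonRunner(app, 8006).run()` would keep the process alive until something on the same host sends the line `stop` to port 8006 (the port number here is arbitrary).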

TC really came through for us. We were curious about some of its behavior in a clustered environment. We made some assumptions about its behavior based on what would be ideal for minimizing network chatter and limiting heap size. TC nailed every single one of our assumptions.

We made the following assumptions and were happy to find out that all held up under load testing:

  1. L1 clients that were write-only would never need the entire clustered/shared dataset faulted into their heaps. If you’re not going to read it, you don’t need it locally.
  2. Clustered ConcurrentMaps have their keys faulted to all L1 clients, but values are retrieved lazily.
  3. Reading from a BlockingQueue would fault just one object to the L1 client, instead of faulting in the entire queue, because the single object is retrieved in a TC transaction.
  4. TC and our unbounded spools wouldn’t run out of memory because TC pages object graphs to disk. Our unbounded L1 clients would work within an acceptable memory band.
  5. We can add or remove consumers at any point in our messaging system without affecting the entire system.

We’ve got our canaries in the coal mine, so we see what the entire system is doing in real time. We’re happy to see that our memory bands are predictable and that we’re entirely CPU bound. This is excellent for horizontal scalability. We can simply throw more processors at any part of our system to scale out. It doesn’t look like Terracotta server will be a bottleneck because the messages we’re processing take significantly more time to crunch than it takes to route through our queues. We have enough juice on our TC box to handle dozens more consumers across the network, which would give us significant throughput gains. We can revisit this when we have the need for hundreds of consumers. I’ll assume TC server will scale up with us, but if it can’t for any reason, it is perfectly acceptable to have more than one messaging cluster. That’s how Google scales. There are lots and lots of clusters in Google’s datacenters. Bridging between two messaging systems is a solved problem. That’s what messaging is, after all, a connection between disparate systems.

What did we gain?

Initially, we had MySQL. Then we added ActiveMQ, which is backed by MySQL. We saw how TC server would be beneficial if only to cluster POJOs that gather runtime data, so we had TC server in the mix. That’s three different servers in our system all of which needed high availability and routine backups. All were configured in Spring, making our dependency injection a maze to follow through.

When we switched to a TC message bus, we got rid of 2/3 of the infrastructure and most of the Spring configurations. We now have just one piece of infrastructure to maintain in a highly available way.

But I’m a guy that really likes simple. TC lets us make an entirely POJO system that runs beautifully in IntelliJ. A single “container” type main program can run all our components in a single JVM simply by loading all our various Spring configs. Developers can run the entire messaging system on their desktop, in their IDE, and run their code against it. They can post messages to an endpoint listening on 127.0.0.1 and debug their message code inside the messaging system itself.

We replace our container main with Terracotta in our integration and test environments. TC seamlessly and invisibly wires together all the components of the system, irrespective of where they live on the network. The POJO model goes wide with Terracotta server. It’s elegant, simple, and Just Works™.

Best. Inbox. Ever.

You can search the world over and I think you’ll be hard-pressed to find an inbox as full as this one is… with unread messages. My friend and coworker has successfully managed to ignore over half of his email traffic over the past two years. I think it’s quite impressive, though he tells me it requires very little actual effort.

Here’s a screenshot of his inbox with names and content blacked out for confidentiality:

inbox.jpg

Printable Design Patterns Quick Reference Cards

The Gang of Four design patterns have been elegantly distilled into a quick reference guide suitable for printing on 8.5 x 11 inch paper.

You can get a larger version for your office wall, too. Check out the poster size. It’s perfect for any software organization.

I’ve posted low-resolution versions of the two cards here with the author’s permission. Links to the high-res printable versions are below on the author’s website.


designpatterns2_sm.jpg designpatterns1_sm.jpg

Jason McDonald created these high-quality reference cards. Click here to view the printable high-resolution images.

Canaries in the coal mine

Application logging is only as useful as your plan to actually use the logs. Without a plan to mine the data, collect metrics, and plot graphs, your logs are useless. It’s snowcrash in a console window. It’s gigs of spam in a file.

This reminds me of the Philosophy and Zen of Unix:

Rule of Silence: When a program has nothing surprising to say, it should say nothing.

But how do you know your program is running? You’ve got several options available to you, all good, and you should probably implement them all.

Canaries in the coal mine

In the good ol’ days, miners had a crude but effective way to test a mine shaft for dangerous air: if the bird died, miners got out of the shaft.

Your program needs a canary in the coal mine. You need a way to smoke test your application when it first boots up and while it’s running. It either works or it doesn’t. The bird is dead, dying, or singing.

What kind of canary? One that tests some discrete bit of functionality of your application. You can use a simple site monitoring program with basic tests baked into a server page. You can script a load testing tool like JMeter to mimic what an end user would do, then run the test with a single user. In the messaging application I’m currently building, we send periodic test messages to the queues. The messages aren’t fancy, just tiny XML messages posted from a Python client requesting 2+2.

But 2+2 is important. It’s like your first Hello, World! program in a new language. Getting 2+2 running means you’ve successfully set up your environment and you understand the basics of compilation, packaging, deployment, and configuration management. You’ve also got your first benchmark of how quickly a simple message can pass through your system.

You want to log the data from the canaries in the coal mine. Ping your canaries every ten minutes. Keep those results and metrics. Create a plan to report on them, which puts you on the path of Statistical Process Control.
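A canary that pings on a schedule and records its results might look something like this minimal sketch. `CanaryCheck`, `CanaryResult`, and `Canary` are hypothetical names, and the check itself is left abstract: in the messaging system described above it would post a tiny "2+2" message and wait for the answer. Results are kept in memory only to keep the example short.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// One discrete end-to-end check, e.g. post "2+2" and expect "4" back.
// Hypothetical interface; a real check would talk to your running system.
interface CanaryCheck {
    boolean ping() throws Exception;
}

// One data point for later statistics: when, healthy or not, and how long it took.
class CanaryResult {
    final long timestampMillis;
    final boolean ok;
    final long latencyMillis;

    CanaryResult(long timestampMillis, boolean ok, long latencyMillis) {
        this.timestampMillis = timestampMillis;
        this.ok = ok;
        this.latencyMillis = latencyMillis;
    }
}

class Canary implements Runnable {
    private final CanaryCheck check;
    // in memory for the sketch; in practice, write each result to a metrics log
    private final List<CanaryResult> results =
            Collections.synchronizedList(new ArrayList<CanaryResult>());

    Canary(CanaryCheck check) {
        this.check = check;
    }

    public void run() {
        long start = System.nanoTime();
        boolean ok;
        try {
            ok = check.ping();
        } catch (Exception e) {
            ok = false;   // an exception is a dead bird
        }
        long latency = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        results.add(new CanaryResult(System.currentTimeMillis(), ok, latency));
    }

    List<CanaryResult> results() {
        synchronized (results) {
            return new ArrayList<CanaryResult>(results);
        }
    }

    // e.g. schedule(10, TimeUnit.MINUTES) for a ten-minute ping
    ScheduledExecutorService schedule(long period, TimeUnit unit) {
        ScheduledExecutorService exec = Executors.newSingleThreadScheduledExecutor();
        exec.scheduleAtFixedRate(this, 0, period, unit);
        return exec;
    }
}
```

Writing each `CanaryResult` (timestamp, pass/fail, latency) to a dedicated metrics log gives you the raw data that Statistical Process Control works from.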

Statistical Process Control

You can spend a decade trying to attain CMM Level 5 accreditation or gaining your black belt in Six Sigma and probably still never completely grok the enormity of Statistical Process Control. You can, however, start improving your technical operations by using meaningful statistics to smooth out your Configuration Management practices.

So what is Statistical Process Control? Quoting Wikipedia:

Statistical Process Control (SPC) is an effective method of monitoring a process through the use of control charts. Much of its power lies in the ability to monitor both process centre and its variation about that centre. By collecting data from samples at various points within the process, variations in the process that may affect the quality of the end product or service can be detected and corrected, thus reducing waste and as well as the likelihood that problems will be passed on to the customer. With its emphasis on early detection and prevention of problems, SPC has a distinct advantage over quality methods, such as inspection, that apply resources to detecting and correcting problems in the end product or service.

In addition to reducing waste, SPC can lead to a reduction in the time required to produce the product or service from end to end. This is partially due to a diminished likelihood that the final product will have to be reworked, but it may also result from using SPC data to identify bottlenecks, wait times, and other sources of delays within the process. Process cycle time reductions coupled with improvements in yield have made SPC a valuable tool from both a cost reduction and a customer satisfaction standpoint.

In layman’s terms, you aggregate the data from your canaries into a graph. You watch the graph every day to eventually find your “center”, the normal singing voice of your canary. Your data tells you he sings at X decibels when healthy, and your graphs show you when there’s not enough oxygen in the mineshaft.
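Finding that "center" and noticing when the canary goes off key can be sketched as a simple control chart: take a baseline of healthy measurements, compute the mean, and set control limits three standard deviations either side. This is a simplified Shewhart-style chart: σ here is the sample standard deviation of the baseline, whereas a textbook individuals chart usually estimates it from the average moving range. `ControlChart` is a hypothetical name. For baseline latencies of 100, 102, 98, 101, and 99 ms, the center is 100 ms, σ ≈ 1.58 ms, and the limits are roughly 95.3 to 104.7 ms, so a 110 ms sample would be flagged.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified control chart for canary measurements (e.g. latency in ms).
// Assumption: sigma is the sample standard deviation of a healthy baseline.
class ControlChart {
    final double center;
    final double upper;
    final double lower;

    ControlChart(double[] baseline) {
        if (baseline.length < 2) {
            throw new IllegalArgumentException("need at least 2 baseline samples");
        }
        double sum = 0;
        for (double x : baseline) sum += x;
        double mean = sum / baseline.length;

        double squares = 0;
        for (double x : baseline) squares += (x - mean) * (x - mean);
        double sigma = Math.sqrt(squares / (baseline.length - 1));

        center = mean;
        upper = mean + 3 * sigma;   // upper control limit
        lower = mean - 3 * sigma;   // lower control limit
    }

    boolean inControl(double sample) {
        return sample >= lower && sample <= upper;
    }

    // Indexes of samples outside the control limits: the canary has stopped singing normally.
    List<Integer> outOfControl(double[] samples) {
        List<Integer> flagged = new ArrayList<Integer>();
        for (int i = 0; i < samples.length; i++) {
            if (!inControl(samples[i])) flagged.add(i);
        }
        return flagged;
    }
}
```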

This entire process should be automated! If it’s not, you won’t do it. Your Ops center and CM folks should have at least one box set aside for automation and monitoring. Maybe it’s your build box. Put all your scripts there. Create cron jobs or Windows scheduled tasks to constantly parse your log files for data. Use Log4J’s JMS or JDBC Appender if you don’t want to parse text files. Get all your data in one place, mine it, and graph it.

Test Driven Deployment

Everyone knows about Test Driven Development, where you write your test code before you write your business logic. It forces you to actually design your code by making you interact with the class/object early in the process. Many preach TDD; some actually practice it.

I’m not personally aware of many architects or organizations that practice what I call Test Driven Deployment. This is the habit of understanding what your canaries are before you write your application. You will change how you architect, design, and deploy your software if you understand up front what data you want to capture and how you’ll access it. It forces you to design your solution before you try to implement it, just like Test Driven Development.

Pipe [debug] and [info] level logging to /dev/null

Divide your logging output into discrete files with meaningful names. Canaries and metrics can go to one set of files. Application errors and contextual information to help diagnose bugs should go to another file. Debugging output goes right to the black hole.
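One way to make that split is sketched below with `java.util.logging`, chosen only because it ships with the JDK; the same layout can be built with Log4J appenders. The logger names (`canary`, `app`) and file names are hypothetical. Canary metrics and application errors each get their own file, and the application logger's level is WARNING, so debug and info messages are discarded before they reach any file.

```java
import java.io.IOException;
import java.util.logging.FileHandler;
import java.util.logging.Level;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

// Hypothetical logging setup: one file for canaries/metrics, one for errors,
// and nothing at all for debug/info chatter from the application.
class LogSetup {
    static Logger canaryLogger(String file) throws IOException {
        Logger log = Logger.getLogger("canary");
        log.setUseParentHandlers(false);     // keep metrics out of the console
        FileHandler fh = new FileHandler(file, true);
        fh.setFormatter(new SimpleFormatter());
        log.addHandler(fh);
        log.setLevel(Level.INFO);
        return log;
    }

    static Logger appLogger(String file) throws IOException {
        Logger log = Logger.getLogger("app");
        log.setUseParentHandlers(false);
        FileHandler fh = new FileHandler(file, true);
        fh.setFormatter(new SimpleFormatter());
        log.addHandler(fh);
        log.setLevel(Level.WARNING);         // debug/info go straight to the black hole
        return log;
    }
}
```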

Make your log files useful. Practice Test Driven Deployment. Bring a canary with you down into the coal mine, and listen when he stops singing.

The End of the Mouse

Robert X. Cringely, far and away the best thing related to high technology coming out of Charleston, South Carolina, suggested in his annual tech predictions that Apple would deliver a replacement for the mouse in 2008. Here we are just two weeks later, mere days after Macworld 2008, and the world has the new MacBook Air with a multi-touch trackpad built into the laptop!

I’m not sure if it’s too early to claim victory on that prediction, but Bob must be looking closely at the new trackpad. So is Steve Ballmer. It won’t be long before Microsoft announces they are working with hardware manufacturers to create the best multi-touch UI ever, which will trump any competing product, integrate seamlessly with spellchecking in Office, and own this new hardware market segment. It’ll be delivered in 2009, baked into the next OS, and integrated with the next generation of Zune. Yawn.

HOW TO: Download & sort pictures from your camera using Python

I’ve got tens of thousands of photos. When I last checked, the size on disk was over 25 GB. And why not? Film is free!

How do I keep track of them all?

First, there’s Picasa from Google. It’s awesome. It’s the iTunes of photos.

Second, there’s a little Python script I called “DownloadPhotosFromCameraAndSort.py” (It’s .txt on the server, rename to .py if you download it).

If you couldn’t tell, I like meaningful names for my scripts. Likewise for my Java classes, components, projects, etc. Good code communicates its purpose clearly without too much human parsing.

The script does the following:

  1. Find my camera
  2. Download all pictures/movies to 1 or more destinations
  3. Confirm the download before deleting the picture from the camera
  4. Sort the pictures into a dated subdirectory based on the individual picture’s last modified time
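The script itself is Python and isn't reproduced here. As an illustration of step 4 only, here is a Java sketch that moves each file into a subdirectory named after its last-modified date. The `PhotoSorter` name and the `yyyy-MM-dd` directory format are assumptions, and it skips the camera detection, multiple destinations, and download confirmation that the real script performs.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.text.SimpleDateFormat;
import java.util.Date;

// Hypothetical sketch of "sort into dated subdirectories by last-modified time".
class PhotoSorter {
    static int sortByDate(File sourceDir, File archiveDir) throws IOException {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd");   // assumed naming scheme
        File[] files = sourceDir.listFiles();
        if (files == null) throw new IOException("not a directory: " + sourceDir);

        int moved = 0;
        for (File f : files) {
            if (!f.isFile()) continue;
            String day = fmt.format(new Date(f.lastModified()));
            File target = new File(new File(archiveDir, day), f.getName());
            Files.createDirectories(target.getParentFile().toPath());
            // no REPLACE_EXISTING: fail rather than overwrite a photo with the same name
            Files.move(f.toPath(), target.toPath());
            moved++;
        }
        return moved;
    }
}
```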

The dated directories give me a running chronology of my life and the lives of my wife and daughter. It’s not a fancy archival system, but it’s simple and easy for me. It may work for you, too.

The archive looks like this:


photosdir.jpg

HOW TO: Bootstrap Java programs in isolated classloaders

Bootstrapping is the process by which you load a very small, very simple, pure Java program with no dependencies that, in turn, loads, configures, and runs more complex programs with varying dependencies. Bootstrapping lets you run your container without polluting the system classpath. This allows you to run your deployed applications with only the JVM’s own core classes, not the launching classpath, as their parent. You’ve achieved classloader isolation.

When would you want to bootstrap? Any time you want an unpolluted system classpath, which I’m finding is often convenient.

Let’s say you want to write some kind of middleware product, a container of some sort that deploys other applications within it. You will run into classloading issues. The dependencies that your container has (say, Spring 2.0.6) may not be what your deployed application requires (maybe Spring 1.2.6). You will find that you cannot have commons-logging in both applications (container and child). There are many ways to encounter java.lang.LinkageErrors. It’s very easy to cross the streams when running in a multi-app environment.

What you want to do is load your container and deployed apps in splendid isolation from each other. How do you do that? Bootstrapping!

Here’s how you bootstrap…


import java.io.File;
import java.lang.reflect.Method;
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

public class Bootstrap {

    public static void main(String[] args) throws Exception {

        /*
            Assume your application has a "home" directory
            with /classes and /lib beneath it, where you can put
            loose files and jars.

            Thus,

            /usr/local/src/YOURAPP
            /usr/local/src/YOURAPP/classes
            /usr/local/src/YOURAPP/lib
         */

        if (args.length < 1) {
            System.err.println("usage: java Bootstrap <main-class> [args...]");
            System.exit(1);
        }

        String HOME = "/usr/local/src/YOURAPP";
        String CLASSES = HOME + "/classes";
        String LIB = HOME + "/lib";

        // add the classes dir and each jar in lib to a List of URLs.
        // (File.toURL() is deprecated; go through toURI() instead.)
        List<URL> urls = new ArrayList<URL>();
        urls.add(new File(CLASSES).toURI().toURL());
        File[] jars = new File(LIB).listFiles();
        if (jars != null) {                 // listFiles() returns null if lib is missing
            for (File f : jars) {
                if (f.getName().endsWith(".jar")) {
                    urls.add(f.toURI().toURL());
                }
            }
        }

        // feed your URLs to a URLClassLoader whose parent is the system
        // classloader's parent, so nothing on the launching classpath
        // (including Bootstrap itself) leaks into your application.
        ClassLoader classloader =
                new URLClassLoader(
                        urls.toArray(new URL[0]),
                        ClassLoader.getSystemClassLoader().getParent());

        // relative to that classloader, find the main class
        // you want to bootstrap, which is the first cmd line arg
        Class<?> mainClass = classloader.loadClass(args[0]);
        Method main = mainClass.getMethod("main", String[].class);

        // well-behaved Java packages work relative to the
        // context classloader.  Others don't (like commons-logging)
        Thread.currentThread().setContextClassLoader(classloader);

        // prune the first arg because it's your main class, and
        // pass the remaining args as the "real" args to your main
        String[] nextArgs = new String[args.length - 1];
        System.arraycopy(args, 1, nextArgs, 0, nextArgs.length);
        main.invoke(null, (Object) nextArgs);
    }

}

You can try this code out for yourself. Paste the bootstrap code above into your favorite IDE, compile it, put that single Bootstrap.class on your classpath, and run it like so:


java -cp . Bootstrap sample.HelloWorldMain Hello!

Click here to download the sample /usr/local/src/YOURAPP application.
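Tip for Windows users: put the application in c:\usr\local\src\YOURAPP and the code works unchanged when run from the C: drive, because Java resolves /usr/local/src/YOURAPP against the root of the current drive.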
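If you don't have the sample application handy, any class with an ordinary main method will do. Here is a hypothetical stand-in (not the downloadable sample; the name HelloWorldMain without a package is my own assumption) that echoes the arguments Bootstrap forwards and prints which classloaders it sees, so you can confirm the bootstrap worked. Compile it into /usr/local/src/YOURAPP/classes and run java -cp . Bootstrap HelloWorldMain Hello!

```java
// Hypothetical stand-in for the sample app's main class (not the downloadable sample).
// Compile into /usr/local/src/YOURAPP/classes, then run:
//   java -cp . Bootstrap HelloWorldMain Hello!
public class HelloWorldMain {

    // Joins the arguments Bootstrap forwarded (everything after the class name).
    static String greeting(String[] args) {
        return args.length == 0 ? "(no arguments)" : String.join(" ", args);
    }

    public static void main(String[] args) {
        System.out.println(greeting(args));
        // When launched via Bootstrap, this should be the URLClassLoader it created,
        // not the application (system) classloader.
        System.out.println("Loaded by: " + HelloWorldMain.class.getClassLoader());
        // Bootstrap also installs that loader as the thread's context classloader.
        System.out.println("Context loader: " + Thread.currentThread().getContextClassLoader());
    }
}
```

If the "Loaded by" line names the system classloader instead of a URLClassLoader, the class was picked up from the launcher's classpath rather than from YOURAPP.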


Tech predictions for 2008

Fancying myself a wise prognosticator is fun, so I’ll lay down a guess or two about technology in 2008.

OPEN TERRACOTTA

First, I am overwhelmed with ideas and potential uses for Open Terracotta as well as mystified and amazed by its ease of use.

I am a curmudgeon about most technologies. I’ve got a healthy skepticism for anything recommended by vendors; it almost always seems like a lot more than I need. I’m more than a little lazy. I want quick and easy, fast and simple. I’ve learned over the course of my career that simple isn’t easy. It takes good, smart work to make things simple.

For example, years ago, I avoided EJBs (and thus heavyweight app servers) because they seemed like a really hard way of writing otherwise simple classes. I wanted to write simple classes and deploy to a simple container, so I rolled my own ORM and deployed all my apps on Tomcat. Today? Vindication! Rod Johnson and the folks who developed Spring deserve every bit of credit they get. They were equally frustrated with the vendor-driven solution and created a fantastic framework for POJO-based programming. Likewise, Hibernate — which Spring embraces beautifully — was a “roll your own” ORM that grew into one of the best pieces of Java’s open source community. Add Java 5’s generics to Spring and Hibernate3 and you’ve got all the tools you need to create a reusable framework for writing ultra-small, easily configured POJO DAOs and transactional Facades.
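To make the generics point concrete, here is a minimal sketch of the pattern in plain Java. It deliberately leaves Hibernate and Spring out: InMemoryDao is a hypothetical stand-in for a base class that would delegate to a Hibernate session, and GenericDao, Customer, and CustomerDao are illustrative names, not part of either framework. The point is that once the CRUD logic lives in one generic base class, each concrete DAO shrinks to a single line.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

// Hypothetical generic DAO contract: one interface covers every entity type.
interface GenericDao<T, ID> {
    void save(ID id, T entity);
    Optional<T> findById(ID id);
    List<T> findAll();
    void delete(ID id);
}

// In-memory stand-in (assumption) for a base class that would use Hibernate.
class InMemoryDao<T, ID> implements GenericDao<T, ID> {
    private final Map<ID, T> store = new HashMap<>();

    public void save(ID id, T entity) { store.put(id, entity); }
    public Optional<T> findById(ID id) { return Optional.ofNullable(store.get(id)); }
    public List<T> findAll() { return new ArrayList<>(store.values()); }
    public void delete(ID id) { store.remove(id); }
}

// A plain POJO entity.
class Customer {
    private final String name;
    Customer(String name) { this.name = name; }
    String getName() { return name; }
}

// The "ultra-small" DAO: all of its behavior is inherited, fully typed.
class CustomerDao extends InMemoryDao<Customer, Long> { }
```

Usage is type-safe without casts: new CustomerDao().findById(1L) returns an Optional<Customer>, and a transactional facade would simply be handed one of these DAOs.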

Back to Terracotta… Open Terracotta is clustering software. It is one of the finest uses of AOP I’ve seen. It invisibly and magically clusters your Java classes via configuration. You write your programs in simple POJO style, then declare what Terracotta should cluster. Developers can run small, simple unit tests for their work and let Configuration Managers handle the clustering.
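Here is what “simple POJO style” means in practice: a hypothetical HitCounter with no clustering code in it at all. As I understand Terracotta’s model, you would declare a field holding a HitCounter as a shared root and mark its synchronized methods for locking in the Terracotta config file (not shown; the exact configuration is an assumption here), and Terracotta would treat those monitors as cluster-wide locks. On a single JVM it is just an ordinary thread-safe class that a developer can unit test.

```java
// Hypothetical POJO: nothing Terracotta-specific appears in the source.
// Assumption: clustering (root field, locking) would be declared in Terracotta's
// config file, not in this class.
public class HitCounter {
    private int hits;

    // synchronized guards the counter on one JVM; under Terracotta, if configured,
    // this monitor would act as a cluster-wide lock.
    public synchronized int increment() {
        return ++hits;
    }

    public synchronized int get() {
        return hits;
    }
}
```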

What really got me was how easy it was to configure. The guys at Terracotta provide several excellent examples with the Terracotta distribution. Simplistic examples only go so far, of course, so they’ve also provided detailed error messages to guide you as you’re learning what goes into that magical config file.


Look! Useful error messages! [screenshot: Terracotta error message]

Terracotta is Free Open Source Software (FOSS), which is the best kind of software to facilitate widespread usage. It’s been open for a little over a year now. I predict good things in ’08 for this software.

GRID COMPUTING

More cores on each chip and ever cheaper computers mean we’ll have more computing power tomorrow than we did yesterday. My laptop has a dual-core 2.4GHz CPU, 4GB of RAM, and a 90GB drive. Install your favorite Linux distro and you’ve got a monster server compared to what was available 10 years ago. It’s cheap, too, compared to a server from 10 years ago, which means you can buy a whole lot more of them.

What do we do with all this computer power? How do we harness it?

With enabling software like Terracotta, clustering becomes easy. You’ve still got to design your software to take advantage of parallelism, but the act of running programs in parallel is no longer difficult. This is what Fred Brooks means when he talks about essential versus accidental complexity.

Distributing code and running massively parallel programs used to be difficult. It required complex architectures and expensive application servers. This is accidental complexity. Advances in software development — like Terracotta, GridGain, Spring, and other FOSS programs — dramatically reduce, if not eliminate, the accidental complexity of distributing your programs to a cluster of machines. The essential complexity is writing your program and designing it for parallelism from the start. What your software does will always be the hard part of writing software, which is why there really isn’t a Silver Bullet. Enabling technologies like Terracotta, however, make it easier to move bits around. We’ll see more uses of “the grid” in 2008.
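To illustrate where that line falls, here is a small sketch that uses only the JDK’s ExecutorService on one machine (no Terracotta or GridGain APIs; ParallelSum and its methods are hypothetical names). Splitting the range into independent chunks that share no state is the essential design work; handing those chunks to a thread pool is the part tools can take off your hands, and a grid product would play that role across many machines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class ParallelSum {

    // Essential part: a unit of work that depends only on its own range.
    static long sumOfSquares(long from, long to) {
        long total = 0;
        for (long i = from; i < to; i++) {
            total += i * i;
        }
        return total;
    }

    // Accidental part: running the chunks. A local thread pool does it here.
    static long parallelSumOfSquares(long n, int chunks) throws Exception {
        if (chunks < 1) {
            throw new IllegalArgumentException("chunks must be >= 1");
        }
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        try {
            List<Future<Long>> parts = new ArrayList<>();
            long step = (n + chunks - 1) / chunks;  // ceiling division
            for (long start = 0; start < n; start += step) {
                final long from = start;
                final long to = Math.min(start + step, n);
                parts.add(pool.submit(() -> sumOfSquares(from, to)));
            }
            long total = 0;
            for (Future<Long> part : parts) {
                total += part.get();  // combine partial results
            }
            return total;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        long n = 1_000_000;
        System.out.println(parallelSumOfSquares(n, 8) == sumOfSquares(0, n));
    }
}
```

Note that the chunking decision, not the thread pool, is what makes this parallel; a design with shared mutable state between chunks would not scale out no matter what ran it.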

SUMMARY

Grids have been coming for a long time, and a lot of work has gone into them, so it’s constantly getting easier to write for and deploy to a grid. Watch how Open Terracotta enables architects to design grids in ways we haven’t even imagined yet.

I think 2008 is going to be a very good year for architects and developers.
