Code generation done right can be a very effective and highly useful tool in your toolbox.  Done wrong it could be a maintenance nightmare.  This article reflects on different types of code generation, when to use each of them, and explains some pitfalls to avoid.

WHAT CODE GENERATION ISN’T:  A SILVER BULLET

Before we explore what code generation is and how to use it effectively, we must first understand what it isn’t:  A silver bullet.

No amount of code generation will save a doomed project.  If you’ve got inadequate staff , bad requirements (or no requirements), poor project sponsorship, or any number of the classic mistakes, code generation will not help you.  You’ve got bigger problems.

Moreover, you shouldn’t expect miracle productivity gains by using a code generator.  Fred Brooks and Steve McConnell (in The Mythical Man Month and Rapid Development, respectively) argue persuasively that actual coding and construction of software is or should be a minority part of the schedule.  Even if coding accounts for 50% of the schedule (which is doesn’t) and you can effectively generate half of the project’s code (which you can’t), the best you can hope to achieve is a 25% reduction in effort.

In reality, boilerplate code (the kind that is best generated) has been on a long, gradual decline thanks to advances in technology and better abstractions.  We’re left more and more to focus on the differences in our software (the essence) and less with the mundane minutiae of simple coding tasks (the accidental).

This is what Fred Brooks argues in No Silver Bullet.  There is no single tool that can produce an order of magnitude gain in productivity or quality because the accidental complexity of software (the act of constructing software itself) gets continuously easier, leaving you to focus on the truly hard problem (the essence):  What does your software do, how can it do it, and how do we test it sufficiently to know that it does it?

No silver bullet, indeed.

WHAT CODE GENERATION IS

A code generator is a tool that takes metadata as its input, merges the metadata with a template engine, and produces a series of source code files for its output.  The tool can be simple or elaborate, and you can generate any kind of code that you want.  You simply need to write the control program and templates for whatever you want to generate.

Code generation done well can save you some time in the long run (you have to invest effort in creating your generator) and increase quality because you know all generated code will be identical.  Any bugs you find in the code will be corrected once in the template.

One argument against code generation is that a data-driven subroutine can produce the same result as code generation.  I agree with this argument because the generator is a data-driven program.  Runtime reflection and good abstractions can produce the same results as code generation. I would argue, though, that this code is more complicated than the code created by the generator.  The generator might be as complex as the data-driven subroutine, but the code that is produced by the generator should be simple by design.  It would be trivially easy to attach a debugger and step over the generated code to find a bug.  I like debuggability.

Active vs. Passive

Generators come in two flavors:  Active and Passive.  Both are useful, but you must plan and design your project accordingly.

An active code generator maintains the code for the life of the project. Many active generators are invoked during the build process.  XDoclet is a good example of an active code generator.  I’ve used XDoclet to generate my webapp’s struts-config.xml file, and the generator was invoked by Ant during the build.  Another popular use of XDoclet is generating the boilerplate code and configurations for Enterprise Java Beans (EJBs).

Code generated by an active generator may or may not be checked into source control.  When invoked during a build and as part of the final artifact, generated code probably would not be in source control.  On the other hand, the output from an active code generator can be checked into source control and you could remove that step from the build process.  This isn’t to say the code is then maintained by hand!  On the contrary, the generator can be invoked frequently during a project.  The purpose of the active generator is to maintain the generated code.

A passive code generator creates code that you expect to maintain by hand afterwards.  Consider a wizard that asks you some questions before creating your basic class for you.  Likewise, many IDEs have useful generation snippet such as generating all your getters/setters from your class’ instance variables.  Both of these examples are simple yet extremely useful.  I would be continually frustrated if I had to write all my getters/setters by hand.

Passive code generators needn’t stop at simple IDE-level functionality.  Maven archetypes, for example, can create an entire project setup for you.  They create all your directories and starting pom.xml.  Depending on the archetype, this could be quite complex.

Similarly, you can create entire skeletal projects with functionality from a passive code generator.  One good example would be AppFuse, which creates your project structure, layout, build scripts, and can optionally create some basic functionality like user authentication.

IT’S JUST A TOOL

Always remember that code generation is a tool in your toolbox, nothing more.  More accurately, it’s a tool and die.

Every manufacturer has highly skilled workers creating dies, molds, and machine tools to create they parts they need.  Expert furniture makers don’t hand carve each and every table leg they require.  They make a jig and create exact copies of the table leg.  Each leg may be lovingly hand-checked for quality and assembled in the final table, but each leg certainly isn’t carved individually.

In the software world, there will be times when you need expert programmers writing templates and fewer junior engineers cranking out grunt code.  The experts make the tools and dies of our software world.

YOUR RESPONSIBILITY

If code generation is just a tool, then responsibility falls to the developer to understand when and how to use it.  It becomes the developer’s responsibility to create a design that does not require hand modification of any actively generated code. The design should be robust enough with plenty of hooks to allow for modification when needed.

One possible solution is to use active generation for base classes while using subclasses throughout the code.  The subclass could contain all the application-specific code needed, override base functionality as required, and leave the developer with a domain that could be easily regenerated while preserving all hand-written code.  Another design consideration is to model your application into a framework somewhat like Spring. Spring makes extensive use of the Template Method pattern and provides plenty of documented hooks for you to override when needed.

CONCLUSION

Code generation done well can increase quality and decrease costs in a project.  Time savings are compounded, too, when you find yourself implementing similar code across projects.  Each successive new project can benefit from the templates made in the last project.

Consistency across all generated code yields an easier learning curve because developers learn one standard way for basic functionality, leaving them to focus on the custom pieces of an application. Put another way, place as much functionality into the “accidental” realm as you can so that your developers can focus on the “essence.”  Generated code is easily understood and allows for better debuggability than runtime abstractions that produce the same effect.

There are very specific design considerations to be mindful of, particularly the need for a design to be robust enough to ensure hand-modification of actively generated code is not required.

Combine good active code generation with a library of common components and you will find yourself covering a large percentage of an application’s accidental complexity, leaving you more time to focus on the essence.

Code generation is a good tool for your toolbox.  An expert developer will understand when and how to use it effectively.