Archive for category Engineering

Braintree’s customer support is 1st class

Braintree — an online payments processor — aims to have a developer-friendly API for integration and touts excellent customer service as one of their strengths.

I have been on the phone with these guys several times this week as I integrate our subscriptions at GreenWizard more tightly with Braintree for better reporting. Every interaction with the support staff at Braintree has been quick, pleasant and helpful, and each episode left me happy to have chosen Braintree as our payment gateway.

BT guys, if you’re reading this, feel free to use this as a happy customer testimonial. Keep up the good work. Your sandbox is also a pleasure to work with (unlike a major gateway whose name rhymes with paypal).

“I just want to get started”

“I just want to get started.”   How commonly uttered is this phrase in the life of a new project?

My wife is tiling our bathroom and she’s carefully laying out tiles, measuring, preparing to make cuts that accommodate the cabinets, and various other tasks that are required before anyone can lay a single tile on grout and have it stay there forever. She lamented to me, “I just want to get started.”

“You’ve started!” I replied. “You’re doing this project. This is what it takes.”

“Yeah, I know, but I just want to get to it already.”

It’s the same thing with any software project. Coders are always saying they just want to get started. Writing code is fun. Figuring out what code to write or how to write it … not so much fun.

Visual progress on a bathroom with newly laid tiles is fun. Planning, measuring, cutting, and laying it out before grouting … not so much fun.

It’s all part of the project. In fact, it’s the essential part. Fred Brooks, Steve McConnell, and other software luminaries have already demonstrated that actual construction (coders writing code) represents a minority of the time on a well-run software project. Requirements, design, and testing the fidelity of the implementation make up the bulk of software development costs.

But we all just want to get started.

Declassified imagery graphically shows global warming in action

President Barack Obama declassified satellite imagery that graphically shows the effect of global warming. The imagery was previously kept classified by the Bush administration.

Read more:

http://www.guardian.co.uk/environment/2009/jul/26/climate-change-obama-administration

View the images:

http://gfl.usgs.gov/Publications.shtml

The Truth About Code Generation

Code generation done right can be a very effective and highly useful tool in your toolbox.  Done wrong, it can be a maintenance nightmare.  This article looks at the different types of code generation, when to use each of them, and some pitfalls to avoid.

WHAT CODE GENERATION ISN’T:  A SILVER BULLET

Before we explore what code generation is and how to use it effectively, we must first understand what it isn’t:  A silver bullet.

No amount of code generation will save a doomed project.  If you’ve got inadequate staff, bad requirements (or no requirements), poor project sponsorship, or any number of the classic mistakes, code generation will not help you.  You’ve got bigger problems.

Moreover, you shouldn’t expect miracle productivity gains by using a code generator.  Fred Brooks and Steve McConnell (in The Mythical Man Month and Rapid Development, respectively) argue persuasively that actual coding and construction of software is, or should be, a minority part of the schedule.  Even if coding accounts for 50% of the schedule (which it doesn’t) and you can effectively generate half of the project’s code (which you can’t), the best you can hope to achieve is a 25% reduction in effort.

In reality, boilerplate code (the kind that is best generated) has been on a long, gradual decline thanks to advances in technology and better abstractions.  We’re left more and more to focus on the differences in our software (the essence) and less on the mundane minutiae of simple coding tasks (the accidental).

This is what Fred Brooks argues in No Silver Bullet.  There is no single tool that can produce an order-of-magnitude gain in productivity or quality, because the accidental complexity of software (the act of constructing software itself) gets continuously easier, leaving you to focus on the truly hard problem (the essence):  What should your software do, how should it do it, and how do you test it sufficiently to know that it does?

No silver bullet, indeed.

WHAT CODE GENERATION IS

A code generator is a tool that takes metadata as its input, merges that metadata with templates via a template engine, and produces a series of source code files as its output.  The tool can be simple or elaborate, and you can generate any kind of code that you want.  You simply need to write the control program and templates for whatever you want to generate.
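
To make that definition concrete, here is a minimal sketch of a generator.  The metadata and the entity names (Customer and its fields) are hard-coded and made up purely for illustration; a real generator would read its metadata from XML, annotations, or a database schema, and would likely use a proper template engine such as Velocity or FreeMarker instead of string concatenation.

package mgt.codegen;

import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.LinkedHashMap;
import java.util.Map;

/**
 * Minimal sketch of a code generator: metadata in, Java source files out.
 */
public class SimpleGenerator {

    public static void main(String[] args) throws IOException {
        // Metadata: entity name -> (field name -> field type)
        Map<String, String> customerFields = new LinkedHashMap<String, String>();
        customerFields.put("id", "long");
        customerFields.put("name", "String");

        Map<String, Map<String, String>> metadata = new LinkedHashMap<String, Map<String, String>>();
        metadata.put("Customer", customerFields);

        // One generated source file per entity in the metadata
        for (Map.Entry<String, Map<String, String>> entity : metadata.entrySet()) {
            PrintWriter out = new PrintWriter(new FileWriter(entity.getKey() + ".java"));
            try {
                writeClass(out, entity.getKey(), entity.getValue());
            } finally {
                out.close();
            }
        }
    }

    /** The "template": fix a bug here once and every generated class picks up the fix. */
    private static void writeClass(PrintWriter out, String className, Map<String, String> fields) {
        out.println("public class " + className + " {");
        for (Map.Entry<String, String> field : fields.entrySet()) {
            out.println("    private " + field.getValue() + " " + field.getKey() + ";");
        }
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String property = Character.toUpperCase(field.getKey().charAt(0)) + field.getKey().substring(1);
            out.println("    public " + field.getValue() + " get" + property + "() { return " + field.getKey() + "; }");
            out.println("    public void set" + property + "(" + field.getValue() + " value) { this." + field.getKey() + " = value; }");
        }
        out.println("}");
    }
}

Running it writes a Customer.java to the working directory; every class it emits looks identical, and any fix made to the writeClass() template shows up in all of them on the next run.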

Code generation done well can save you some time in the long run (you have to invest effort in creating your generator) and increase quality because you know all generated code will be identical.  Any bugs you find in the code will be corrected once in the template.

One argument against code generation is that a data-driven subroutine can produce the same result: runtime reflection and good abstractions can accomplish everything a generator can.  That is true as far as it goes, and the generator is itself a data-driven program.  I would argue, though, that the data-driven code is more complicated than the code created by the generator.  The generator might be as complex as the data-driven subroutine, but the code it produces should be simple by design.  It is trivially easy to attach a debugger and step through the generated code to find a bug.  I like debuggability.

Active vs. Passive

Generators come in two flavors:  Active and Passive.  Both are useful, but you must plan and design your project accordingly.

An active code generator maintains the code for the life of the project. Many active generators are invoked during the build process.  XDoclet is a good example of an active code generator.  I’ve used XDoclet to generate my webapp’s struts-config.xml file, and the generator was invoked by Ant during the build.  Another popular use of XDoclet is generating the boilerplate code and configurations for Enterprise Java Beans (EJBs).

Code generated by an active generator may or may not be checked into source control.  When the code is generated during the build and packaged into the final artifact, it probably would not be checked in.  On the other hand, the output from an active code generator can be checked into source control, and you could remove that step from the build process.  This isn’t to say the code is then maintained by hand!  On the contrary, the generator can be invoked frequently during a project.  The purpose of the active generator is to maintain the generated code.

A passive code generator creates code that you expect to maintain by hand afterwards.  Consider a wizard that asks you some questions before creating your basic class for you.  Likewise, many IDEs have useful generation snippets, such as generating all your getters/setters from your class’s instance variables.  Both of these examples are simple yet extremely useful.  I would be continually frustrated if I had to write all my getters/setters by hand.

Passive code generators needn’t stop at simple IDE-level functionality.  Maven archetypes, for example, can create an entire project setup for you.  They create all your directories and a starting pom.xml.  Depending on the archetype, this could be quite complex.

Similarly, you can create entire skeletal projects with functionality from a passive code generator.  One good example would be AppFuse, which creates your project structure, layout, build scripts, and can optionally create some basic functionality like user authentication.

IT’S JUST A TOOL

Always remember that code generation is a tool in your toolbox, nothing more.  More accurately, it’s a tool and die.

Every manufacturer has highly skilled workers creating dies, molds, and machine tools to create the parts they need.  Expert furniture makers don’t hand-carve each and every table leg they require.  They make a jig and create exact copies of the table leg.  Each leg may be lovingly hand-checked for quality and assembled into the final table, but each leg certainly isn’t carved individually.

In the software world, there will be times when you need expert programmers writing templates so that fewer junior engineers have to crank out grunt code.  The experts make the tools and dies of our software world.

YOUR RESPONSIBILITY

If code generation is just a tool, then responsibility falls to the developer to understand when and how to use it.  It becomes the developer’s responsibility to create a design that does not require hand modification of any actively generated code. The design should be robust enough with plenty of hooks to allow for modification when needed.

One possible solution is to use active generation for base classes while using subclasses throughout the code.  The subclass can contain all the application-specific code, override base functionality as required, and leave the developer with a domain that can easily be regenerated while preserving all hand-written code.  Another design consideration is to model your application as a framework, somewhat like Spring.  Spring makes extensive use of the Template Method pattern and provides plenty of documented hooks for you to override when needed.
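
As a rough sketch of the base-class approach (the class names here are hypothetical, not from any particular tool), the generator owns the abstract base class and can regenerate it at will, while the hand-written subclass carries the application-specific logic and is never touched by the generator:

// PersonBase.java -- emitted by the domain generator on every run. DO NOT EDIT.
public abstract class PersonBase {

    private Long id;
    private String name;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }

    /** Default validation generated from the metadata; subclasses may override. */
    public boolean isValid() {
        return name != null && name.length() > 0;
    }
}

// Person.java -- written and maintained by hand; the generator never touches it,
// so it survives every regeneration of PersonBase.
public class Person extends PersonBase {

    @Override
    public boolean isValid() {
        // Application-specific rule layered on top of the generated default.
        return super.isValid() && getName().trim().length() >= 2;
    }
}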

CONCLUSION

Code generation done well can increase quality and decrease costs in a project.  Time savings are compounded, too, when you find yourself implementing similar code across projects.  Each successive new project can benefit from the templates made in the last project.

Consistency across all generated code yields an easier learning curve because developers learn one standard way for basic functionality, leaving them to focus on the custom pieces of an application. Put another way, place as much functionality into the “accidental” realm as you can so that your developers can focus on the “essence.”  Generated code is easily understood and allows for better debuggability than runtime abstractions that produce the same effect.

There are very specific design considerations to be mindful of, particularly the need for a design to be robust enough to ensure hand-modification of actively generated code is not required.

Combine good active code generation with a library of common components and you will find yourself covering a large percentage of an application’s accidental complexity, leaving you more time to focus on the essence.

Code generation is a good tool for your toolbox.  An expert developer will understand when and how to use it effectively.

Be mindful of Collection.contains(obj)

Summary

Not all Collection.contains(obj) methods are the same!

This article is a real world case study of the Big O differences between various implementations of Java’s Collection interface.   I found and fixed a grievous O(n^2) algorithm by using the right data structure.

Background

I was asked to investigate why some pages in our web application would save session data very quickly while one problem page would take literally tens of minutes. The application had at its core a Stateful Session Bean that held dirty objects, which would be persisted to the database in a single transaction. Sure, the easy pages didn’t contain very much data to persist, and we knew the problem page contained many times more data, but certainly not enough to justify 20-minute request times!

After I implemented the fix, the page elapsed time dropped from 20+ minutes to ~10 seconds. What did I do? I used the right data structure.

Data Structures and the Big O

The application used a Vector to store dirty objects. A Vector was used for two reasons: 1) the original engineers thought synchronization was important, and 2) order was important for referential integrity. A Vector’s internal synchronization was unneeded because only a single user’s request thread ever accessed the application. The ordering, however, was extremely important because you couldn’t add a person’s data without first adding the person!

The problem page in the web app had to add thousands of rows of data to the database, hence there were thousands of dirty objects waiting in the cache for persistence. As the application created or dirtied objects, it checked its cache (the Vector) before adding each one. You wouldn’t want the data to be persisted twice.

How did the app check its cache? vector.contains(obj);

The problem with vector.contains(obj) and list.contains(obj) is that they are O(n), which means they scale linearly: they get slower the more items you put into them. The page that created thousands of objects to persist got progressively slower with each object it created.

The solution was to switch to a LinkedHashSet, which preserves insertion order for referential integrity while providing O(1) performance for set.contains(obj), because all the objects are hashed.

The real problem was even worse, of course, because the app checked the cache each time before it added a new object: an O(n) lookup performed once per object added.  That is a good ol’ fashioned O(n^2) algorithm.
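
A stripped-down sketch of the pattern (class and method names are mine, not the application’s): with a Vector or a List, the contains() check below is a linear scan per added object, so caching n objects costs O(n^2) comparisons overall; a LinkedHashSet keeps the insertion order and makes each check O(1), assuming the cached objects have consistent equals()/hashCode() implementations.

import java.util.Collection;
import java.util.LinkedHashSet;

/** Sketch of the dirty-object cache; names are illustrative, not from the real app. */
public class DirtyObjectCache {

    // Was effectively: new Vector() -- its contains() is a linear scan, so the
    // check in markDirty() made caching n objects an O(n^2) operation overall.
    // LinkedHashSet keeps insertion order (needed for referential integrity)
    // and hashes its elements, so contains() is O(1).
    private final Collection<Object> dirtyObjects = new LinkedHashSet<Object>();

    /** Called whenever an object is created or modified during the request. */
    public void markDirty(Object entity) {
        if (!dirtyObjects.contains(entity)) {   // O(1) with a hashed set, O(n) with a Vector/List
            dirtyObjects.add(entity);           // never persist the same object twice
        }
    }
}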

To be fair to the original developers, they wrote the application in Java 1.3 and LinkedHashSet was implemented in 1.4. Also, I don’t think they anticipated having a single page in the application generate thousands of objects.

Sample Code

Below is a simple program to highlight the performance differences between various Collection.contains(obj) implementations.

Elapsed times (in ms):

Vector: 3663
List: 3690
Set: 15
LinkedSet: 12

package mgt.perf;

import java.util.ArrayList;
import java.util.Collection;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Random;
import java.util.Vector;

/**
 * Compares the cost of Collection.contains() across implementations.
 * Vector and ArrayList scan linearly (O(n) per lookup); HashSet and
 * LinkedHashSet hash their elements (O(1) per lookup).
 */
public class ContainsExample {

    private final int collectionCount = 10000;  // elements in each collection
    private final int testCount = 50000;        // contains() calls per collection

    public static void main(String[] args) {
        new ContainsExample().start();
    }

    private void start() {

        Collection<Integer> vector = new Vector<Integer>();
        Collection<Integer> list = new ArrayList<Integer>();
        Collection<Integer> set = new HashSet<Integer>();
        Collection<Integer> linkedSet = new LinkedHashSet<Integer>();

        populate(vector);
        populate(list);
        populate(set);
        populate(linkedSet);

        System.out.println("Elapsed times (ms)\n");
        System.out.println("    Vector:" + test(vector));
        System.out.println("      List:" + test(list));
        System.out.println("       Set:" + test(set));
        System.out.println(" LinkedSet:" + test(linkedSet));
    }

    /** Fills the collection with the integers 0 .. collectionCount-1. */
    private void populate(Collection<Integer> collection) {
        for (int i = 0; i < collectionCount; i++) {
            collection.add(i);
        }
    }

    /** Times testCount random contains() lookups against the collection. */
    private long test(Collection<Integer> collection) {
        Random rnd = new Random(System.currentTimeMillis());
        long started = System.currentTimeMillis();
        for (int i = 0; i < testCount; i++) {
            int lookFor = rnd.nextInt(collectionCount);
            if (!collection.contains(lookFor)) {
                throw new IllegalStateException(lookFor + " really should be in the collection");
            }
        }
        return System.currentTimeMillis() - started;
    }

}

Frequently Forgotten Fundamental Facts about Software Engineering

I ran across this interesting article today:  Frequently Forgotten Fundamental Facts about Software Engineering.

I particularly like Requirements & Design bullet 2 (RD2), because we tend to gloss over “non-functional requirements” (e.g., performance, creating frameworks, etc.):

RD2. When a project moves from requirements to design, the solution process’s complexity causes an explosion of “derived requirements.” The list of requirements for the design phase is often 50 times longer than the list of original requirements.

Absent from the list is the Fred Brooks axiom: Adding people to a late project only makes it later.

Augmenting the Frequently Forgotten Fundamental Facts are Steve McConnell’s Classic Mistakes that prevent efficient software engineering. There is some overlap between the two.

I agree that many of these facts are frequently forgotten and that most organizations constantly make the classic mistakes.  How do I know?  A company I know, tongue firmly in cheek, named one of their conference rooms “Schedule Compression.”

As The Bard wrote, “Never a truer word than said in jest.”

Failure rates are cumulative

There used to be only two guarantees in life:  death and taxes.  In today’s complex IT environment, we can add a third:  Your production software systems will fail.  This is an absolute guarantee because the math is stacked against you.

Why?  Because failure rates are cumulative.  Simply put, five integrated systems, each with 99.9% uptime, yield an overall uptime of roughly 99.5% (0.999^5 ≈ 0.995), unless you somehow figure out how to get them all to fail at the same time.  Since you can’t do that, you will see about 5 failures for every 1,000 runs of your program/service/job/etc.
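
A quick sanity check of that arithmetic, assuming the five systems fail independently of one another:

public class UptimeMath {

    public static void main(String[] args) {
        double perSystem = 0.999;   // 99.9% uptime for each system
        int systems = 5;

        // With independent failures, overall availability is the product of the
        // individual availabilities: 0.999^5 ~= 0.995, i.e. roughly 99.5%.
        double overall = Math.pow(perSystem, systems);

        System.out.printf("Overall uptime: %.2f%%, ~%.1f failures per 1,000 runs%n",
                overall * 100, (1 - overall) * 1000);
    }
}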

That’s the math.  Better plan for failure.

The Zombie Horde vs. A Posse of Cowboys

A recent blog entry attempts to paint Big M Methodology as a zombie-creating process and quotes Peopleware as the sole evidence for its argument.  You, the poor developer, are turned into a mindless zombie by having a defined process to follow.  You are given no license for creativity, no room for error, and you are discouraged from making mistakes.

This apparently makes you a zombie that must be told what to do and how to do it.  Or, to put it another way, this makes you a grown-up software developer that can write code for the space shuttle.

Fast Company has a fascinating article called “They Write the Right Stuff” that looks into the methodology that produces bug-free software.  The software powering the space shuttle has to be bug-free or people die.  Quality matters.  It was originally written in the internet stone age (1996), but it is just as relevant today as it was a decade ago.

[The shuttle group] is aggressively intolerant of ego-driven hotshots. In the shuttle group’s culture, there are no superstar programmers. The whole approach to developing software is intentionally designed not to rely on any particular person.

Mindless zombies cannot be superstars! Joel said that only superstars can hit the high notes!  How can bug-free software be written by zombies?!  Don’t CMM Level 5 certified organizations (of which there are only a handful in the world) know they need superstars to send spaceships into the wild blue yonder?

The blog entry claims that not allowing developers to make mistakes is teamicide.  The author further claims that by stifling creativity, management and Big M Methodology show distrust of their developers, which dooms a project in the long run.

Again, someone forgot to tell the guys writing bug-free code:

And the culture is equally intolerant of creativity, the individual coding flourishes and styles that are the signature of the all-night software world. “People ask, doesn’t this process stifle creativity? You have to do exactly what the manual says, and you’ve got someone looking over your shoulder,” says Keller. “The answer is, yes, the process does stifle creativity.”

And that is precisely the point — you can’t have people freelancing their way through software code that flies a spaceship, and then, with people’s lives depending on it, try to patch it once it’s in orbit. “Houston, we have a problem,” may make for a good movie; it’s no way to write software. “People have to channel their creativity into changing the process,” says Keller, “not changing the software.”

An interesting idea arises from Big M: You don’t fix bugs, you fix the process that allowed the bug in the first place.  The shuttle group “avoids blaming people for errors. The process assumes blame – and it’s the process that is analyzed to discover why and how an error got through.”

Capability and Maturity Model

CMM certification is an interesting thing, and I find the wording particularly enlightening:  “Maturity model.”  A CMM certified process is for grown-ups, not start ups.  It’s mature and rational, not for the cowboy coders who stay up all night slinging code from the hip in a heroic effort to ship version 1.0.

Tracking bugs, prioritizing issues, performing QA, and having basic version control and configuration management are the nuts and bolts of Level 2.  Many organizations have these basic project management processes in place and would qualify for Level 2 certification.  Level 3, though, is Big M Methodology and Process.  Without a defined process (Level 3) that emits metrics (Level 4), how can an organization possibly attempt to improve development, increase quality, and reduce costs via process improvement (Level 5)?

When a process improvement demonstrably reduces the defect rate, the end user benefits with higher-quality software at a reduced price.  This is absolutely required for the space shuttle, but isn’t it desirable in everything else, from our operating system (no blue screens of death!) to our applications?  I don’t like kernel panics or having my computer crash from a bad driver.  I don’t like losing all my data because a bug shut down my program.  A posse of cowboys can hack out a bad version 1 of their product, but it’s the Big M zombies, led by mature management, who engineer the quality software I want to buy and the software that manages our nuclear reactors.

It’s Just a Software Problem

The B-2 bomber wouldn’t fly on its maiden flight — but it was just a software problem. The new Denver airport was months late opening and millions of dollars over budget because its baggage handling system didn’t work right — but it was just a software problem. This spring, the European Space Agency’s new Ariane 5 rocket blew up on its maiden launch because of a little software problem. The federal government’s major agencies – from the IRS to the National Weather Service — are beset with projects that are years late and hundreds of millions of dollars over budget, often because of simple software problems.

[Image: blue-screen-of-death-airport.jpg]

Talent does vary by developer — after all, we’re not resources and interchangeable cogs — but we need better processes for developing software.  We need process improvement to increase quality, which leaves more time for more features because we’re not consumed by rework issues.  We need to reduce the cost of software development, which reduces the price and increases demand.

We need developers to stop thinking all their creativity goes into the code because their creativity should be put into improving how we write code in the first place.

Best technical definition ever

“ORA-12505: TNS: listener does not currently know of SID given in connect descriptor”

What does that error mean?  The site below defines the error eloquently:

[Image: ora-12505.png]

Money as Debt

Want to know why the credit crunch is going on? Watch this highly enlightening video (broken down into 5 individual movies to fit YouTube limits). The video clearly explains why a “run on the bank” is deadly to an individual bank, and why there would be a “credit crunch” when there’s a run on all the banks.

And that’s what’s going on today. We’ve got a run on all the banks. Money created as debt is only good when the debts are repaid (which isn’t happening with rising foreclosure rates), so the entire highly leveraged system is crashing down. And that’s why we need a $1 trillion bailout of the financial system.

The fractional reserve system (as practiced by all banks in the world today) is awesome, scary, brilliant, and fragile. Money created as debt is fascinating and frightening. No debt == no money. That’s what happened during the Depression: banks stopped lending, or couldn’t lend.

“That is what our money system is. If there were no debts in our money system, there wouldn’t be any money.” ~ Marriner Eccles, Chairman and Governor of the Federal Reserve Board

Enjoy or shudder!
