Use a classpath resource or kill your application’s portability

Here is the secret way to kill your application’s portability (and by portable, I mean across different computers, let alone operating systems): hardcode all your paths.

That’s it.  That very quickly kills portability.  It’s easy to accomplish, too.  Simply refer to all your configuration files, for example, by fully qualified pathname, like this:

System.setProperty("com.yourcorp.refdata.config.filename",
    "C:\\Documents and Settings\\FOO\\Perforce_FOO\\PATHS_CHANGED_FOR_ANONYMITY\\RefDataConfig.xml");

I’m battling with the above snippet while trying to get unit tests working in my project. Naturally, it doesn’t work for me: “FOO” isn’t my username, so my Perforce sandbox isn’t “Perforce_FOO” either.

This unit test won’t work across machines using the same OS, and our brethren using Macs or Linux boxes are completely hosed.

Don’t hardcode any paths in your application!

In Java, use a classpath resource.   This gives you portability.  It also gives a Configuration Management team the ability to package all required resources into a single artifact for better version control.

The safest way to get a classpath resource is to use the thread's context classloader to find it.

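// Well-behaved Java programs set the thread's context classloader when running in a
// multi-classloader environment.  You see this when you write containers of any type.
// Note: ClassLoader.getResource() expects a name relative to the classpath root, with NO leading slash.
URL config = Thread.currentThread().getContextClassLoader().getResource("some/path/RefDataConfig.xml");

// or another way... sufficient for most cases
URL sameConfig = this.getClass().getClassLoader().getResource("some/path/RefDataConfig.xml");

// Class.getResource() is the variant that accepts a leading slash, meaning "from the classpath root"
URL alsoSameConfig = this.getClass().getResource("/some/path/RefDataConfig.xml");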
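Here is a slightly fuller sketch of the same idea. The class and method names (`ClasspathResources`, `find`, `loadProperties`) are hypothetical. Reading configuration as a `.properties` file is also an assumption made for illustration. The point is that the lookup goes through a classloader, tolerates a stray leading slash, and reports a missing resource. None of it depends on a path under somebody's `C:\Documents and Settings` folder.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;
import java.util.Optional;
import java.util.Properties;

/** Hypothetical helper: look up configuration on the classpath instead of a hardcoded path. */
final class ClasspathResources {
    private ClasspathResources() {}

    /** Prefer the thread's context classloader (set by containers), else this class's own loader. */
    static ClassLoader loader() {
        ClassLoader ctx = Thread.currentThread().getContextClassLoader();
        return (ctx != null) ? ctx : ClasspathResources.class.getClassLoader();
    }

    /** ClassLoader resource names are relative to the classpath root, so drop any leading slash. */
    static String normalize(String name) {
        return name.startsWith("/") ? name.substring(1) : name;
    }

    /** Returns the resource's URL, or empty if it is not on the classpath. */
    static Optional<URL> find(String name) {
        return Optional.ofNullable(loader().getResource(normalize(name)));
    }

    /** Loads a .properties resource, or returns empty if it is not on the classpath. */
    static Optional<Properties> loadProperties(String name) throws IOException {
        // try-with-resources skips close() when the stream is null (resource missing)
        try (InputStream in = loader().getResourceAsStream(normalize(name))) {
            if (in == null) {
                return Optional.empty();
            }
            Properties props = new Properties();
            props.load(in);
            return Optional.of(props);
        }
    }
}
```

In a real project, `RefDataConfig.xml` would live under something like `src/main/resources/some/path/` or `src/test/resources/some/path/` (assumed Maven-style layout), so it ends up on the classpath of every developer's machine.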

Drivin’ like Gandhi

Did you know Americans drive 3 trillion miles annually?  We drove 250 billion miles just in April ’09.  Pretty amazing.

The article Drive Like Gandhi shows how we could save nearly 700 million barrels of oil and $34 billion by applying a few simple, conservative, and thrifty tips.

5 ways to help your new hire

It’s said that some of the most stressful things you can do in your life are move, have a kid, get married, and start a new job.  It’s all true, too, but this essay focuses on starting a new job because I’ve just started one.

All new employees are vulnerable, regardless of rank or position.  The newbie doesn’t know anyone, doesn’t know the culture or the business, and doesn’t yet know how to do the job they were hired for.  Yes, they have the skills and experience to do the job, but they lack the institutional knowledge required to start doing it on the first day.  It’s a tough position to be in, especially considering the new hire is probably excited and enthusiastic, but rendered utterly impotent by lack of knowledge.

The best way to keep the enthusiasm alive and make that new hire productive is to get them integrated as quickly as possible.  Here are 5 simple things that will reduce downtime, reduce stress, and increase morale for the newbie.  This list is geared towards developers and techies, but some items apply generally.

1.  Make yourself available!

Nothing is worse than being shown your desk or office and then having your guide disappear, leaving you all alone.  Plan on spending time with your new hire, or arrange their first few days so they learn from the right people.  Yes, it takes time and everyone is busy with the current release, but abandoning your newbie increases their stress and lengthens the learning curve.

2.  Make sure their PC is ready to go

Twiddling thumbs is bad enough, but not having a PC online with email ready is even worse.  Make sure the new hire can connect to whatever resources they need to do their job.  Many companies achieve most of this by having ghost images of machines with most software pre-installed, but there are necessary network tasks as well.  Email setup?  Is the new hire in the right distribution groups?  All shared drives and other resources given the right permissions?

Make a checklist of all the tasks required to get the new hire into the network and domain.

3.  Hello, World!

The canonical “Hello, World” program proves a lot of things for such a simple application.  It proves that your environment is set up correctly and that you can check out, build, deploy, and run your code.  It provides a working foundation to build upon and learn within.

What is the “Hello, world” equivalent for real world projects?  A working build from a clean checkout where all unit tests can run, preferably within the IDE, with minimal setup and configuration.

Your new developer needs a checklist of software to install and a simple guide to building and running the project’s unit tests.  I think a checklist is better than a preconfigured environment (from, say, an OS image with everything preinstalled) because it gives the developer a thorough grounding in the technologies used for the project.  Let them install the build tools themselves and set the appropriate environment variables.  Let them install the source control software and check out the project.  I believe this gives the new developer a sense of ownership over their PC and deeper project knowledge by knowing how to get it running from the ground up.

It’s true that the new developer will not be truly productive until they gain more intimate knowledge of the code and project, but by having the project running quickly on their local PC, the amount of downtime is lessened and the new developer feels less stress.

4.  Define your SDLC

How does your new developer get new issues to work and resolve?  What is the process for testing and check-in?  Who are the people responsible for helping the developer get code through the process?

This is basic Software Development Life Cycle stuff and the foundation of the Capability Maturity Model (CMM).  It also helps the new developer feel a whole lot less lost when entering a new environment.

5.  Pair ’em up!

There’s strength in numbers and comfort in a crowd.  The new hire doesn’t know anyone, so pairing him up with another new hire encourages bonding and forges immediate workplace friendships.  It also helps them both learn more quickly because they are both asking questions and going through it together.  They’ll remember different tidbits when overloaded with too much information in the first couple of days.

If there is only one new hire, have a more tenured employee work with them the first several days.  It’ll slow down the developer who’s been there a while, but it will speed up the new guy.

CONCLUSION

You know your new guy is stressed out and generally uncomfortable.  Making the assimilation process quick and easy is the humane thing to do, but it also makes a lot of business sense.  You are paying that new developer a lot of money.  You should want them to be productive as quickly as possible, as opposed to soaking up company resources.  Make them feel at ease and decrease the learning curve by getting them immersed quickly into the new environment.  It only requires a little bit of planning to keep them busy for the first several days and some basic documentation to get them up and running with a working project.

The above list is certainly not complete; it consists of the first things I thought would make my own transition easier. I’m sure a lot of new developers feel as I do when starting a new gig.  Please feel free to leave other helpful tips in the comments.

5 reasons we should all grow Victory Gardens

Victory Gardens were a popular and patriotic way to aid the war effort during World War II.  Millions of families across the country planted gardens to alleviate pressure on the domestic food supply during the war.  Victory Gardens also boosted public morale because everyone felt civic pride through their contributions.  In today’s difficult times, planting a garden makes more sense than ever.  In the spirit of public service, we should consider them Victory Gardens, just like the ones our grandfathers and grandmothers had.

Here are five good reasons we should all grow Victory Gardens.

[Photo: Mark Turansky's row garden]

1.  SAVE MONEY!

Fresh vegetables from the grocery store can be expensive.  Growing your own vegetables is inexpensive!  Seeds are cheap.  Water is cheap.  Time and sunshine are free.

Enjoy a continuous harvest by staggering plantings of various veggies with different maturation rates.  That way, something should be ready to eat nearly every day of the growing season.

2.  100% ORGANIC

Your home-grown, fresh vegetables can be completely chemical-free.  Do you really want your children consuming pesticides and poisons designed to kill organisms?  Skip the synthetic pesticides and fertilizers, and your home-grown vegetables are 100% organic.

There is a trend afoot for organic farms and gardening that’s bigger than your backyard.  Organic farms are being built into developments and subdivisions as an amenity, giving the local community access to fresh, healthy, and chemical-free produce.

3.  REDUCE WASTE

According to the EPA, 24% of our landfill waste is made up of lawn clippings, leaves, and organic scraps from the kitchen.  In other words, perfect compost materials account for a quarter of our garbage!  This is a waste of our taxpayer money.  Fiscal conservatives and environmentalists alike can agree to save money, space, and resources by composting.

Making compost is easy and it’s great for your soil.  It makes your garden vibrant and healthy, and the legacy you leave long after you move from that house is revitalized and regenerated soil.  This is a Very Good Thing for our communities.

[Photo: Compost]

4.  GOOD FOR THE ENVIRONMENT

Your fresh vegetables have a small or even negative carbon footprint. No truck carries your produce across the country, so there is no shipping pollution from your veggies.  And considering that green plants convert carbon dioxide into oxygen, your garden is cleaning the air.

5.  IT’S FUN AND REWARDING

My daughter loves picking snow peas with me when I get home from work.  She loves playing in the dirt and planting seeds.  It’s a great way to bond, but it’s also a valuable learning experience.  She is seeing the results of her work while learning when and how to plant various crops.  I know she’ll have great memories of working the garden with Daddy.

[Photo: Sophie picking peas]

Beyond bonding with my girls (the baby just likes playing in the dirt, but she’ll learn), gardening is also rewarding for me.  I enjoy watching it come alive and grow.  It’s a great reason to spend time outside enjoying the sunshine.  It’s fun to get dirty while getting some exercise. I also know I’m doing a good thing for my family, my community, and our environment.

I encourage everyone to grow a Victory Garden during this recession.  Let’s show future generations that we’ve learned something from The Greatest Generation.  We’ll all be better off, and so will our communities and environment.

Safety 1st made the worst plug protector ever

To any parents using this brand of outlet cover:  They suck.  Safety 1st apparently does not test their products with real babies, because my nine-month-old bested their plug protector with ease.  Turns out, she’s not gifted or special: there is another video on YouTube of a baby crawling across the floor and easily pulling this plug from the outlet.

Do not buy plug protectors or outlet covers from Safety 1st.  Needless to say, we are replacing ours. Here is the entertaining video:

And here is the other baby I mentioned: http://www.youtube.com/watch?v=1OXe-uQZA-E&NR=1

Early Childhood Education

Sesame Street is 40 years old and struggling (ratings-wise) against Dora the Explorer and SpongeBob SquarePants.  Whatever little bit of Spanish Dora teaches my daughter doesn’t compare with the impact Sesame Street has had on the world.

Newsweek has a new retrospective article about the importance of Sesame Street.  I can corroborate the facts stated in the article:

“Before Sesame Street, kindergartens taught very little,” says [Joan Ganz Cooney, Sesame Street co-founder and TV producer], “and suddenly masses of children were coming in knowing letters and numbers.” Independent research found that children who regularly watch Sesame Street gained more than nonviewers on tests of letter and number recognition, vocabulary and early math skills.

My daughter isn’t 4 yet, but she’s reading her bedtime books to us now. She turns 4 next month, and for this past month she’s taken over all nighttime reading. I simply help with the hard words and encourage her to sound out the rest.

We’re doing math now, too.  We incorporate fun little games into daily activities that demonstrate addition and subtraction.  For example, we’ll ask her how many strawberries she’ll have left in her bowl if she eats 3 of them.  She gets it.  She understands addition and subtraction.  It’s time to start with multiplication and division.  Maybe I’ll show her how to separate her blocks into groups of 3 and ask her how many groups she has.  It doesn’t matter how I introduce the concepts, so long as it’s fun.

Maria Montessori was right in her approach to learning and her new pedagogical style, but researchers today find there is almost no minimum age for early education.  Montessori originally developed her curriculum for children aged 3 to 6, but there are now programs for younger children, too.

My daughter learned sign language as a baby.  The benefits are amazing.  Toddlers can communicate with us long before they can speak.  Knowing their needs are being heard gives them confidence and makes for an easier child.  My daughter once signed “cold” to me in a gas station parking lot during a road trip.  She was only old enough to say a few words (“dada”, “mama”, “dog”, and “duck” come to mind), but she knew dozens of signs and this was the first time she used “cold” on her own.  I was stoked! She very clearly communicated her need to me. She wanted to be back in the car!

I read that 18-month-old toddlers can speak only 8-10 words but can know up to 75 signs.  We counted my daughter’s vocabulary and the math was spot on.  She knew 8 words and 65 signs, many of which were genuinely useful (others were just fun):  up/down, hot/cold, hungry, sleepy, more, milk, apple, diaper, dog, cat, and many more.

Kids are natural sponges.  They want to learn.  They just need the right environment and encouragement.

How to grow old and happy

I just read a very interesting article in The Atlantic about a seven-decade study that followed 268 Harvard undergrads throughout their lives with a single question: “What makes us happy?”  (The official study is called the “Harvard Study of Adult Development.”)

You can read the full article here: http://www.theatlantic.com/doc/200906/happiness/1

The study found 7 major criteria for a happy life:

  • Employing mature adaptations *
  • Education
  • Stable marriage
  • Not smoking
  • Not abusing alcohol
  • Moderate exercise
  • Healthy weight

*Psychoanalytic metaphor of “adaptations,” or unconscious responses to pain, conflict, or uncertainty

I found this passage notable:

Of the 106 Harvard men who had five or six of these factors in their favor at age 50, half ended up at 80 as what [the author] called “happy-well” and only 7.5 percent as “sad-sick.” Meanwhile, of the men who had three or fewer of the health factors at age 50, none ended up “happy-well” at 80. Even if they had been in adequate physical shape at 50, the men who had three or fewer protective factors were three times as likely to be dead at 80 as those with four or more factors.

The purpose of the study was to determine who ages well and is happy and well adjusted.  Being unhappy may lead to drinking or drugs.  Drinking may cause a spouse to leave.  Depression can lead to more unhealthy living or unfulfilled aspirations.  On the other hand, having a good education may offer more opportunities in life to perform good works or be actively engaged.  Maintaining a healthy family life may boost self-esteem and cause people to stay healthy or productive.

It is easy to weave these factors together and understand how they interact and compound each other.

So says the author of the study after decades of research: “That the only thing that really matters in life are your relationships to other people.”

The Truth About Code Generation

Code generation done right can be a very effective and highly useful tool in your toolbox.  Done wrong, it can be a maintenance nightmare.  This article reflects on different types of code generation, when to use each of them, and explains some pitfalls to avoid.

WHAT CODE GENERATION ISN’T:  A SILVER BULLET

Before we explore what code generation is and how to use it effectively, we must first understand what it isn’t:  A silver bullet.

No amount of code generation will save a doomed project.  If you’ve got inadequate staff, bad requirements (or no requirements), poor project sponsorship, or any number of the classic mistakes, code generation will not help you.  You’ve got bigger problems.

Moreover, you shouldn’t expect miracle productivity gains by using a code generator.  Fred Brooks and Steve McConnell (in The Mythical Man-Month and Rapid Development, respectively) argue persuasively that actual coding and construction of software is or should be a minority part of the schedule.  Even if coding accounts for 50% of the schedule (which it doesn’t) and you can effectively generate half of the project’s code (which you can’t), the best you can hope to achieve is a 25% reduction in effort.
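The arithmetic behind that bound is a simple product. Let c be the fraction of effort spent coding and g the fraction of code you can generate. These symbols are introduced here for illustration. The calculation also optimistically assumes that generated code costs nothing to produce:

```latex
\text{best-case savings} = c \cdot g = 0.5 \times 0.5 = 0.25
```

Both fractions are below one, so their product is smaller than either of them. That is why the overall gain shrinks so quickly.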

In reality, boilerplate code (the kind that is best generated) has been on a long, gradual decline thanks to advances in technology and better abstractions.  We’re left more and more to focus on the differences in our software (the essence) and less with the mundane minutiae of simple coding tasks (the accidental).

This is what Fred Brooks argues in No Silver Bullet.  There is no single tool that can produce an order of magnitude gain in productivity or quality because the accidental complexity of software (the act of constructing software itself) gets continuously easier, leaving you to focus on the truly hard problem (the essence):  What does your software do, how can it do it, and how do we test it sufficiently to know that it does it?

No silver bullet, indeed.

WHAT CODE GENERATION IS

A code generator is a tool that takes metadata as its input, merges the metadata with a template engine, and produces a series of source code files for its output.  The tool can be simple or elaborate, and you can generate any kind of code that you want.  You simply need to write the control program and templates for whatever you want to generate.
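To make that definition concrete, here is a toy generator in Java. Everything in it is hypothetical. The metadata is just a class name plus an ordered map of field names to types. The "template engine" is plain `String.replace`, standing in for a real engine such as Velocity or FreeMarker. The output is Java source text for a simple bean.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Toy code generator: metadata in, template merge, Java source text out (all names hypothetical). */
final class BeanGenerator {
    private BeanGenerator() {}

    // The "templates": ${...} placeholders are filled in from the metadata.
    private static final String CLASS_TEMPLATE =
        "public class ${className} {\n${fields}\n${accessors}}\n";
    private static final String FIELD_TEMPLATE = "    private ${type} ${name};\n";
    private static final String GETTER_TEMPLATE =
        "    public ${type} get${Name}() { return ${name}; }\n";

    /** Metadata: a class name plus an ordered map of field name to Java type. */
    static String generate(String className, Map<String, String> fields) {
        StringBuilder fieldCode = new StringBuilder();
        StringBuilder accessorCode = new StringBuilder();
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String name = field.getKey();
            String type = field.getValue();
            String capitalized = Character.toUpperCase(name.charAt(0)) + name.substring(1);
            fieldCode.append(FIELD_TEMPLATE.replace("${type}", type).replace("${name}", name));
            accessorCode.append(GETTER_TEMPLATE
                .replace("${type}", type)
                .replace("${Name}", capitalized)
                .replace("${name}", name));
        }
        // Merge the per-field fragments into the outer class template.
        return CLASS_TEMPLATE
            .replace("${className}", className)
            .replace("${fields}", fieldCode.toString())
            .replace("${accessors}", accessorCode.toString());
    }

    public static void main(String[] args) {
        Map<String, String> fields = new LinkedHashMap<>();
        fields.put("id", "long");
        fields.put("name", "String");
        System.out.print(generate("RefData", fields));
    }
}
```

An active setup might run something like this from the build (for example, from an Ant task) on every compile. A passive setup would run it once and then hand the resulting file over to a developer.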

Code generation done well can save you some time in the long run (you have to invest effort in creating your generator) and increase quality because you know all generated code will be identical.  Any bugs you find in the code will be corrected once in the template.

One argument against code generation is that a data-driven subroutine can produce the same result as code generation.  I agree with this argument; after all, the generator itself is a data-driven program.  Runtime reflection and good abstractions can produce the same results as code generation. I would argue, though, that this code is more complicated than the code created by the generator.  The generator might be as complex as the data-driven subroutine, but the code that is produced by the generator should be simple by design.  It would be trivially easy to attach a debugger and step through the generated code to find a bug.  I like debuggability.
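Here is a minimal sketch of that trade-off, built around a hypothetical `Account` class. `ReflectiveToString.of` plays the data-driven routine: a single method that works for any object, but hides its logic behind reflection. `generatedToString` stands in for what a generator might emit. It is tedious to write by hand, but a debugger steps straight through it. Both produce the same string here.

```java
import java.lang.reflect.Field;
import java.lang.reflect.Modifier;
import java.util.Arrays;
import java.util.Comparator;
import java.util.StringJoiner;

/** Hypothetical domain class with a method a generator might have written out longhand. */
final class Account {
    private final String id;
    private final String owner;

    Account(String id, String owner) {
        this.id = id;
        this.owner = owner;
    }

    /** "Generated" style: flat and explicit, nothing hidden from the debugger. */
    String generatedToString() {
        return "Account{id=" + id + ", owner=" + owner + "}";
    }
}

/** Data-driven alternative: one routine that discovers fields at runtime via reflection. */
final class ReflectiveToString {
    private ReflectiveToString() {}

    static String of(Object obj) {
        Field[] fields = obj.getClass().getDeclaredFields();
        // getDeclaredFields() promises no particular order, so sort by name for stable output.
        Arrays.sort(fields, Comparator.comparing(Field::getName));
        StringJoiner out = new StringJoiner(", ", obj.getClass().getSimpleName() + "{", "}");
        for (Field f : fields) {
            if (Modifier.isStatic(f.getModifiers()) || f.isSynthetic()) {
                continue; // skip class-level and compiler-generated fields
            }
            try {
                f.setAccessible(true);
                out.add(f.getName() + "=" + f.get(obj));
            } catch (IllegalAccessException e) {
                throw new IllegalStateException("Cannot read " + f.getName(), e);
            }
        }
        return out.toString();
    }
}
```

The reflective version is shorter overall and never needs regenerating. However, a bug inside it shows up as a loop over `Field` objects rather than as a line that names the field involved.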

Active vs. Passive

Generators come in two flavors:  Active and Passive.  Both are useful, but you must plan and design your project accordingly.

An active code generator maintains the code for the life of the project. Many active generators are invoked during the build process.  XDoclet is a good example of an active code generator.  I’ve used XDoclet to generate my webapp’s struts-config.xml file, and the generator was invoked by Ant during the build.  Another popular use of XDoclet is generating the boilerplate code and configurations for Enterprise Java Beans (EJBs).

Code generated by an active generator may or may not be checked into source control.  When invoked during a build and as part of the final artifact, generated code probably would not be in source control.  On the other hand, the output from an active code generator can be checked into source control and you could remove that step from the build process.  This isn’t to say the code is then maintained by hand!  On the contrary, the generator can be invoked frequently during a project.  The purpose of the active generator is to maintain the generated code.

A passive code generator creates code that you expect to maintain by hand afterwards.  Consider a wizard that asks you some questions before creating your basic class for you.  Likewise, many IDEs have useful generation snippets, such as generating all your getters/setters from your class’s instance variables.  Both of these examples are simple yet extremely useful.  I would be continually frustrated if I had to write all my getters/setters by hand.

Passive code generators needn’t stop at simple IDE-level functionality.  Maven archetypes, for example, can create an entire project setup for you.  They create all your directories and starting pom.xml.  Depending on the archetype, this could be quite complex.

Similarly, you can create entire skeletal projects with functionality from a passive code generator.  One good example would be AppFuse, which creates your project structure, layout, build scripts, and can optionally create some basic functionality like user authentication.

IT’S JUST A TOOL

Always remember that code generation is a tool in your toolbox, nothing more.  More accurately, it’s a tool and die.

Every manufacturer has highly skilled workers creating dies, molds, and machine tools to create the parts they need.  Expert furniture makers don’t hand carve each and every table leg they require.  They make a jig and create exact copies of the table leg.  Each leg may be lovingly hand-checked for quality and assembled in the final table, but each leg certainly isn’t carved individually.

In the software world, there will be times when you need expert programmers writing templates and fewer junior engineers cranking out grunt code.  The experts make the tools and dies of our software world.

YOUR RESPONSIBILITY

If code generation is just a tool, then responsibility falls to the developer to understand when and how to use it.  It becomes the developer’s responsibility to create a design that does not require hand modification of any actively generated code. The design should be robust, with plenty of hooks that allow for modification when needed.

One possible solution is to use active generation for base classes while using subclasses throughout the code.  The subclass could contain all the application-specific code needed, override base functionality as required, and leave the developer with a domain that could be easily regenerated while preserving all hand-written code.  Another design consideration is to structure your application as a framework, somewhat like Spring. Spring makes extensive use of the Template Method pattern and provides plenty of documented hooks for you to override when needed.
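This base-class/subclass split is sometimes called the Generation Gap pattern, and it pairs naturally with Template Method. The sketch below uses two hypothetical classes. `OrderBase` plays the actively generated class, regenerated on every build. `Order` is hand-written and lives in its own file, so regeneration never touches it. The generated `describe()` method is `final` and exposes `validate()` and `label()` as the hooks.

```java
/** Hypothetical "generated" base class: regenerated on every build, never edited by hand. */
abstract class OrderBase {
    private final String id;
    private final int quantity;

    protected OrderBase(String id, int quantity) {
        this.id = id;
        this.quantity = quantity;
    }

    public String getId() { return id; }
    public int getQuantity() { return quantity; }

    /** Template Method: the generated algorithm is fixed; subclasses customize via hooks. */
    public final String describe() {
        validate();
        return label() + " #" + id + " x" + quantity;
    }

    /** Hook: default generated rule; subclasses may add more. */
    protected void validate() {
        if (quantity <= 0) {
            throw new IllegalStateException("quantity must be positive");
        }
    }

    /** Hook: default label. */
    protected String label() { return "Order"; }
}

/** Hand-written subclass: all application-specific code lives here and survives regeneration. */
class Order extends OrderBase {
    Order(String id, int quantity) {
        super(id, quantity);
    }

    @Override
    protected String label() { return "Rush order"; }

    @Override
    protected void validate() {
        super.validate(); // keep the generated rule
        if (getQuantity() > 100) {
            throw new IllegalStateException("rush orders are limited to 100 units");
        }
    }
}
```

For example, `new Order("A1", 5).describe()` yields `"Rush order #A1 x5"`. If the metadata behind `OrderBase` changes, the generator can overwrite that file without losing the rush-order rule.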

CONCLUSION

Code generation done well can increase quality and decrease costs in a project.  Time savings are compounded, too, when you find yourself implementing similar code across projects.  Each successive new project can benefit from the templates made in the last project.

Consistency across all generated code yields an easier learning curve because developers learn one standard way for basic functionality, leaving them to focus on the custom pieces of an application. Put another way, place as much functionality into the “accidental” realm as you can so that your developers can focus on the “essence.”  Generated code is easily understood and allows for better debuggability than runtime abstractions that produce the same effect.

There are very specific design considerations to be mindful of, particularly the need for a design to be robust enough to ensure hand-modification of actively generated code is not required.

Combine good active code generation with a library of common components and you will find yourself covering a large percentage of an application’s accidental complexity, leaving you more time to focus on the essence.

Code generation is a good tool for your toolbox.  An expert developer will understand when and how to use it effectively.

HOWTO: Sort a Python Dictionary/Map

I use Python all the time for quick little scripting tasks.  There’s nothing better to slice and dice a file, so I use Python for a lot of reporting tasks.  That usually involves building some kind of data structure in my script that I’m slicing and dicing from files.

In my work, I have a LOT of units of work processing in parallel on a grid.  I have GUIDs tagging each unit of work, and that GUID is the perfect key for a Map/Dictionary data structure.  There are times, though, when I want to get the values of the Map and sort by some value in the data itself.  This is important if I want to sort my results by elapsed time or some other interesting metric.

Here’s how you sort a Python Dictionary by some arbitrary value within the data structure:

import time

work = {}

#
# create some sample data...
#
for i in range(10):
    key = "unit_%s" % i
    unitOfWork = {
        "id" : key,
        "data" : {
            "name" : "Turansky",
            "dob" : "03/28",
            "favoriteNumber" : int(time.time()) + i
        }
    }
    work[key] = unitOfWork

# Python 2 dicts iterate in arbitrary order (as in the output below);
# Python 3.7+ dicts iterate in insertion order instead.
print("The 'work' dictionary will print the objects randomly...")
for i in work:
    print(work[i])

print("")
print("Sprinkle some sorting magic...")

# but you want to sort the objects by favoriteNumber
# sorted() accepts dict.values() directly and returns a new list
# (in Python 3, values() is a view, so values().sort() would fail)
# provide a lambda function that references your data structure
units = sorted(work.values(), key=lambda obj: obj["data"]["favoriteNumber"])

print("")
print("... and just like that, you have order.")
for u in units:
    print(u)

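Here is the output, captured under Python 2. Python 3.7+ keeps dictionaries in insertion order, so the first listing and the key order within each dictionary will differ: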

The 'work' dictionary will print the objects randomly...
{'data': {'dob': '03/28', 'favoriteNumber': 1242069926, 'name': 'Turansky'}, 'id': 'unit_5'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069925, 'name': 'Turansky'}, 'id': 'unit_4'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069928, 'name': 'Turansky'}, 'id': 'unit_7'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069927, 'name': 'Turansky'}, 'id': 'unit_6'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069922, 'name': 'Turansky'}, 'id': 'unit_1'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069921, 'name': 'Turansky'}, 'id': 'unit_0'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069924, 'name': 'Turansky'}, 'id': 'unit_3'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069923, 'name': 'Turansky'}, 'id': 'unit_2'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069930, 'name': 'Turansky'}, 'id': 'unit_9'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069929, 'name': 'Turansky'}, 'id': 'unit_8'}

Sprinkle some sorting magic...

... and just like that, you have order.
{'data': {'dob': '03/28', 'favoriteNumber': 1242069921, 'name': 'Turansky'}, 'id': 'unit_0'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069922, 'name': 'Turansky'}, 'id': 'unit_1'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069923, 'name': 'Turansky'}, 'id': 'unit_2'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069924, 'name': 'Turansky'}, 'id': 'unit_3'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069925, 'name': 'Turansky'}, 'id': 'unit_4'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069926, 'name': 'Turansky'}, 'id': 'unit_5'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069927, 'name': 'Turansky'}, 'id': 'unit_6'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069928, 'name': 'Turansky'}, 'id': 'unit_7'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069929, 'name': 'Turansky'}, 'id': 'unit_8'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069930, 'name': 'Turansky'}, 'id': 'unit_9'}
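Since most of the code on this blog is Java, here is a rough Java counterpart of the same technique for comparison. The `Data` and `UnitOfWork` classes are minimal hypothetical stand-ins for the nested dictionaries above. As in the Python version, the map's values are copied into a new list and sorted with a key function, and the map itself is left untouched.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical Java counterpart of the Python example: sort a map's values by a nested field. */
final class SortMapValues {
    private SortMapValues() {}

    /** Stand-in for the inner "data" dictionary. */
    static final class Data {
        final String name;
        final long favoriteNumber;

        Data(String name, long favoriteNumber) {
            this.name = name;
            this.favoriteNumber = favoriteNumber;
        }
    }

    /** Stand-in for one unit-of-work dictionary. */
    static final class UnitOfWork {
        final String id;
        final Data data;

        UnitOfWork(String id, Data data) {
            this.id = id;
            this.data = data;
        }
    }

    /** Copies the values into a new list and sorts it by data.favoriteNumber. */
    static List<UnitOfWork> sortedByFavoriteNumber(Map<String, UnitOfWork> work) {
        List<UnitOfWork> units = new ArrayList<>(work.values());
        units.sort(Comparator.comparingLong((UnitOfWork u) -> u.data.favoriteNumber));
        return units;
    }

    public static void main(String[] args) {
        Map<String, UnitOfWork> work = new HashMap<>(); // HashMap iteration order is unspecified
        for (int i = 0; i < 10; i++) {
            String key = "unit_" + i;
            work.put(key, new UnitOfWork(key, new Data("Turansky", System.currentTimeMillis() / 1000 + i)));
        }
        for (UnitOfWork u : sortedByFavoriteNumber(work)) {
            System.out.println(u.id + " " + u.data.favoriteNumber);
        }
    }
}
```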

Be mindful of Collection.contains(obj)

Summary

Not all Collection.contains(obj) methods are the same!

This article is a real world case study of the Big O differences between various implementations of Java’s Collection interface.   I found and fixed a grievous O(n^2) algorithm by using the right data structure.

Background

I was asked to investigate why some pages in our web application would save session data very quickly while another problem page would take literally tens of minutes. The application had at its core a Stateful Session Bean that held dirty objects which would be persisted to the database in a single transaction. Sure, the easy pages didn’t contain very much data to persist, and we knew the problem page contained many times more data, but certainly not enough more to cause 20-minute request times!

After I implemented the fix, the page elapsed time dropped from 20+ minutes to ~10 seconds. What did I do? I used the right data structure.

Data Structures and the Big O

The application used a Vector to store dirty objects. A Vector was used for two reasons: 1) the original engineers thought synchronization was important and 2) order was important for referential integrity. A Vector’s internal synchronization was unneeded because only a single user’s request thread ever accessed the application. The ordering, however, was extremely important because you couldn’t add a person’s data without first adding the person!

The problem page in the web app had to add thousands of rows of data to the database, hence there were thousands of dirty objects waiting in the cache for persistence. As the application created or dirtied each object, it checked its cache (the Vector) before adding the object. You wouldn’t want the data to be persisted twice.

How did the app check its cache? vector.contains(obj);

The problem with vector.contains(obj) and list.contains(obj) is that they are O(n): they scan the elements one by one, so each call gets slower the more items the collection holds. The page that created thousands of objects to persist got progressively slower with each object it created.

The solution was to switch to a LinkedHashSet, which preserves insertion order for referential integrity while providing expected O(1) performance for set.contains(obj), because objects are looked up by their hash codes (assuming they implement equals() and hashCode() consistently).

The real problem was even worse, of course, because the app checked the cache each time before it added a new object.  This represents a good ol’ fashioned O(n^2) algorithm.
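Here is a minimal sketch of the fixed cache. It assumes a hypothetical `Entity` class keyed by table name and primary key, since the real application's classes aren't shown. `Set.add()` already returns `false` for a duplicate, so the separate `contains()` check can go away entirely. The hash-based lookup only works if `equals()` and `hashCode()` agree.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;

/** Hypothetical persistable entity; hash-based lookup needs consistent equals() and hashCode(). */
final class Entity {
    private final String table;
    private final long key;

    Entity(String table, long key) {
        this.table = table;
        this.key = key;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Entity)) return false;
        Entity other = (Entity) o;
        return key == other.key && table.equals(other.table);
    }

    @Override
    public int hashCode() {
        return Objects.hash(table, key);
    }

    @Override
    public String toString() {
        return table + ":" + key;
    }
}

/** Dirty-object cache: keeps insertion order and rejects duplicates in expected O(1) time. */
final class DirtyCache {
    private final Set<Entity> dirty = new LinkedHashSet<>();

    /** Returns false if the entity was already marked dirty; no separate contains() needed. */
    boolean markDirty(Entity e) {
        return dirty.add(e);
    }

    /** Entities in the order they were first dirtied, e.g. a person before that person's data. */
    List<Entity> toPersist() {
        return new ArrayList<>(dirty);
    }
}
```

With this structure, adding n objects costs roughly O(n) in total instead of O(n^2), and the persistence order is still the order in which objects were first dirtied.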

To be fair to the original developers, they wrote the application in Java 1.3, and LinkedHashSet wasn’t introduced until Java 1.4. Also, I don’t think they anticipated having a single page in the application generate thousands of objects.

Sample Code

Below is a simple program that highlights the performance differences between various Collection.contains(obj) implementations:

Elapsed times (in ms):

Vector: 3663
List: 3690
Set: 15
LinkedSet: 12

package mgt.perf;

import java.util.*;

public class ContainsExample {

    private int collectionCount = 10000;
    private int testCount = 50000;

    public static void main(String[] args) {
        new ContainsExample().start();
    }

    private void start() {

        Collection<Integer> vector = new Vector<>();
        Collection<Integer> list = new ArrayList<>();
        Collection<Integer> set = new HashSet<>();
        Collection<Integer> linkedSet = new LinkedHashSet<>();

        populate(vector);
        populate(list);
        populate(set);
        populate(linkedSet);

        // Rough wall-clock numbers: there is no JIT warm-up, so expect some run-to-run noise.
        System.out.println("Elapsed times (ms)\n");
        System.out.println("    Vector:" + test(vector));
        System.out.println("      List:" + test(list));
        System.out.println("       Set:" + test(set));
        System.out.println(" LinkedSet:" + test(linkedSet));
    }

    private void populate(Collection<Integer> collection) {
        for (int i = 0; i < collectionCount; i++) {
            collection.add(i); // autoboxed to Integer
        }
    }

    private long test(Collection<Integer> collection) {
        Random rnd = new Random(System.currentTimeMillis());
        long started = System.nanoTime();
        for (int i = 0; i < testCount; i++) {
            int lookFor = rnd.nextInt(collectionCount);
            // O(n) scan for Vector/ArrayList, expected O(1) hash lookup for the two sets
            if (!collection.contains(lookFor)) {
                throw new IllegalStateException(lookFor + " really should be in the collection");
            }
        }
        return (System.nanoTime() - started) / 1_000_000; // nanoseconds to milliseconds
    }

}
