Archive for category HOW TO

Red Pen and Comments in the Margin for your UI

Do you remember how your High School English teacher graded your papers?  Mine all used a red pen, circled things, and wrote comments in the margin.  I still do this today when I write something important.  I print a copy, grab a red pen, and turn into a ruthless editor on the lookout for just a few simple things, like using fewer words.

But red pen and comments in the margin doesn’t only apply to writing.  You can apply the same principles and concepts to a user interface.  Take a screenshot of your application, print it out, and find yourself a red Bic.  There’s something special about writing on and drawing over an image of your application because it’s something you probably never do.  Editing a screenshot in Photoshop is good, too, though sometimes not as satisfying.

Working example

Below is a before and after screenshot of a single sidebar in FoodHub Pro that shows my Red Pen and Comments in the Margin thinking.  My software is constantly evolving and improving, and there’s no need to let ego get in the way of a great UI.  It might be that this sidebar could be refined further if we find that users don’t actually need all of the widgets.

The “After” image has fewer words, fewer lines, no distracting headers, and is shorter overall with no loss of functionality.

Use a Red Pen and Comments in the Margin and let me know how your UI turns out.  I’d love to see more before/after examples.


FoodHub Pro sidebar with "red pen and comments in the margin" editing

#1:  Use fewer words

How many times have you heard someone explain something by saying “What that means is …”?  These words don’t say anything.  Just say what it means; don’t preface it.

“What that means is be thrifty with words.”

The first four words mean nothing and the last four mean everything.  Take your red pen and cross through half of that sentence.

What words can I cut from my sidebar?  “New Purchase Order” and “Choose a grower …” can be concisely written as “New PO for grower …”.  That’s a 33% word discount!

#2:  Avoid clutter

Don’t make me think, don’t make me read, and especially don’t hide the few words I must read beneath words I want to ignore.

The headers in my sidebar have no purpose other than to be bigger and bolder than the words that have meaning, like the “From” and “to” labels and “Show only”, which implies filtering capability.

#3:  Remove stuff

Do we really need headers?  Headers are generally meant to delineate blocks of stuff, a job I think is particularly well suited to wells.  Headers are big, bold words that aren’t important.  They are too tall.  Wells, on the other hand, provide excellent delineation and imply depth on a page.  With fewer, more meaningful words remaining in a well, the purpose of the widget becomes obvious and you don’t need a header to explain what it is.

Are there any form fields that aren’t really needed?  Our customers don’t change business models very often (read: never), which means the type of Purchase Order they create never changes.  We tucked that option away on a config screen and dropped it from our form.  It simply wasn’t needed.

The Results Have Been Measured

Without any loss of functionality and with an immeasurable increase in clarity and simplicity, we can measure results by counting what remains.

  1. 30% of words removed
  2. 100% of tall and bold headers removed
  3. 100% of bold form field headers removed
  4. 0% change in distinct widgets with wells replacing horizontal lines
  5. 0% loss of functionality
  6. Increased clarity and simplicity:  immeasurable

Red Pen and Comments in the Margin works as well for interface design as it did for my old English teacher.  It cuts through clutter and simplifies by eliminating anything that doesn’t directly and succinctly address the task at hand.  Everything in the design is subject to removal, from redundant or unnecessary words to unused elements to the use of white space.  Each item in a design should fight for its life because a good editor always has a red pen handy.

Use a classpath resource or kill your application’s portability

Here is the secret way to kill your application’s portability (and by portability, I mean across different computers, let alone operating systems):  hardcode all your paths.

That’s it.  That very quickly kills portability.  It’s easy to accomplish, too.  Simply refer to all your configuration files, for example, by fully qualified pathname, like this:

System.setProperty("com.yourcorp.refdata.config.filename",
    "C:\\Documents and Settings\\FOO\\Perforce_FOO\\PATHS_CHANGED_FOR_ANONYMITY\\RefDataConfig.xml");

The above snippet is something I’m battling with to get unit tests working in my project.  Naturally, it doesn’t work for me, because “FOO” isn’t my username, nor is my Perforce sandbox “Perforce_FOO” (again, because “FOO” isn’t my username).

This unit test won’t work across machines using the same OS, and our brethren using Macs or Linux boxes are completely hosed.

Don’t hardcode any paths in your application!

In Java, use a classpath resource.  This gives you portability.  It also gives a Configuration Management team the ability to package all required resources into a single artifact for better version control.

The safest way to get a classpath resource is to ask the current thread’s context classloader to find it.

// Well-behaved Java programs set the thread's context classloader when running in a
// multi-classloader environment.  You see this when you write containers of any type.
// ClassLoader.getResource() resolves names from the classpath root, so use no leading "/".
Thread.currentThread().getContextClassLoader().getResource("some/path/RefDataConfig.xml");

// or another way... sufficient for most cases
this.getClass().getClassLoader().getResource("some/path/RefDataConfig.xml");
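The two lookups above can be wrapped in one helper so callers never deal with `null` or leading-slash surprises. This is a minimal sketch using only the JDK; `ClasspathResources`, `find`, and `open` are hypothetical names, not part of any library. It tries the context classloader first, falls back to the classloader that loaded the helper, and fails loudly when the resource is missing:

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.URL;

/** Hypothetical helper for locating classpath resources without hardcoded paths. */
public final class ClasspathResources {

    private ClasspathResources() {
        // static utility; not instantiable
    }

    /**
     * Returns the URL of a classpath resource, or throws if it cannot be found.
     * Names are resolved from the classpath root, e.g. "some/path/RefDataConfig.xml".
     */
    public static URL find(String name) {
        // ClassLoader.getResource() expects no leading "/", so accept both spellings.
        String relative = name.startsWith("/") ? name.substring(1) : name;

        // Prefer the thread's context classloader, which containers set for their applications.
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        URL url = (context != null) ? context.getResource(relative) : null;

        // Fall back to the classloader that loaded this helper class.
        if (url == null) {
            url = ClasspathResources.class.getClassLoader().getResource(relative);
        }
        if (url == null) {
            throw new IllegalArgumentException("Resource not found on classpath: " + relative);
        }
        return url;
    }

    /** Opens the resource as a stream; the caller must close it. */
    public static InputStream open(String name) throws IOException {
        return find(name).openStream();
    }
}
```

A caller would then write something like `try (InputStream in = ClasspathResources.open("some/path/RefDataConfig.xml")) { ... }`, with the XML file packaged in a JAR or resource directory on the classpath instead of living in one developer’s sandbox.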

HOWTO: Sort a Python Dictionary/Map

I use Python all the time for quick little scripting tasks.  There’s nothing better for slicing and dicing a file, so I use Python for a lot of reporting tasks.  That usually involves building some kind of data structure in my script from the files I’m slicing and dicing.

In my work, I have a LOT of units of work processing in parallel on a grid.  I have GUIDs tagging each unit of work, and that GUID is the perfect key for a Map/Dictionary data structure.  There are times, though, that I want to get the values of the Map and sort by some value in the data itself.  This is important if I want to sort my results by elapsed time or some other interesting metric.

Here’s how you sort a Python Dictionary by some arbitrary value within the data structure:

import time

work = {}

#
# create some sample data...
#
for i in range(10):
    key = "unit_%s" % i
    unitOfWork = {
        "id" : key,
        "data" : {
            "name" : "Turansky",
            "dob" : "03/28",
            "favoriteNumber" : int(time.time()) + i
        }
    }
    work[key] = unitOfWork

print "The 'work' dictionary will print the objects randomly..."
for i in work:
    print work[i]

print ""
print "Sprinkle some sorting magic..."

# but you want to sort the objects by favoriteNumber
# get your values as a list... you want to use the list.sort() method
units = work.values()

# provide a lambda function that references your data structure
units.sort(key = lambda obj:obj["data"]["favoriteNumber"])

print ""
print "... and just like that, you have order."
for u in units:
    print u

Here is the output:

The 'work' dictionary will print the objects randomly...
{'data': {'dob': '03/28', 'favoriteNumber': 1242069926, 'name': 'Turansky'}, 'id': 'unit_5'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069925, 'name': 'Turansky'}, 'id': 'unit_4'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069928, 'name': 'Turansky'}, 'id': 'unit_7'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069927, 'name': 'Turansky'}, 'id': 'unit_6'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069922, 'name': 'Turansky'}, 'id': 'unit_1'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069921, 'name': 'Turansky'}, 'id': 'unit_0'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069924, 'name': 'Turansky'}, 'id': 'unit_3'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069923, 'name': 'Turansky'}, 'id': 'unit_2'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069930, 'name': 'Turansky'}, 'id': 'unit_9'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069929, 'name': 'Turansky'}, 'id': 'unit_8'}

Sprinkle some sorting magic...

... and just like that, you have order.
{'data': {'dob': '03/28', 'favoriteNumber': 1242069921, 'name': 'Turansky'}, 'id': 'unit_0'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069922, 'name': 'Turansky'}, 'id': 'unit_1'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069923, 'name': 'Turansky'}, 'id': 'unit_2'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069924, 'name': 'Turansky'}, 'id': 'unit_3'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069925, 'name': 'Turansky'}, 'id': 'unit_4'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069926, 'name': 'Turansky'}, 'id': 'unit_5'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069927, 'name': 'Turansky'}, 'id': 'unit_6'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069928, 'name': 'Turansky'}, 'id': 'unit_7'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069929, 'name': 'Turansky'}, 'id': 'unit_8'}
{'data': {'dob': '03/28', 'favoriteNumber': 1242069930, 'name': 'Turansky'}, 'id': 'unit_9'}
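The same approach carries over to a Java Map: copy the values into a list and sort that list with a key extractor, the counterpart of Python’s `key=lambda ...`. This is a minimal sketch; `UnitOfWork` is a hypothetical stand-in for the dictionaries above that keeps only the `id` and `favoriteNumber` fields:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public final class SortMapValues {

    /** Hypothetical unit of work mirroring the Python example's "id" and "favoriteNumber". */
    public static final class UnitOfWork {
        public final String id;
        public final long favoriteNumber;

        public UnitOfWork(String id, long favoriteNumber) {
            this.id = id;
            this.favoriteNumber = favoriteNumber;
        }

        @Override
        public String toString() {
            return id + "=" + favoriteNumber;
        }
    }

    /** Copies the map's values into a new list sorted by favoriteNumber; the map is not modified. */
    public static List<UnitOfWork> sortedByFavoriteNumber(Map<String, UnitOfWork> work) {
        List<UnitOfWork> units = new ArrayList<>(work.values());
        units.sort(Comparator.comparingLong(u -> u.favoriteNumber));
        return units;
    }

    public static void main(String[] args) {
        // HashMap iteration order is unspecified, much like the Python 2 dict above.
        Map<String, UnitOfWork> work = new HashMap<>();
        long base = System.currentTimeMillis() / 1000;
        for (int i = 0; i < 10; i++) {
            String key = "unit_" + i;
            work.put(key, new UnitOfWork(key, base + i));
        }
        for (UnitOfWork u : sortedByFavoriteNumber(work)) {
            System.out.println(u);
        }
    }
}
```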

Be mindful of Collection.contains(obj)

Summary

Not all Collection.contains(obj) methods are the same!

This article is a real-world case study of the Big O differences between various implementations of Java’s Collection interface.   I found and fixed a grievous O(n^2) algorithm by using the right data structure.

Background

I was asked to investigate why some pages in our web application would save session data very quickly while one problem page took literally tens of minutes. At its core, the application had a Stateful Session Bean that held dirty objects, which would be persisted to the database in a single transaction. Sure, the easy pages didn’t contain very much data to persist, and we knew the problem page contained many times more data, but certainly not enough to cause 20-minute request times!

After I implemented the fix, the page elapsed time dropped from 20+ minutes to ~10 seconds. What did I do? I used the right data structure.

Data Structures and the Big O

The application used a Vector to store dirty objects, for two reasons: 1) the original engineers thought synchronization was important, and 2) order was important for referential integrity. A Vector’s internal synchronization was unneeded because only a single user’s request thread ever accessed it. The ordering, however, was extremely important because you couldn’t add a person’s data without first adding the person!

The problem page in the web app had to add thousands of rows of data to the database, hence there were thousands of dirty objects waiting in the cache for persistence. As the application created or dirtied objects, it checked its cache (the Vector) before adding each one. You wouldn’t want the data to be persisted twice.

How did the app check its cache? vector.contains(obj);

The problem with vector.contains(obj) and list.contains(obj) is that they are O(n): they scan the elements one by one, so each call gets slower the more items you put into the collection. The page that created thousands of objects to persist got progressively slower with each object it created.

The solution was to switch to a LinkedHashSet, which preserves insertion order for referential integrity while providing expected O(1) performance for set.contains(obj) because the objects are hashed.

The real problem was even worse, of course, because the app checked the cache each time before it added a new object.  This represents a good ol’ fashioned O(n^2) algorithm.

To be fair to the original developers, they wrote the application in Java 1.3, and LinkedHashSet was introduced in 1.4. Also, I don’t think they anticipated having a single page in the application generate thousands of objects.
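A minimal sketch of the fix, assuming a generic cache class; `DirtyObjectCache` and `Person` are hypothetical names, not the application’s actual code. Two details matter when moving from a Vector to a LinkedHashSet. First, `Set.add()` already reports whether the element was present, so the separate `contains()` check disappears. Second, hashed lookups only work if the cached objects override `equals()` and `hashCode()` consistently; a class that overrides only `equals()` is found by `Vector.contains()` but will generally not be found by a hashed set:

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical cache of dirty objects awaiting persistence, in first-dirtied order. */
public final class DirtyObjectCache<T> {

    // LinkedHashSet: hashed membership checks (expected O(1)) plus insertion-order iteration.
    private final Set<T> dirty = new LinkedHashSet<>();

    /** Returns true if obj was newly added, false if it was already in the cache. */
    public boolean markDirty(T obj) {
        // add() performs the membership check itself; no separate contains() call is needed.
        return dirty.add(obj);
    }

    /** Objects in the order they were first marked dirty, e.g. a person before that person's data. */
    public List<T> inPersistOrder() {
        return new ArrayList<>(dirty);
    }

    /** Hypothetical domain object; equality by id is an assumption for illustration. */
    public static final class Person {
        private final long id;
        private final String name;

        public Person(long id, String name) {
            this.id = id;
            this.name = name;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            return id == ((Person) o).id;
        }

        @Override
        public int hashCode() {
            return Long.hashCode(id); // must agree with equals() for hashed collections
        }

        @Override
        public String toString() {
            return "Person(" + id + ", " + name + ")";
        }
    }
}
```

LinkedHashSet’s documentation also notes that re-inserting an element does not change its position, so an object marked dirty twice keeps its original place in the persistence order.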

Sample Code

Below is a simple program that highlights the performance differences between various Collection.contains(obj) implementations.

Elapsed times (in ms):

Vector: 3663
List: 3690
Set: 15
LinkedSet: 12

package mgt.perf;

import java.util.*;

public class ContainsExample {

    private int collectionCount = 10000;
    private int testCount = 50000;

    public static void main(String[] args) {
        new ContainsExample().start();
    }

    private void start() {

        Collection<Integer> vector = new Vector<>();
        Collection<Integer> list = new ArrayList<>();
        Collection<Integer> set = new HashSet<>();
        Collection<Integer> linkedSet = new LinkedHashSet<>();

        populate(vector);
        populate(list);
        populate(set);
        populate(linkedSet);

        System.out.println("Elapsed times (in ms):\n");
        System.out.println("    Vector:" + test(vector));
        System.out.println("      List:" + test(list));
        System.out.println("       Set:" + test(set));
        System.out.println(" LinkedSet:" + test(linkedSet));
    }

    // Fill the collection with the integers 0 .. collectionCount-1.
    private void populate(Collection<Integer> collection) {
        for (int i = 0; i < collectionCount; i++) {
            collection.add(i);
        }
    }

    // Time testCount random contains() lookups against the collection.
    private long test(Collection<Integer> collection) {
        Random rnd = new Random(System.currentTimeMillis());
        long started = System.currentTimeMillis();
        for (int i = 0; i < testCount; i++) {
            int lookFor = rnd.nextInt(collectionCount);
            if (!collection.contains(lookFor)) {
                throw new IllegalStateException(lookFor + " really should be in the collection");
            }
        }
        long elapsed = System.currentTimeMillis() - started;
        return elapsed;
    }

}

HOW TO: Enable debug and JMX ports in your java app

Ever have a stuck or deadlocked thread in a production application? Use JMX to inspect what’s going on inside your JVM, including thread views.  It’ll show you which threads are running, waiting, or blocked and where in the stack trace they currently are.  I’ve used this information to find blocked threads in strange places.  JMX also shows you the memory usage of your Java process, including the permanent generation (PermGen), where the classes loaded by your classloaders are stored.

The debug options will open your debug ports, naturally, and let you connect your debugger.

All you have to do is run your Java process with these startup options:

Debug:
-Xdebug -Xrunjdwp:transport=dt_socket,address=$DEBUG_PORT,server=y,suspend=n
(On Java 5 and later, -agentlib:jdwp=transport=dt_socket,address=$DEBUG_PORT,server=y,suspend=n is the preferred equivalent.)

JMX:
-Dcom.sun.management.jmxremote.port=$JMX_PORT -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false

Look in your $JAVA_HOME/bin and you’ll see a jconsole executable. That GUI will let you connect to the machine running your Java process on the JMX port you specified.
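JConsole reads this data from the platform MXBeans in `java.lang.management`, and the same beans can be queried from inside the process. The sketch below is a hypothetical helper (`JvmInspector` is not a standard class) that prints each thread’s state and top stack frame, counts deadlocked threads, and lists memory pool usage. Pool names vary by JVM and version: older HotSpot JVMs report a PermGen pool, while Java 8 and later report Metaspace.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryUsage;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public final class JvmInspector {

    private JvmInspector() {}

    /** Prints each live thread's name, state (RUNNABLE, WAITING, BLOCKED, ...) and top stack frame. */
    public static void printThreads() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Arguments: lockedMonitors, lockedSynchronizers (not needed for a one-line summary).
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            StackTraceElement[] stack = info.getStackTrace();
            String top = (stack.length > 0) ? stack[0].toString() : "(no Java frames)";
            System.out.println(info.getThreadName() + " [" + info.getThreadState() + "] at " + top);
        }
    }

    /** Number of threads deadlocked on monitors or java.util.concurrent locks; 0 if none. */
    public static int deadlockedThreadCount() {
        long[] ids = ManagementFactory.getThreadMXBean().findDeadlockedThreads();
        return (ids == null) ? 0 : ids.length; // the MXBean returns null when nothing is deadlocked
    }

    /** Prints used bytes for each memory pool (heap generations, PermGen or Metaspace, ...). */
    public static void printMemoryPools() {
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            MemoryUsage usage = pool.getUsage(); // null if the pool is no longer valid
            if (usage != null) {
                System.out.println(pool.getName() + ": " + usage.getUsed() + " bytes used");
            }
        }
    }

    public static void main(String[] args) {
        printThreads();
        System.out.println("Deadlocked threads: " + deadlockedThreadCount());
        printMemoryPools();
    }
}
```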

I hope you find these tips useful.  Both have been extremely useful to me (as well as adding optional profiling vars to a JVM!).

Design Patterns Quick Reference Cards

Jason McDonald’s printable design pattern reference cards were printed by DZone as part of their “Refcardz” initiative.

You can find Jason’s cards here and you can find other useful design patterns stuff here.

HOW TO: Use JDBC Batching for 7-8X throughput gains

Using the batched statement capability of your JDBC driver can give you 7-8X throughput gains. Not only is batching significantly faster, it’ll save database CPU cycles and be easier on the network, too.

The graph below shows elapsed time (in milliseconds) by batch size. For each data point, 1K rows were inserted into a simple table in MySQL. The benchmarking code I used can be found here.


Why is batching so much faster?

First, depending on how much PreparedStatement caching your driver is doing, your database may be spending a lot of time parsing and compiling statements. After the statement is parsed and compiled, bind variables are applied. In our example, the database will parse and compile the statement once as opposed to 1,000 times. This reduces the work your database performs and saves CPU.

Second, all bind variables are passed to the database in a single network call instead of 1,000 separate out-of-process, across-the-network calls. This helps reduce network traffic.

Third, depending on the internal architecture of your code, single statements may return the connection to a pool after every use. Multiply that by 1,000 and run a profiler and you’ll see yourself calling take/put methods a lot. Many pools also verify the connection on check-in and check-out. “select 1 from dual” is a common check for a pool to use. Your 1,000 uses of a connection may also be incurring the cost of 2,000 “select 1 from dual” statements!
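A minimal sketch of the batching pattern with plain JDBC. The `person` table, its columns, and the batch size of 100 are assumptions for illustration. The statement is prepared once, each row’s bind variables are queued with `addBatch()`, and `executeBatch()` sends a whole batch at a time. Auto-commit is turned off so the rows commit as one transaction, a common companion to batching but not necessarily what the benchmark did:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import java.util.List;

public final class BatchInsertExample {

    // Hypothetical table: CREATE TABLE person (name VARCHAR(100), age INT)
    static final String INSERT_SQL = "INSERT INTO person (name, age) VALUES (?, ?)";

    /** Queues every row on one PreparedStatement, flushing every batchSize rows. Returns executeBatch() calls. */
    public static int insertAll(PreparedStatement ps, List<Object[]> rows, int batchSize) throws SQLException {
        if (batchSize < 1) {
            throw new IllegalArgumentException("batchSize must be >= 1");
        }
        int executed = 0;
        int pending = 0;
        for (Object[] row : rows) {
            for (int i = 0; i < row.length; i++) {
                ps.setObject(i + 1, row[i]); // JDBC parameter indexes are 1-based
            }
            ps.addBatch();
            if (++pending == batchSize) {
                ps.executeBatch();
                executed++;
                pending = 0;
            }
        }
        if (pending > 0) { // flush the final partial batch
            ps.executeBatch();
            executed++;
        }
        return executed;
    }

    /** Inserts all rows in one transaction using a single prepared statement. */
    public static void insertPeople(Connection conn, List<Object[]> rows) throws SQLException {
        boolean oldAutoCommit = conn.getAutoCommit();
        conn.setAutoCommit(false);
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            insertAll(ps, rows, 100);
            conn.commit();
        } catch (SQLException e) {
            conn.rollback();
            throw e;
        } finally {
            conn.setAutoCommit(oldAutoCommit);
        }
    }
}
```

How much a driver actually saves per batch is driver-specific. MySQL Connector/J, for example, only rewrites a batch of inserts into multi-row statements when the `rewriteBatchedStatements=true` connection property is set.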

When should you use batching?

Batching is particularly useful in importing scenarios where you need to get lots of data into your application quickly, but it can be used even when executing a few similar statements. Check out the example source code provided to see if batching is right for you. Fiddle with the numbers to see the gains for batching just 10 similar statements. It may not be an 8X gain, but trumpeting 25% gains to management is still a win for you and your team.

Use JDBC Batching!

JDBC batching can give you dramatic throughput gains while simultaneously being less abusive to your hardware. Overall, if you have the opportunity to use batch inserts and updates, you should seize that opportunity. Look at your application’s internal architecture to see if batching is right for you.

“Don’t Make Me Think” applies to your code, too


Don’t make me think. That’s how I feel about your code.

Or as Martin Fowler puts it:

“Any fool can write code that a computer can understand. Good programmers write code that humans can understand.” -Martin Fowler, Refactoring: Improving the Design of Existing Code

You’ve reached a whole new level of mastery when you write for simplicity, elegance, and maintainability. This is done on purpose, and it’s hard to get right. Deadlines, schedules, pressure, and stress all encourage us to cut corners and adopt a “Git ‘er done!” mentality. But abandoning planning under pressure is one of software’s classic mistakes. It’s a cardinal sin.

How do you write simple and maintainable code? I’ve got a 3-step program for you:

Step 1: Admit that simple isn’t easy

Designing simple software is hard. You can’t accidentally find yourself with well-written code and an elegant solution; it has to be written that way on purpose.

This admission is a bedrock principle required for designing great software and products. If you can’t admit that simple is Hard Work™, you haven’t hit rock bottom yet by having to maintain code that would make readers of The Daily WTF blush.

Step 2: Read “Don’t Make Me Think”

Steve Krug’s excellent book “Don’t Make Me Think” is about website usability, yet it changed how I look at my code.

Why? Because Steve applied the same principles in his book to his book! And if it works in those two mediums, I thought it just might work for me, too, in my medium (code).

“Don’t Make Me Think” is very easily absorbed because he’s feeding you information in a readily accessible way. He wrote it simply on purpose, and I’m certain it took many more hours to edit than it did to write. Simple is hard.

Step 3: Practice simple every day

There are innumerable decisions you make every day that affect your project for better or worse. You need to recognize these as the opportunities they are. Here are a few things you can do every day:

  • Code in plain English. Use an active voice (just like writing). What do you think this method does?
    dao.findCustomerBy(order);

    Or what about this if statement?

    if(admin.hasPermission(Permissions.VIEWFILE)){
       // allow...
    }

    or better yet…

    if(admin.hasViewFilePermission()){
       // allow...
    }

    The pretty method on the Admin class looks like this:

    public boolean hasViewFilePermission(){
       return hasPermission(Permissions.VIEWFILE);
    }

  • Make Stuff Obvious. Quick, what does this line of code do?
    Date dt = march(28, 1973);

    When I’m reading through unit tests, I’d much rather see the above statement to create a date than the equivalent Java:

    Calendar cal = Calendar.getInstance();
    cal.set(Calendar.MONTH, Calendar.MARCH);
    cal.set(Calendar.DATE, 28);
    cal.set(Calendar.YEAR, 1973);
    Date dt = cal.getTime();

    You can find those convenient date methods here: dates.java (it’s Free software). Use Java 5’s static imports to make the short date seen above.

  • Be Merciless. Be your own worst critic when reviewing your code. Always strive to improve what you’ve written. Just as great essays and novels (and books like “Don’t Make Me Think”) require several rounds of editing, so too does your code.
  • Never nest ternary statements. ’nuff said.
  • Write comments, but be brief and explain why your code does what it does, not how it does it. We already know how it does it; we’re looking at the code.

That’s it. Three steps to better code. Putting it into practice won’t be easy, but if you want to be a master of your craft you’ll embrace the challenge and write things simply on purpose. The people who follow you and maintain your code will appreciate it.
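Step 3’s `march(28, 1973)` relies on the convenience methods in dates.java, which aren’t reproduced here. The following is a hypothetical sketch of what such helpers could look like, not the contents of that file. It wraps the Calendar code shown in Step 3, with one addition: `cal.clear()` resets the time-of-day fields so the result is local midnight rather than inheriting the current time:

```java
import java.util.Calendar;
import java.util.Date;

/** Hypothetical date helpers for readable test code (not the actual dates.java). */
public final class Dates {

    private Dates() {}

    /** The given day of March in the given year, at local midnight. */
    public static Date march(int day, int year) {
        return date(year, Calendar.MARCH, day);
    }

    /** Builds a Date at local midnight; month is a Calendar constant such as Calendar.MARCH. */
    public static Date date(int year, int month, int day) {
        Calendar cal = Calendar.getInstance();
        cal.clear();               // reset all fields, including time of day, instead of keeping "now"
        cal.set(year, month, day); // same effect as the three separate cal.set(...) calls
        return cal.getTime();
    }
}
```

Static imports can’t reach classes in the default package, so in practice `Dates` would live in a package. Assuming a hypothetical `com.example.testutil`, a test would write `import static com.example.testutil.Dates.march;` and then simply `Date dt = march(28, 1973);`.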

HOW TO: Better JavaScript Templates

JavaScript Templates (Jst) is a pure JavaScript templating engine that runs in your browser using JSP-like syntax. If that doesn’t sound familiar, check out the live working example on this site and download the code. It’s Free Open Source Software.

Better JavaScript Templates

HOW TO: Download & sort pictures from your camera using Python

I’ve got tens of thousands of photos. When I last checked, the size on disk was over 25 GB. And why not? Film is free!

How do I keep track of them all?

First, there’s Picasa from Google. It’s awesome. It’s the iTunes of photos.

Second, there’s a little Python script I called “DownloadPhotosFromCameraAndSort.py” (it’s .txt on the server; rename it to .py if you download it).

If you couldn’t tell, I like meaningful names for my scripts. Likewise for my Java classes, components, projects, etc. Good code communicates its purpose clearly without too much human parsing.

The script does the following:

1. Find my camera
2. Download all pictures/movies to 1 or more destinations
3. Confirm the download before deleting the picture from the camera
4. Sort the pictures into a dated subdirectory based on the individual picture’s last modified time

The dated directories give me a running chronology of my life and the lives of my wife and daughter. It’s not a fancy archival system, but it’s simple and easy for me. It may work for you, too.
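The script itself is Python and isn’t reproduced here. As a rough illustration of steps 3 and 4 in Java, here is a hypothetical sketch using `java.nio.file`; the class and method names are made up, the `yyyy-MM-dd` folder naming is an assumption (the post doesn’t show the script’s actual format), and `Files.mismatch` requires Java 12 or later. It copies one file into a subdirectory named for the file’s last-modified date, confirms the copy byte-for-byte, and only then deletes the original:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.time.LocalDate;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;

public final class PhotoSorter {

    // Hypothetical folder naming scheme, e.g. "2009-05-11".
    private static final DateTimeFormatter FOLDER = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    private PhotoSorter() {}

    /**
     * Copies one photo into destRoot/<last-modified date>/, verifies the copy,
     * and only then deletes the original. Returns the path of the copy.
     */
    public static Path moveToDatedFolder(Path photo, Path destRoot, ZoneId zone) throws IOException {
        // Step 4: pick the dated subdirectory from the picture's last-modified time.
        LocalDate day = Files.getLastModifiedTime(photo).toInstant().atZone(zone).toLocalDate();
        Path dir = Files.createDirectories(destRoot.resolve(FOLDER.format(day)));
        Path target = dir.resolve(photo.getFileName());

        // COPY_ATTRIBUTES keeps the last-modified time on the copy.
        Files.copy(photo, target, StandardCopyOption.COPY_ATTRIBUTES);

        // Step 3: confirm the download before deleting from the camera (-1 means identical).
        if (Files.mismatch(photo, target) != -1L) {
            throw new IOException("Copy verification failed for " + photo);
        }
        Files.delete(photo);
        return target;
    }
}
```

Because `Files.copy` is called without `REPLACE_EXISTING`, a name collision in the destination throws `FileAlreadyExistsException` and the original stays on the camera, which errs on the side of not losing pictures.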

The archive looks like this:

[Screenshot: the photo archive’s dated subdirectories]
