Archive for category Code Hints

How to send asynchronous requests in PHP

It’s easy to make asynchronous calls in PHP with just a little bit of HTTP header knowledge and some library code around PHP sockets.

This technique is useful when posting requests to your own server for bits of logic you’d like to run in the background.  If you don’t control the endpoint, you might not be comfortable with some of the limitations, such as the inability to read anything from the response.  So, you couldn’t post data to a webservice and receive an HTTP 200 OK response and certainly not an ID for an object newly created by the service call. Any bad IP address would give you an error and you’d also get an error if your socket couldn’t connect. This level of error handling might be sufficient for what you need.

For best performance, use IP addresses instead of DNS entries to prevent the need for look-up and resolution.

For fire and forget stuff, this is the bees knees:

(Please ignore my class definition and Controller superclass. This is actual working code from my CodeIgniter application where I scratched this out as a proof of concept)

class Scratch extends SQ_Controller {

    function Scratch() {
        parent::SQ_Controller();
    }

    function index() {

        echo "PHP Async Test...
";

        $params = array(
            "one" => "111111",
            "two" => "22222",
            "three" => "33333",
            "four" => "44444",
        );
        $this->curl_post_async("http://127.0.0.1/sq/scratch/longone", $params);
    }

    function longone(){

        $one = $_POST["one"];
        $two = $_POST["two"];
        $three = $_POST["three"];
        $four = $_POST["four"];

        echo uniqid("You won't see this because your PHP script isn't waiting to read any response");

        // put some long delay in here, so you can see how quickly the async requests returns
        sleep(5);

        // and the proof that something actually happens...  write out the HTTP params that were sent over the wire
        $fp = fopen('/PATH/TO/YOUR/DIR/FOR/OUTPUT/data.txt', 'w');
        fwrite($fp, $one);
        fwrite($fp, $two);
        fwrite($fp, $three);
        fwrite($fp, $four);
        fclose($fp);

    }

    function curl_post_async($url, $params = array()){

        $post_params = array();

        foreach ($params as $key => &$val) {
              if (is_array($val)) $val = implode(',', $val);
                $post_params[] = $key.'='.urlencode($val);
            }
            $post_string = implode('&', $post_params);

            $parts=parse_url($url);

            $fp = fsockopen($parts['host'],
                isset($parts['port'])?$parts['port']:80,
                $errno, $errstr, 30);

            $out = "POST ".$parts['path']." HTTP/1.1\r\n";
            $out.= "Host: ".$parts['host']."\r\n";
            $out.= "Content-Type: application/x-www-form-urlencoded\r\n";
            $out.= "Content-Length: ".strlen($post_string)."\r\n";
            $out.= "Connection: Close\r\n\r\n";
            if (isset($post_string)) $out.= $post_string;

            fwrite($fp, $out);
            fclose($fp);
    }
}

Get the latest subversion revision number with Bash

Here’s  a quick one-liner you can use in Bash to get the latest subversion revision number from your present working directory:

sed -n '4p' .svn/entries

Use backticks to assign it to a local variable you can use when making a build (say, a zip archive of a website or other type of artifact you might want to create):

version=`sed -n '4p' .svn/entries`

Use a classpath resource or kill your application’s portability

Here is the secret way to kill your application’s portability — and by portable, I mean across different computers, let alone operating systems:  Hardcode all your paths.

That’s it.  That very quickly kills portability.  It’s easy to accomplish, too.  Simply refer to all your configuration files, for example, by fully qualified pathname, like this:

1
2
System.setProperty("com.yourcorp.refdata.config.filename",
    "C:\\Documents and Settings\\FOO\\Perforce_FOO\\PATHS_CHANGED_FOR_ANONYMITY\\RefDataConfig.xml");

The above snippet is something I’m battling with to get unit tests working in my project. Naturally it doesn’t work for me because “FOO” isn’t my username nor is my Perforce sandbox “Performance_FOO” because, again, “FOO” isn’t my username.

This unit test won’t work across machines using the same OS, and our brethren using Macs or Linux boxes are completely hosed.

Don’t hardcode any paths in your application!

In Java, use a classpath resource.   This gives you portability.  It also allows a Configuration Management team the ability to package all required resources into a single artifact for better version control.

The safest way to get a classpath resource would be to use your current classloader to find the resource.

1
2
3
4
5
6
// Well-behaved Java programs set the thread's current classloader when running in a
// multi-classloader environment.  You see this when you write containers of any type.
Thread.currentThread().getContextClassLoader().getResource("/some/path/RefDataConfig.xml");
 
// or another way... sufficient for most cases
this.getClass().getClassLoader().getResource("/some/path/RefDataConfig.xml");

The Truth About Code Generation

Code generation done right can be a very effective and highly useful tool in your toolbox.  Done wrong it could be a maintenance nightmare.  This article reflects on different types of code generation, when to use each of them, and explains some pitfalls to avoid.

WHAT CODE GENERATION ISN’T:  A SILVER BULLET

Before we explore what code generation is and how to use it effectively, we must first understand what it isn’t:  A silver bullet.

No amount of code generation will save a doomed project.  If you’ve got inadequate staff , bad requirements (or no requirements), poor project sponsorship, or any number of the classic mistakes, code generation will not help you.  You’ve got bigger problems.

Moreover, you shouldn’t expect miracle productivity gains by using a code generator.  Fred Brooks and Steve McConnell (in The Mythical Man Month and Rapid Development, respectively) argue persuasively that actual coding and construction of software is or should be a minority part of the schedule.  Even if coding accounts for 50% of the schedule (which is doesn’t) and you can effectively generate half of the project’s code (which you can’t), the best you can hope to achieve is a 25% reduction in effort.

In reality, boilerplate code (the kind that is best generated) has been on a long, gradual decline thanks to advances in technology and better abstractions.  We’re left more and more to focus on the differences in our software (the essence) and less with the mundane minutiae of simple coding tasks (the accidental).

This is what Fred Brooks argues in No Silver Bullet.  There is no single tool that can produce an order of magnitude gain in productivity or quality because the accidental complexity of software (the act of constructing software itself) gets continuously easier, leaving you to focus on the truly hard problem (the essence):  What does your software do, how can it do it, and how do we test it sufficiently to know that it does it?

No silver bullet, indeed.

WHAT CODE GENERATION IS

A code generator is a tool that takes metadata as its input, merges the metadata with a template engine, and produces a series of source code files for its output.  The tool can be simple or elaborate, and you can generate any kind of code that you want.  You simply need to write the control program and templates for whatever you want to generate.

Code generation done well can save you some time in the long run (you have to invest effort in creating your generator) and increase quality because you know all generated code will be identical.  Any bugs you find in the code will be corrected once in the template.

One argument against code generation is that a data-driven subroutine can produce the same result as code generation.  I agree with this argument because the generator is a data-driven program.  Runtime reflection and good abstractions can produce the same results as code generation. I would argue, though, that this code is more complicated than the code created by the generator.  The generator might be as complex as the data-driven subroutine, but the code that is produced by the generator should be simple by design.  It would be trivially easy to attach a debugger and step over the generated code to find a bug.  I like debuggability.

Active vs. Passive

Generators come in two flavors:  Active and Passive.  Both are useful, but you must plan and design your project accordingly.

An active code generator maintains the code for the life of the project. Many active generators are invoked during the build process.  XDoclet is a good example of an active code generator.  I’ve used XDoclet to generate my webapp’s struts-config.xml file, and the generator was invoked by Ant during the build.  Another popular use of XDoclet is generating the boilerplate code and configurations for Enterprise Java Beans (EJBs).

Code generated by an active generator may or may not be checked into source control.  When invoked during a build and as part of the final artifact, generated code probably would not be in source control.  On the other hand, the output from an active code generator can be checked into source control and you could remove that step from the build process.  This isn’t to say the code is then maintained by hand!  On the contrary, the generator can be invoked frequently during a project.  The purpose of the active generator is to maintain the generated code.

A passive code generator creates code that you expect to maintain by hand afterwards.  Consider a wizard that asks you some questions before creating your basic class for you.  Likewise, many IDEs have useful generation snippet such as generating all your getters/setters from your class’ instance variables.  Both of these examples are simple yet extremely useful.  I would be continually frustrated if I had to write all my getters/setters by hand.

Passive code generators needn’t stop at simple IDE-level functionality.  Maven archetypes, for example, can create an entire project setup for you.  They create all your directories and starting pom.xml.  Depending on the archetype, this could be quite complex.

Similarly, you can create entire skeletal projects with functionality from a passive code generator.  One good example would be AppFuse, which creates your project structure, layout, build scripts, and can optionally create some basic functionality like user authentication.

IT’S JUST A TOOL

Always remember that code generation is a tool in your toolbox, nothing more.  More accurately, it’s a tool and die.

Every manufacturer has highly skilled workers creating dies, molds, and machine tools to create they parts they need.  Expert furniture makers don’t hand carve each and every table leg they require.  They make a jig and create exact copies of the table leg.  Each leg may be lovingly hand-checked for quality and assembled in the final table, but each leg certainly isn’t carved individually.

In the software world, there will be times when you need expert programmers writing templates and fewer junior engineers cranking out grunt code.  The experts make the tools and dies of our software world.

YOUR RESPONSIBILITY

If code generation is just a tool, then responsibility falls to the developer to understand when and how to use it.  It becomes the developer’s responsibility to create a design that does not require hand modification of any actively generated code. The design should be robust enough with plenty of hooks to allow for modification when needed.

One possible solution is to use active generation for base classes while using subclasses throughout the code.  The subclass could contain all the application-specific code needed, override base functionality as required, and leave the developer with a domain that could be easily regenerated while preserving all hand-written code.  Another design consideration is to model your application into a framework somewhat like Spring. Spring makes extensive use of the Template Method pattern and provides plenty of documented hooks for you to override when needed.

CONCLUSION

Code generation done well can increase quality and decrease costs in a project.  Time savings are compounded, too, when you find yourself implementing similar code across projects.  Each successive new project can benefit from the templates made in the last project.

Consistency across all generated code yields an easier learning curve because developers learn one standard way for basic functionality, leaving them to focus on the custom pieces of an application. Put another way, place as much functionality into the “accidental” realm as you can so that your developers can focus on the “essence.”  Generated code is easily understood and allows for better debuggability than runtime abstractions that produce the same effect.

There are very specific design considerations to be mindful of, particularly the need for a design to be robust enough to ensure hand-modification of actively generated code is not required.

Combine good active code generation with a library of common components and you will find yourself covering a large percentage of an application’s accidental complexity, leaving you more time to focus on the essence.

Code generation is a good tool for your toolbox.  An expert developer will understand when and how to use it effectively.

Be mindful of Collection.contains(obj)

Summary

All Collection.contains(obj) methods are not the same!

This article is a real world case study of the Big O differences between various implementations of Java’s Collection interface.   I found and fixed a grievous O(n^2) algorithm by using the right data structure.

Background

I was asked to investigate why some pages in our web application would save session data very quickly while another problem page would take literally tens of minutes. The application had at its core a Stateful Session Bean that held dirty objects which would be persisted to the database in a single transaction. Sure, the easy pages didn’t contain very much data to persist and we knew the problem page contains many times more data, but certainly not that much more data to cause 20 minute request times!

After I implemented the fix, the page elapsed time dropped from 20+ minutes to ~10 seconds. What did I do? I used the right data structure.

Data Structures and the Big O

The application used a Vector to store dirty objects. A Vector was used for two reasons: 1) the original engineers thought synchronization was important and 2) order was important for referential integrity. A Vector’s internal synchronization was unneeded because only a single user’s request thread ever access the application. The ordering, however, was extremely important because you couldn’t add a person’s data without first adding the person!

The problem page in the web app had to add thousands of rows of data to the database, hence there were thousands of dirty objects waiting in the cache for persistence. As the application created or dirtied objects, it checked its cache (the Vector) before adding it. You wouldn’t want the data to be persisted twice.

How did the app check its cache? vector.contains(obj);

The problem with vector.contains(obj) and list.contains(obj) is that they are O(n), which means they scale linearly. Put another way, it gets slower the more items you put into it. The page that created thousands of objects to persist got progressively slower with each object it created.

The solution was to switching to a LinkedHashSet which perserves order for referential integrity while providing O(1) performance for set.contains(obj) because all the objects are hashed.

The real problem was even worse, of course, because the app checked the cache each time before it added a new object.  This represents a good ol’ fashioned O(n^2) algorithm.

To be fair to the original developers, they wrote the application in Java 1.3 and LinkedHashSet was implemented in 1.4. Also, I don’t think they anticipated having a single page in the application generate thousands of objects.

Sample Code

Below is a simple program to highlight the performance differences between various Collection.contains(obj) methods

Elapsed times (in ms):

Vector: 3663
List: 3690
Set: 15
LinkedSet: 12

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
package mgt.perf;
 
import java.util.*;
 
public class ContainsExample {
 
    private int collectionCount = 10000;
    private int testCount = 50000;
 
    public static void main(String[] args) {
        new ContainsExample().start();
    }
 
    private void start() {
 
        Collection vector = new Vector();
        Collection list = new ArrayList();
        Collection set = new HashSet();
        Collection linkedSet = new LinkedHashSet();
 
        populate(vector);
        populate(list);
        populate(set);
        populate(linkedSet);
 
        System.out.println("Elapsed times\n");
        System.out.println("    Vector:" + test(vector));
        System.out.println("      List:" + test(list));
        System.out.println("       Set:" + test(set));
        System.out.println(" LinkedSet:" + test(linkedSet));
    }
 
    private void populate(Collection set) {
        for (int i = 0; i < collectionCount; i++) {
            set.add(i);
        }
    }
 
    private long test(Collection collection) {
        Random rnd = new Random(System.currentTimeMillis());
        long started = System.currentTimeMillis();
        for (int i = 0; i < testCount; i++) {
            int lookFor = rnd.nextInt(collectionCount);
            if (!collection.contains(lookFor)) {
                throw new IllegalStateException(lookFor + " really should be in the collection");
            }
        }
        long elapsed = System.currentTimeMillis() - started;
        return elapsed;
    }
 
}

HOW TO: Use mini-batching to improve grid performance

We achieved a 3.5X increase in throughput by implementing “mini-batching” in our grid-enabled jobs.

We have a parent BatchService that creates child Services where each individual Service is a unit of work.  A Service implementation might perform some calculation for a single employee of a large employer group.  When the individual Services are very fast and the cost of bussing them around the network is greater than the cost of processing the Service, then adding more consumers makes the BatchService run slower!  It is slower because these fine grained units of work require more queue locks, more network traffic, and more handling calls when the child Service is returned back to the parent BatchService for accumulation.

The secret, then, is to give each consumer enough work to make the overhead of bussing negligible.  That is, give each consumer a “mini-batch” of Services to run instead of sending just one Service to a consumer.

Here’s a graph of some of our benchmarks:

throughputbybatchsize.png

Some of the data surprised us.  For example, we expected 3 big batches to run fairly slowly across 11 consumers because there would be 8 consumers sitting idle, but we were not expecting 11 batches to run more slowly than 43 batches.  We thought dividing the work equally across consumers in the exact number of batches would be the lowest point on the graph.  We were wrong.  We expected the U-shape, but we thought the trough would be at a different batch size.

Our test system can only support up to 11 consumers, so we haven’t yet tested batch sizes with more than 11, but the graph implies that we’ll have a deeper trough when we add consumers and tweak the batch size.  There should be, in theory, a point where we can’t process jobs any faster due without killing the database.  I’ve warned our DBAs that we’re looking to hit that point.

If you’re doing any kind of grid computing (by way of Terracotta’s Master-Worker project, GridGain, or rolling your own), check out the effects mini-batching can have on your throughput.  You might be surprised by your benchmarking metrics!

HOW TO: Use JDBC Batching for 7-8X throughput gains

Using the batched statement capability of your JDBC driver can give you 7-8X throughput gains. Not only is batching significantly faster, it’ll save database CPU cycles and be easier on the network, too.

The graph below shows elapsed time (in milliseconds) by batch size. For each data point, 1K rows were inserted into a simple table in MySQL. The benchmarking code I used can be found here.

jdbc_batching_gains.png

Why is batching so much faster?

First, depending on how much PreparedStatement caching your driver is doing, your database may be spending a lot of time parsing and compiling statements. After the statement is parsed and compiled, bind variables are applied. In our example, the data base will parse and compile the statement once as opposed to 1,000 times. This reduces the work your database performs and saves CPU.

Second, all bind variables are passed to the database in a single network call instead of 1,000 separate out-of-process, across-the-network calls. This helps reduce network traffic.

Third, depending on the internal architecture of your code, single statements may return the connection to a pool after every use. Multiply that by 1,000 and run a profiler and you’ll see yourself calling take/put methods a lot. Many pools also verify the connection on check-in and check-out. “select 1 from dual” is a common check for a pool to use. Your 1,000 uses of a connection may also be incurring the cost of 2,000 “select 1 from dual” statements!

When should you use batching?

Batching is particularly useful in importing scenarios where you need to get lots of data into your application quickly, but it can be used even when executing a few similar statements. Check out the example source code provided to see if batching is right for you. Fiddle with the numbers to see the gains for batching just 10 similar statements. It may not be 8X big, but trumpeting 25% gains to management is still a win for you and your team.

Use JDBC Batching!

JDBC batching can give you dramatic throughput gains while simultaneously being less abusive to your hardware. Overall, if you have the opportunity to use batch inserts and updates, you should seize that opportunity. Look at your application’s internal architecture to see if batching is right for you.

“Don’t Make Me Think” applies to your code, too

Don’t make me think. That’s how I feel about your code.

Or as Martin Fowler puts it:

“Any fool can write code that a computer can understand. Good programmers write code that humans can understand.” -Martin Fowler, Refactoring: Improving the Design of Existing Code

You’ve reached a whole new level of mastery when you write for simplicity, elegance, and maintainability. This is done on purpose, and it’s hard to get right. Deadlines, schedules, pressure, and stress all encourage us to cut corners and adopt a “Git ‘er done!” mentality. But Abandonment of planning under pressure is one of software’s classic mistakes. It’s a cardinal sin.

How do you write simple and maintainable code? I’ve got a 3-step program for you:

Step 1: Admit that simple isn’t easy

Designing simple software is hard. It has to be done on purpose. You can’t accidentally find yourself with well-written code and an elegant solution, it has to be written that way on purpose.

This admission is a bedrock principle required for designing great software and products. If you can’t admit that simple is Hard Workâ„¢, you haven’t hit rock bottom yet by having to maintain code that would make readers of The Daily WTF blush.

Step 2: Read “Don’t Make Me Think”

Steve Krug’s excellent book “Don’t Make Me Think” is about website usability, yet it changed how I look at my code.

Why? Because Steve applied the same principles in his book to his book! And if it works in those two mediums, I thought it just might work for me, too, in my medium (code).

“Don’t Make Me Think” is very easily absorbed because he’s feeding you information in a readily accessible way. He wrote it simply on purpose, and I’m certain it took many more hours to edit than it did to write. Simple is hard.

Step 3: Practice simple everyday

There are innumerable decisions you make everyday that affect your project for better or worse. You need to recognize these as the opportunities they are. Here are a few things you can do every day:

  • Code in plain English. Use an active voice (just like writing). What do you think this method does?
  • dao.findCustomerBy(order);

    Or what about this if statement?

    if(admin.hasPermission(Permissions.VIEWFILE)){
       // allow...
    }

    or better yet…

    if(admin.hasViewFilePermission()){
       // allow...
    }

    The pretty method on the Admin class looks like this:

    public boolean hasViewFilePermission(){
       return hasPermission(Permissions.VIEWFILE);
    }

  • Make Stuff Obvious. Quick, what does this line of code do?
  • Date dt = march(28, 1973);

    When I’m reading through unit tests, I’d much rather see the above statement to create a date than the equivalent Java:

    Calendar cal = Calendar.getInstance();
    cal.set(Calendar.MONTH, Calendar.MARCH);
    cal.set(Calendar.DATE, 28);
    cal.set(Calendar.YEAR, 1973);
    Date dt = cal.getTime();

    You can find those convenient date methods here: dates.java (it’s Free software). Use Java 5’s static imports to make the short date seen above.

  • Be Merciless. Be your own worst critic when reviewing your code. Always strive to improve what you’ve written. Just as great essays and novels (and books like “Don’t Make Me Think”) require several rounds of editing, so too does your code.
  • Never nest ternary statements. ’nuff said.
  • Write comments, but be brief and explain why your code does what it does, not how it does it. We already know how it does it, we’re looking at the code.
  • That’s it. Three steps to better code. Putting it into practice won’t be easy, but if you want to be a master of your craft you’ll embrace the challenge and write things simply on purpose. The people who follow you and maintain your code will appreciate it.

    HOW TO: Better JavaScript Templates

    JavaScript Templates (Jst) is a pure Javascript templating engine that runs in your browser using JSP-like syntax. If that doesn’t sound familiar, check out the live working example on this site and download the code. It’s Free Open Source Software.

    Better JavaScript Templates

    HOW TO: Bootstrap Java programs in isolated classloaders

    Bootstrapping is the process by which you load a very small and very simple pure java program with no dependencies that, in turn, loads, configures, and runs more complex programs with varying dependencies. Bootstrapping lets you run your container without polluting the system classpath. This allows you to run your deployed applications with the unpolluted system classpath as its parent. You’ve achieved classloader isolation.

    When would you want to bootstrap? Any time you want an unpolluted system classpath, which I’m finding is often convenient.

    Let’s say you want to write some kind of middleware product, a container of some sort that deploys other applications within it. You will run into classloading issues. The dependencies that your container has (say, Spring 2.0.6) may not be what your deployed application requires (maybe, Spring 1.2.6). You will find that you cannot have commons-logging in both applications (container and child). There are many ways to encounter java.lang.LinkageErrors. It’s very easy to cross the streams when running in a mutli-app environment.

    What you want to do is load your container and deployed apps in splendid isolation from each other. How do you do that? Bootstrapping!

    Here’s how you bootstrap…

    
    import java.io.File;
    import java.lang.reflect.Method;
    import java.net.URL;
    import java.net.URLClassLoader;
    import java.util.ArrayList;
    import java.util.List;
    
    public class Bootstrap {
    
        public static void main(String[] args) throws Exception {
    
            /*
                Assume your application has a "home" directory
                with /classes and /lib beneath it, where you can put
                loose files and jars.
    
                Thus,
    
                /usr/local/src/APP
                /usr/local/src/APP/classes
                /usr/local/src/APP/lib
             */
    
            String HOME = "/usr/local/src/YOURAPP";
            String CLASSES = HOME + "/classes";
            String LIB = HOME + "/lib";
    
            // add the classes dir and each jar in lib to a List of URLs.
            List urls = new ArrayList();
            urls.add(new File(CLASSES).toURL());
            for (File f : new File(LIB).listFiles()) {
                urls.add(f.toURL());
            }
    
            // feed your URLs to a URLClassLoader!
            ClassLoader classloader =
                    new URLClassLoader(
                            urls.toArray(new URL[0]),
                            ClassLoader.getSystemClassLoader().getParent());
    
            // relative to that classloader, find the main class
            // you want to bootstrap, which is the first cmd line arg
            Class mainClass = classloader.loadClass(args[0]);
            Method main = mainClass.getMethod("main",
                    new Class[]{args.getClass()});
    
            // well-behaved Java packages work relative to the
            // context classloader.  Others don't (like commons-logging)
            Thread.currentThread().setContextClassLoader(classloader);
    
            // you want to prune the first arg because its your main class.
            // you want to pass the remaining args as the "real" args to your main
            String[] nextArgs = new String[args.length - 1];
            System.arraycopy(args, 1, nextArgs, 0, nextArgs.length);
            main.invoke(null, new Object[] { nextArgs });
        }
    
    }
    
    

    You can try this code out for yourself. Cut & paste the bootstrap code above into your favorite IDE, put that single Bootstrap.class onto your classpath, and run it like so:

    
    java -cp . Bootstrap sample.HelloWorldMain Hello!
    .
    

    Click here to download the sample /usr/local/src/YOURAPP application.
    Tip for Windows users, you can make the path c:\usr\local\src\YOURAPP it’ll work.

    Tags: , ,

    Switch to our mobile site