Performance and TDD

TDD works wonders for developing code that meets functional requirements. But what about non-functional requirements? Let’s take a look at one that most users care about: performance.

Most TDD examples are necessarily small, so that the author can see the process through to completion. That leaves little room for intricacies like performance, but it’s not impossible. Let’s try it with the WordWrap kata, where we must keep lines from getting too long by inserting newline characters in appropriate places.

As usual, we start with input validation:

import static org.hamcrest.MatcherAssert.assertThat;
import static org.hamcrest.Matchers.is;

import org.junit.jupiter.api.Test;

class WhenWrappingWords {

    @Test
    void shouldHandleEmptyText() {
        assertWrap(null, "");
    }

    private void assertWrap(String text, String expected) {
        assertThat(text, Wrapper.wrap(text, 5), is(expected));
    }

}

Which is easy enough:

public class Wrapper {

    public static String wrap(String text, int length) {
         return "";
    }

}

Next, the degenerate case, where the text doesn’t require newlines:

    @Test
    void shouldNotWrapShortText() {
        assertWrap("short", "short");
    }
    public static String wrap(String text, int length) {
        if (text == null) {
             return "";
        }
        return text;
    }

Now we get to the meat: longer texts require wrapping:

    @Test
    void shouldWrapLongText() {
        assertWrap("toolong", "toolo\nng");
    }
    private static final char NL = '\n';

    public static String wrap(String text, int length) {
        if (text == null) {
            return "";
        }
        if (text.length() <= length) {
            return text;
        }
        return text.substring(0, length) + NL 
            + text.substring(length);
    }

But, if possible, we should wrap at word boundaries rather than in the middle of a word:

    @Test
    void shouldPreferToWrapAtWhitespace() {
        assertWrap("too long", "too\nlong");
    }
    public static String wrap(String text, int length) {
        if (text == null) {
            return "";
        }
        if (text.length() <= length) {
            return text;
        }
        var index = text.lastIndexOf(' ', length);
        if (index < 0) {
            return text.substring(0, length) + NL
                + text.substring(length);
        }
        return text.substring(0, index) + NL
            + text.substring(index + 1);
    }

And finally, we should wrap into multiple lines if needed:

    @Test
    void shouldWrapVeryLongTextMultipleTimes() {
        assertWrap("toolongtext", "toolo\nngtex\nt");
        assertWrap("too long text", "too\nlong\ntext");
    }
    public static String wrap(String text, int length) {
        if (text == null) {
            return "";
        }
        if (text.length() <= length) {
            return text;
        }
        var index = text.lastIndexOf(' ', length);
        if (index < 0) {
            return text.substring(0, length) + NL 
                + wrap(text.substring(length), length);
        }
        return text.substring(0, index) + NL 
            + wrap(text.substring(index + 1), length);
    }

Which we can clean up a bit:

    public static String wrap(String text, int length) {
        if (text == null) {
            return "";
        }
        if (text.length() <= length) {
            return text;
        }
        var index = text.lastIndexOf(' ', length);
        var skip = 1;
        if (index < 0) {
            index = length;
            skip = 0;
        }
        return text.substring(0, index) + NL 
            + wrap(text.substring(index + skip), length);
    }

Now let’s consider the performance of this code. Can we use it to format a book? A novel has around 100,000 words, and an English word consists of 5.1 letters on average, so with a space after each word that’s roughly 610,000 characters. Let’s say we want to wrap lines at 80 characters:

    private static final int NUM_WORDS_IN_BOOK = 100_000;
    private static final float AVG_NUM_CHARS_PER_WORD = 5.1f;
    private static final int MAX_BOOK_LINE_LENGTH = 80;
    private static final int NUM_TRIES = 10;
    private static final float MAX_WRAPPING_MS = 1000;

    private final Random random = new SecureRandom();

    @Test
    void shouldWrapBook() {
        var time = 0L;
        for (var i = 0; i < NUM_TRIES; i++) {
            var text = randomStringOfBookLength();
            var start = System.currentTimeMillis();
            Wrapper.wrap(text, MAX_BOOK_LINE_LENGTH);
            var stop = System.currentTimeMillis();
            time += stop - start;
        }
        assertThat(1.0f * time / NUM_TRIES, 
            lessThanOrEqualTo(MAX_WRAPPING_MS));
    }

    private String randomStringOfBookLength() {
        // The 1 + accounts for the space after each word.
        var numCharsInBook = (int) (NUM_WORDS_IN_BOOK * (1 + AVG_NUM_CHARS_PER_WORD));
        var result = new StringBuilder(numCharsInBook);
        for (var i = 0; i < numCharsInBook; i++) {
            result.append(randomChar());
        }
        return result.toString();
    }

    private char randomChar() {
        if (random.nextFloat() < 1.0 / (1 + AVG_NUM_CHARS_PER_WORD)) {
            return ' ';
        }
        return (char) (random.nextInt(26) + 'a');
    }

Normally, you’d use the Java Microbenchmark Harness to investigate the performance of an algorithm like this. I don’t want to introduce new tech for this already long post, however, so this test will have to do. Note that we have to run multiple tries, since we’re using randomness.

Running this test gives a stack overflow: with some 610,000 characters and at most 80 per line, the recursion goes thousands of calls deep. Clearly we need to do something about that.

In this case, it would be easy to replace the recursion with a while loop, so we could just go do that and see the test pass. In the real world, however, things usually aren’t that simple.

This is where the Strategy pattern can come in handy. With multiple implementations of the strategy interface, we can run our tests against all of them. We can develop alternative implementations from scratch, using TDD, or copy some code into a new implementation and start modifying it. Once we’re satisfied with the results, we can keep the best implementation and remove the others.
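For instance (a sketch; the interface name and the test wiring are mine, not part of the kata):

    public interface WrapStrategy {
        String wrap(String text, int length);
    }

Each candidate implementation gets its own class implementing this interface, and the test class runs every test against all of them, for example via JUnit’s @ParameterizedTest with a @MethodSource that lists the implementations.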

But hang on, we used TDD to get to this implementation, so how is doing that again going to give us a different result?

Well, when we did it the first time, we weren’t focused on performance. We shouldn’t have been, since premature optimization is the root of all evil. Now that we have proof that our performance isn’t good enough, things are different. Let’s see how that plays out.

The implementation of the first two tests can remain the same:

    public static String wrap(String text, int length) {
        if (text == null) {
            return "";
        }
        return text;
    }

To make shouldWrapLongText() pass, we need to pay more attention to performance this time. We don’t want to use substring() and add two Strings together, since that involves copying characters. So let’s use a StringBuilder instead:

    public static String wrap(String text, int length) {
        if (text == null) {
            return "";
        }
        var result = new StringBuilder(text);
        if (result.length() > length) {
            result.insert(length, NL);
        }
        return result.toString();
    }

This still means we have to copy some arrays around to make room for the newline. We can avoid that by allocating enough capacity from the start:

    public static String wrap(String text, int length) {
        if (text == null) {
            return "";
        }
        var capacity = text.length() + text.length() / length;
        var result = new StringBuilder(capacity);
        result.append(text);
        if (result.length() > length) {
            result.insert(length, NL);
        }
        return result.toString();
    }

This would normally be looking ahead a bit too much for my taste, but since we already implemented the algorithm once, we know for sure we’re going to need this, so I’m cool with it.

Next let’s make shouldPreferToWrapAtWhitespace() pass:

    public static String wrap(String text, int length) {
        if (text == null) {
            return "";
        }
        var result = new StringBuilder(text.length() + text.length() / length);
        result.append(text);
        if (result.length() > length) {
            var spaceIndex = text.lastIndexOf(' ', length);
            if (spaceIndex < 0) {
                result.insert(length, NL);
            } else {
                result.setCharAt(spaceIndex, NL);
            }
        }
        return result.toString();
    }

Finally, we can generalize the if to a while to make the last test pass:

    public static String wrap(String text, int length) {
        if (text == null) {
            return "";
        }
        var capacity = text.length() + text.length() / length;
        var result = new StringBuilder(capacity);
        result.append(text);
        var columnEnd = length;
        while (columnEnd < result.length()) {
            var spaceIndex = result.lastIndexOf(" ", columnEnd);
            if (spaceIndex < columnEnd - length) {
                // No space within the current line, so break mid-word.
                result.insert(columnEnd, NL);
                columnEnd += length + 1;
            } else {
                // Replace the last space on the line with a newline.
                result.setCharAt(spaceIndex, NL);
                columnEnd = spaceIndex + 1 + length;
            }
        }
        return result.toString();
    }

This passes all our tests, including the one about performance.

The above do-over may seem like a wasteful approach: why wouldn’t we do it “right” from the start? Like I said earlier, we didn’t because we didn’t know that our implementation wasn’t going to perform well. But what if we did know from the start that performance was important?

We could’ve written our tests in a different order, tackling the test for performance earlier in the process. That would’ve prevented us from getting to green with recursion in this example, saving us a bit of time. In a real-world scenario, it might have saved a lot more time. Yet again, we see that the order of tests is important.

I would argue, however, that not much time was lost with the initial approach. I still believe that the proper order is make it pass, make it right, make it fast. One of the reasons TDD works so well is the explicit distinction between making the test green and then refactoring. Doing one thing at a time is solid advice when it comes to addressing performance as well.

I’ll accept a little bit of rework, knowing that I’ll win back that time and more in all the cases where the “right” solution is also fast enough and I don’t waste time on premature optimization.

Performance tuning an Ant build

Common advice in the Agile world is to maintain an automated build that runs in under 10 minutes. I doubt anybody would disagree that a faster build is better than a slower one. But how do we keep the build slick?

Usually the bulk of the build time is spent executing tests, so that is the first place to look. But as with any optimization effort, you shouldn’t guess where the pain is; you should measure. So how do we do that for an Ant build?

This turns out not to be hard at all. Ant will happily inform your listener of any interesting event, such as a task starting or finishing. Using that information, it is pretty straightforward to write a listener that records the time tasks and targets take.
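A bare-bones version of such a listener might look like this (a sketch, not the actual Ant Utilities code):

    import java.util.HashMap;
    import java.util.Map;

    import org.apache.tools.ant.BuildEvent;
    import org.apache.tools.ant.BuildListener;

    public class TimingListener implements BuildListener {

        private final Map<Object, Long> startTimes = new HashMap<Object, Long>();

        public void taskStarted(BuildEvent event) {
            startTimes.put(event.getTask(), System.currentTimeMillis());
        }

        public void taskFinished(BuildEvent event) {
            Long start = startTimes.remove(event.getTask());
            if (start != null) {
                long elapsed = System.currentTimeMillis() - start;
                System.out.println(elapsed + " ms: " + event.getTask().getTaskName());
            }
        }

        // The other BuildListener events aren't needed for task timings.
        public void buildStarted(BuildEvent event) {}
        public void buildFinished(BuildEvent event) {}
        public void targetStarted(BuildEvent event) {}
        public void targetFinished(BuildEvent event) {}
        public void messageLogged(BuildEvent event) {}
    }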

But you don’t even have to write one yourself: the open source Ant Utilities project provides it. Simply place the jar in Ant’s lib directory and run Ant as follows:

ant -listener net.java.antutility.BuildMetricsListener [target]

At the end of the Ant build, a profile report will be displayed:

...
BUILD SUCCESSFUL
Total time: 4 minutes 6 seconds
BUILD METRICS:
Local Time, Child Time, Invocation Count, Type, Name, Location
88453, 0, 1, TASK, macker, build-core.xml:448:
70955, 0, 8, TASK, javac, dist\build.xml:608:
36563, 0, 6, TASK, jar, dist\build.xml:628:
31047, 0, 1, TASK, checkstyle, build-core.xml:251:
4031, 0, 1, TASK, exec, dist\build.xml:947:
1922, 0, 1, TASK, taskdef, build-core.xml:345:
1797, 0, 1, TASK, signjar, build-core.xml:233:
1688, 0, 1, TASK, uptodate, build-core.xml:425:
1018, 1557, 21, TASK, for, build.xml:19:
688, 0, 1, TASK, taskdef, build-core.xml:404:
609, 0, 1, TASK, property, build-core.xml:171:
533, 0, 21, TASK, taskdef, dist\build.xml:15:

The first column shows the time in milliseconds spent in the element (task or target) itself, the second the time spent in child elements, the third the number of invocations, and the last three columns the element’s type, its name, and its location (build file name and line number).

But it gets better still. You can also run your Ant build from Eclipse and get a visual indication of the pain points in your editor:
[Figure: visual indication of slow Ant elements in Eclipse]

Performance tuning a GWT application

With Google Web Toolkit (GWT), you write your AJAX front-end in the Java programming language which GWT then cross-compiles into optimized JavaScript that automatically works across all major browsers.

…claims Google. And I must say, I’m pretty impressed by the ease of development GWT offers. I’ve used it at work on a project that I probably couldn’t have done without it, given my poor JavaScript skills. Especially the fact that you don’t have to worry about whether your code works in all browsers is great.

There are a couple of caveats, however:

  • The set of supported Java classes is limited, which sometimes causes confusion. For instance, there is a Character class, but the isWhitespace() method is not supported. And neither is List.subList().
  • Serialization works differently from the Java standard.
  • You can’t work with regular W3C DOM objects on the client. Instead, GWT provides its own DOM hierarchy.
  • Even though Google claims that GWT “allows developers to quickly build and maintain complex yet highly performant JavaScript front-end applications in the Java programming language”, performance can be a problem.

The purpose of this post is to elaborate on that last point.

Java isn’t JavaScript

Most developers evolve an intuition about what kind of code can cause performance problems and what kind can’t. The problem with GWT development, however, is that even though you write Java code, the browser executes JavaScript. So any intuition about Java performance is misleading.

Therefore, your usual rules of thumb won’t work in GWT development. Here are a couple that may.

Serialization is slow

Our application created a domain model on the server, serialized that to the client, and had the client render it. The problem was that the model could become quite large, consisting of thousands of objects. This is not something that GWT currently handles well. Serialization performance appears to be proportional to the number of objects, not just to their total size. Therefore, we translated the model to XML, and serialized that as a single string. This was way faster.
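In outline, the change looked like this (a sketch with hypothetical service and model names, not our actual interfaces; RemoteService is GWT’s RPC marker interface):

    import com.google.gwt.user.client.rpc.RemoteService;

    // Before: the whole object graph goes through GWT serialization,
    // whose cost grows with the number of objects.
    public interface ModelService extends RemoteService {
        DomainModel getModel(); // DomainModel stands in for our object graph
    }

    // After: the model travels as a single string of XML, so GWT
    // serialization only sees one object.
    public interface ModelXmlService extends RemoteService {
        String getModelAsXml();
    }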

However, this meant that we needed to parse the XML on the client side to reconstruct our model. GWT provides an XMLParser class to handle just that. This class is very efficient in parsing XML, but it turned out that traversing the resulting DOM document was still too slow.

So I wrote a dedicated XML parser, one that can parse only the subset of XML documents that represent our domain model. This parser builds the domain model directly, without an intermediate representation. It proved faster than the generic approach, but only after we got very careful with string handling.
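To give a flavor of the approach (a simplified sketch around a made-up <name> element, not our actual parser): because the parser only has to understand one document shape, it can scan the string directly with indexOf() and build objects as it goes, skipping the DOM entirely.

    // Sketch: pull the text of every <name> element out of a known,
    // well-formed document shape, without building a DOM first.
    public List<String> parseNames(String xml) {
        List<String> names = new ArrayList<String>();
        int index = xml.indexOf("<name>");
        while (index >= 0) {
            int start = index + "<name>".length();
            int end = xml.indexOf("</name>", start);
            names.add(xml.substring(start, end));
            index = xml.indexOf("<name>", end);
        }
        return names;
    }

Even a parser like this needs to be frugal with substring() and friends, as the next section explains.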

Some string handling is slow, particularly in Internet Explorer

Java developers know that string handling can be a performance bottleneck. This is also true for GWT development, but not in quite the same way. In Java code, using StringBuffer or StringBuilder is usually sufficient to improve string handling performance. Not so in JavaScript: StringBuffer.append() can be very slow on Internet Explorer, for example.

GWT 1.6 will contain its own version of StringBuffer that will alleviate some of these problems. Since 1.6 isn’t released yet, we just copied this class into our project.

But even with the new StringBuffer class, you still need to be careful when dealing with strings. Some of the StringBuffer methods are implemented by calling toString() and doing something on the result. That can be a real performance killer. So anything you can do to stay away from substring(), charAt(), etc. will help you. This can mean that it’s sometimes better to work with plain Strings instead of StringBuffers!
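As a contrived illustration (a sketch, not code from our application): scanning per character with charAt() can often be replaced by letting indexOf() do the scanning for you.

    // Per-character access: every iteration goes through the string
    // machinery of the compiled JavaScript, which old browsers run slowly.
    public int countSpacesSlow(String text) {
        int count = 0;
        for (int i = 0; i < text.length(); i++) {
            if (text.charAt(i) == ' ') {
                count++;
            }
        }
        return count;
    }

    // Letting indexOf() scan tends to be faster.
    public int countSpacesFast(String text) {
        int count = 0;
        int index = text.indexOf(' ');
        while (index >= 0) {
            count++;
            index = text.indexOf(' ', index + 1);
        }
        return count;
    }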

Tables

For displaying data in a table-like format, you can use the Grid widget. This is not terribly fast, however, so you may want to consider the bulk table renderers.

The downside of these is that they translate any widgets you insert into the table to HTML, so you lose all the functionality attached to them, like click handling. Instead, you can add a TableListener, which has an onCellClicked() method.
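In outline (a sketch against the GWT 1.x API; the row and column counts are arbitrary):

    Grid grid = new Grid(10, 3);
    grid.addTableListener(new TableListener() {
        public void onCellClicked(SourcesTableEvents sender, int row, int cell) {
            // React based on the clicked coordinates instead of on a
            // per-widget click listener.
            Window.alert("Clicked row " + row + ", cell " + cell);
        }
    });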

Varargs are slow

Variable argument lists are sometimes quite handy. However, we’ve found that they come at a severe performance penalty, because a JavaScript array needs to be created to wrap the arguments. So if at all possible, use fixed argument lists, even though this may be less convenient from a code writing/maintaining perspective.
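For example (a sketch): fixed-arity overloads for the common cases avoid the array allocation, and callers get them automatically, because overload resolution prefers fixed arity over varargs.

    // Every call allocates an array to wrap the arguments.
    public int maxOf(int... values) {
        int result = Integer.MIN_VALUE;
        for (int value : values) {
            result = Math.max(result, value);
        }
        return result;
    }

    // Fixed-arity overloads for the common cases avoid the allocation.
    public int maxOf(int first, int second) {
        return Math.max(first, second);
    }

    public int maxOf(int first, int second, int third) {
        return Math.max(maxOf(first, second), third);
    }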

Don’t trust rules of thumb, use a profiler

The problem with rules of thumb is that they are just that: principles with broad application that are not intended to be strictly accurate or reliable for every situation. You shouldn’t put all your trust in them, but measure where the pain really is. This means using a profiler.

You could, of course, run your favorite Java profiler against hosted mode to get a sense of the performance of your code. But that would be beside the point: your Java code is compiled to JavaScript, and it is the JavaScript that the browser executes. So you should use a JavaScript profiler.

We used Firebug to profile the compiled JavaScript code, and this helped us enormously. As usual with profiling, we found performance bottlenecks that we didn’t anticipate. As a result, we were able to make our application load over 60 times faster (on Internet Explorer 7)!

Performance in different browsers

The only problem with Firebug is that it’s a Firefox plugin, and therefore not available for Internet Explorer. [Firebug Lite doesn’t contain a profiler.] Not that I personally would want to use IE, but our customers do, unfortunately.

The irony is that you need a JavaScript profiler most in Internet Explorer: Firefox is unbelievably much faster than Internet Explorer at processing JavaScript, especially string handling. For instance, for one data set, Firefox 3 loaded the page in 51 seconds, while Internet Explorer 7 took 12 minutes and 4 seconds!

Internet Explorer 8 will supposedly be better:

We have made huge improvements to widely-used JScript functionality including faster string, array, and lookup operations.

We’ll have to see how that works out when IE8 is released…

BTW, if you’re interested in browser performance, check out this comparison.

Breaking Encapsulation

Last week, we tested the upgrade procedure for the new version of our product. We got a backup from one of our clients that was over 60 GB, so we could put it to good use by testing the performance of the upgrade against it. This sort of testing is crucial for making sure the upgrade won’t disrupt production too much.

One of the steps in the upgrade was the deletion of stale data. It dealt with two entities in a one-to-many relationship. For this discussion, let’s call these entities A and B. For each A, there can be multiple Bs, whereas each B is associated with exactly one A. The upgrade used our product’s API to select the A objects matching certain criteria and then delete them. The API implementation makes sure that when an A object is deleted, its B objects are deleted as well.

This is standard encapsulation practice, nothing fancy. But there was one problem with it: the deletion process was way too slow. We aborted it after more than three and a half hours, which is clearly unacceptable.

So we turned to the code, and found two loops: one iterating over the A objects, and within the delete() of A, one iterating over the B objects. Since there can be many, many B objects to search through, this inner loop really hurts when executed repeatedly. We say that this algorithm is O(n×m), where n is the number of A objects and m the number of B objects. By first deleting all B objects related to A objects that match the criteria, and only then deleting the A objects, we could potentially change the algorithm to O(n+m), which of course is much faster.

That didn’t work out, though, since the delete() method in class A still contained the loop over B objects, even though we now knew for sure that none of the B objects would match (we had already deleted them). So we broke encapsulation by extracting a doDelete() method that deletes just the A object, nothing more.

We had a similar problem with B’s delete() method. This code sends a notification to its A object and performs other housekeeping. In our situation, this is clearly unnecessary, since that A object is about to be deleted as well. So we again broke encapsulation and extracted a doDelete() method for class B as well.
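Put together, the reworked structure looked roughly like this (a sketch with hypothetical names and lookups; the real code used our product’s API):

    class A {
        private List<B> bs; // however the association is stored

        public void delete() {
            for (B b : bs) {
                b.delete(); // the expensive inner loop
            }
            doDelete();
        }

        protected void doDelete() {
            // Remove just this A, nothing more.
        }
    }

    class B {
        private A a;

        A getA() {
            return a;
        }

        public void delete() {
            // Notify the A object and do other housekeeping (elided),
            // then actually delete.
            doDelete();
        }

        protected void doDelete() {
            // Remove just this B, nothing more.
        }
    }

    // The upgrade: one pass over the Bs, then one pass over the As.
    // With a hash-based set, contains() is constant time, making the
    // whole deletion O(n + m).
    void deleteStaleData() {
        Set<A> matching = findMatchingAs(); // hypothetical lookup
        for (B b : allBs()) {               // hypothetical lookup
            if (matching.contains(b.getA())) {
                b.doDelete();
            }
        }
        for (A a : matching) {
            a.doDelete();
        }
    }

The upgrade’s direct calls to doDelete() are exactly where encapsulation breaks, which is what the damage control below is about.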

Now we had the performance we required: the deletion process was down to two minutes. But we lost encapsulation. Being well versed in object-oriented techniques, we knew that would open the door to all sorts of trouble. But we also knew that this change was absolutely necessary to get the required performance.

So we went into damage control mode. We made the doDelete() methods protected, and moved the upgrade code to the same package as the API implementation code, to still be able to call the doDelete()s. Still not optimal, but sometimes a man’s got to do what a man’s got to do…

Importing large data sets

For performance testing, it is often necessary to import a large data set to test against. However, importing large data sets presents its own challenges. Below I want to give some tips on how to deal with those.

  1. Begin by making backups. Not just of your current data, but also of the large data set you want to import. You may need to transform the data before importing it, and then it is useful to be able to go back to the original.
  2. Start with a representative subset of the large data set. This allows you to test the import process without having to wait hours for feedback. Only when you’re convinced that everything works as expected should you import the whole data set.
  3. Test the limited data set end-to-end. For instance, the product I’m currently working on consists of a Content Management System (CMS, where people author content) and a Delivery System (DS, where people use the content). Data is imported into the CMS, edited, and finally published to the DS. In this situation, it is not enough to have a successful import into CMS. The publication to DS must also succeed.
  4. Automate the import. When things go wrong, you need to perform the import multiple times. It saves time to be able to run the import with a single command. Even if the import succeeds on the first try (one can dream), you might want to redo the import later, e.g. for performance testing against a new release, or when a new, even larger, data set becomes available.
  5. If you need to transform the data to make the import work, make sure to put the transformation scripts under version control, like your regular code (you do use a version control system, don’t you?). The build scripts that automate the import should also be under version control.
  6. If you cannot get your hands on real-world data, you may still be able to do performance testing using generated data. The downside of this approach is that the generated data will probably not contain the exotic border cases that are usually present in real-life data.