Performance tuning an Ant build

Common advice in the Agile world is to maintain an automated build that runs in under 10 minutes. I doubt anybody would disagree that a faster build is better than a slower one. But how do we keep the build slick?

Usually the bulk of the build time is spent executing tests, so that is the first place to look. But as with any optimization effort, you shouldn’t guess where the pain is; you should measure. So how do we do that for an Ant build?

This turns out not to be hard at all. Ant will happily inform your listener of any interesting event, such as a task starting or finishing. Using that information, it is pretty straightforward to write a listener that records the time the tasks and targets take.
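The bookkeeping such a listener needs is small: remember when each task started, and accumulate the elapsed time when it finishes. Here is a minimal sketch of just that bookkeeping; the method names mirror, but are not, Ant’s actual BuildListener API, and the class is invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the bookkeeping a timing BuildListener would do: record a
// start time when a task starts and accumulate the elapsed time when it
// finishes. Method names are illustrative, not the real Ant API.
public class TaskTimer {
    private final Map<String, Long> started = new HashMap<String, Long>();
    private final Map<String, Long> totals = new HashMap<String, Long>();

    public void taskStarted(String name, long nowMillis) {
        started.put(name, nowMillis);
    }

    public void taskFinished(String name, long nowMillis) {
        Long start = started.remove(name);
        if (start == null) {
            return; // finished without a matching start; ignore
        }
        long previous = totals.containsKey(name) ? totals.get(name) : 0L;
        totals.put(name, previous + (nowMillis - start));
    }

    public long totalMillis(String name) {
        return totals.containsKey(name) ? totals.get(name) : 0L;
    }
}
```

A real listener would be driven by Ant’s build events instead of explicit timestamps, but the accumulation logic is the same.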

But you don’t even have to do that: you can use the open source Ant Utilities project. Simply place its jar in Ant’s lib directory and run Ant as follows:

ant -listener net.java.antutility.BuildMetricsListener [target]

At the end of the Ant build, a profile report will be displayed:

...
BUILD SUCCESSFUL
Total time: 4 minutes 6 seconds
BUILD METRICS:
Local Time, Child Time, Invocation Count, Type, Name, Location
88453, 0, 1, TASK, macker, build-core.xml:448:
70955, 0, 8, TASK, javac, dist\build.xml:608:
36563, 0, 6, TASK, jar, dist\build.xml:628:
31047, 0, 1, TASK, checkstyle, build-core.xml:251:
4031, 0, 1, TASK, exec, dist\build.xml:947:
1922, 0, 1, TASK, taskdef, build-core.xml:345:
1797, 0, 1, TASK, signjar, build-core.xml:233:
1688, 0, 1, TASK, uptodate, build-core.xml:425:
1018, 1557, 21, TASK, for, build.xml:19:
688, 0, 1, TASK, taskdef, build-core.xml:404:
609, 0, 1, TASK, property, build-core.xml:171:
533, 0, 21, TASK, taskdef, dist\build.xml:15:

The first column shows the time (in milliseconds) spent in the element (task or target) itself, the second the time spent in child elements, the third the number of invocations, and the last three the element type, its name, and its location (build file name and line number).

But it gets better still. You can also run your Ant build from Eclipse and get a visual indication of the pain points in your editor:
[Screenshot: visual indication of slow Ant elements in Eclipse]

Performance tuning a GWT application

With Google Web Toolkit (GWT), you write your AJAX front-end in the Java programming language which GWT then cross-compiles into optimized JavaScript that automatically works across all major browsers.

…claims Google. And I must say, I’m pretty impressed by the ease of development GWT offers. I’ve used it at work on a project that I probably couldn’t have completed without it, given my poor JavaScript skills. Especially great is that you don’t have to worry about whether your code works in all browsers.

There are a couple of caveats, however:

  • The set of supported Java classes is limited, which sometimes causes confusion. For instance, there is a Character class, but the isWhitespace() method is not supported. And neither is List.subList().
  • Serialization works differently from the Java standard.
  • You can’t work with regular W3C DOM objects on the client. Instead, GWT provides its own DOM hierarchy.
  • Even though Google claims that GWT “allows developers to quickly build and maintain complex yet highly performant JavaScript front-end applications in the Java programming language”, performance can be a problem.

The purpose of this post is to elaborate on that last point.

Java isn’t JavaScript

Most developers evolve an intuition about what kind of code can be a performance problem and what kind can’t. The problem with GWT development, however, is that even though you write Java code, the browser executes JavaScript. So any intuition about Java performance is misleading.

Therefore, your usual rules of thumb won’t work in GWT development. Here are a couple that may.

Serialization is slow

Our application created a domain model on the server, serialized that to the client, and had the client render it. The problem was that the model could become quite large, consisting of thousands of objects. This is not something that GWT currently handles well. Serialization performance appears to be proportional to the number of objects, not just to their total size. Therefore, we translated the model to XML, and serialized that as a single string. This was way faster.
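The translation step can be as simple as walking the model once and emitting XML into a single builder, so that GWT RPC serializes one String instead of thousands of objects. A minimal sketch, with a made-up Item class and element names (not our actual model):

```java
import java.util.List;

// Sketch of flattening a model into one XML string, so GWT RPC
// serializes a single String instead of an object graph. The Item
// class and the element names are invented for illustration.
public class ModelToXml {
    public static class Item {
        final String name;
        Item(String name) { this.name = name; }
    }

    public static String toXml(List<Item> model) {
        StringBuilder xml = new StringBuilder("<model>");
        for (Item item : model) {
            xml.append("<item name=\"").append(item.name).append("\"/>");
        }
        return xml.append("</model>").toString();
    }
}
```

A real model would need escaping of attribute values; the point here is only that the wire format becomes one string.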

However, this meant that we needed to parse the XML on the client side to reconstruct our model. GWT provides an XMLParser class to handle just that. This class is very efficient in parsing XML, but it turned out that traversing the resulting DOM document was still too slow.

So I wrote a dedicated XML parser, one that can only parse the subset of XML documents that represent our domain model. This parser builds the domain model directly, without intermediate representation. This proved to be faster than the generic approach, but only after being very careful with handling strings.
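Such a dedicated parser doesn’t need to handle arbitrary XML; it only scans for the few patterns its one known format can produce, and builds the result directly. A minimal sketch for an invented format of the shape <model><item name="a"/>…</model>:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a dedicated parser for one fixed XML shape, e.g.
// <model><item name="a"/><item name="b"/></model>. It scans the string
// with indexOf and builds the result list directly, with no DOM in
// between. The format is invented for illustration.
public class DedicatedParser {
    public static List<String> parseNames(String xml) {
        List<String> names = new ArrayList<String>();
        int pos = 0;
        while (true) {
            int start = xml.indexOf("name=\"", pos);
            if (start < 0) {
                break; // no more items
            }
            start += "name=\"".length();
            int end = xml.indexOf('"', start);
            names.add(xml.substring(start, end));
            pos = end + 1;
        }
        return names;
    }
}
```

A real parser for our model was of course larger, but the structure is the same: one forward scan, no intermediate tree.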

Some string handling is slow, particularly in Internet Explorer

Java developers know that string handling can be a performance bottleneck. This is also true for GWT development, but not in quite the same way. For instance, using StringBuffer or StringBuilder is usually sufficient for improving String handling performance in Java code. But not so in JavaScript. StringBuffer.append() can be very slow on Internet Explorer, for instance.

GWT 1.6 will contain its own version of StringBuffer that will alleviate some of these problems. Since 1.6 isn’t released yet, we just copied this class into our project.

But even with the new StringBuffer class, you still need to be careful when dealing with strings. Some of the StringBuffer methods are implemented by calling toString() and doing something on the result. That can be a real performance killer. So anything you can do to stay away from substring(), charAt(), etc. will help you. This can mean that it’s sometimes better to work with plain Strings instead of StringBuffers!
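To illustrate that last point, here is a hedged sketch with hypothetical methods. In GWT-compiled code, plain + concatenation becomes native JavaScript string concatenation, so for a handful of fixed pieces it can beat the buffer’s extra bookkeeping:

```java
// Sketch of "plain Strings instead of StringBuffers": for a few fixed
// pieces, plain + concatenation can be the faster shape in compiled
// GWT code, since + maps to native JavaScript string concatenation.
// Method names are made up for illustration.
public class PlainStrings {
    // StringBuffer version: an extra object and extra method calls.
    public static String greetBuffered(String name) {
        StringBuffer buffer = new StringBuffer();
        buffer.append("Hello, ").append(name).append("!");
        return buffer.toString();
    }

    // Plain String version: a simple concatenation expression.
    public static String greetPlain(String name) {
        return "Hello, " + name + "!";
    }
}
```

As always, which shape wins depends on the browser; measure before committing to either.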

Tables

For displaying data in a table-like format, you can use the Grid widget. This is not terribly fast, however, so you may want to consider the bulk table renderers.

The downside of these is that they translate any widgets you insert into the table to HTML, and you lose all the functionality attached to them, like click handling. Instead, you can add a TableListener, which has an onCellClicked() method.

Varargs are slow

Variable argument lists are sometimes quite handy. However, we’ve found that they come at a severe performance penalty, because a JavaScript array needs to be created to wrap the arguments. So if at all possible, use fixed argument lists, even though this may be less convenient from a code writing/maintaining perspective.
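The workaround is mechanical: replace the varargs method with fixed-arity overloads for the arities you actually use. A sketch with made-up names:

```java
// Sketch of trading a varargs method for fixed-arity overloads. In
// compiled GWT code, each varargs call allocates a JavaScript array
// to wrap the arguments; the overloads do not. Names are illustrative.
public class Args {
    // Convenient, but every call wraps its arguments in an array.
    public static int sumVarargs(int... values) {
        int total = 0;
        for (int v : values) {
            total += v;
        }
        return total;
    }

    // Fixed-arity overloads avoid the per-call array.
    public static int sum(int a, int b) {
        return a + b;
    }

    public static int sum(int a, int b, int c) {
        return a + b + c;
    }
}
```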

Don’t trust rules of thumb, use a profiler

The problem with rules of thumb is that they are just that: principles with broad application that are not intended to be strictly accurate or reliable for every situation. You shouldn’t put all your trust in them, but measure where the pain really is. This means using a profiler.

You could, of course, run your favorite Java profiler against hosted mode to get a sense of your code’s performance. But that would be beside the point: your Java code is compiled to JavaScript, and it is the JavaScript that the browser executes. So you should use a JavaScript profiler.

We used Firebug to profile the compiled JavaScript code, and this helped us enormously. As usual with profiling, we found performance bottlenecks we didn’t anticipate. As a result, we were able to make our application load over 60 times faster on Internet Explorer 7!

Performance in different browsers

The only problem with Firebug is that it’s a Firefox plugin, and therefore not available for Internet Explorer. [Firebug Lite doesn’t contain a profiler.] Not that I personally would want to use IE, but our customers do, unfortunately.

The irony is that you need a JavaScript profiler in Internet Explorer the most: Firefox is vastly faster than Internet Explorer at processing JavaScript, especially string handling. For one data set, for instance, Firefox 3 loaded the page in 51 seconds, while Internet Explorer 7 took 12 minutes and 4 seconds!

Internet Explorer 8 will supposedly be better:

We have made huge improvements to widely-used JScript functionality including faster string, array, and lookup operations.

We’ll have to see how that works out when IE8 is released…

BTW, if you’re interested in browser performance, check out this comparison.

Breaking Encapsulation

Last week, we tested the upgrade procedure for the new version of our product. We got a backup from one of our clients that was over 60 GB, so we could put it to good use by testing the performance of the upgrade against it. This sort of testing is crucial for making sure the upgrade won’t disrupt production too much.

One of the steps in the upgrade was the deletion of stale data. It dealt with two entities in a one-to-many relationship; for this discussion, let’s call them A and B. For each A, there can be multiple Bs, whereas each B is associated with exactly one A. The upgrade used our product’s API to select the A objects matching the required criteria and then delete them. The API implementation makes sure that when an A object is deleted, its B objects are deleted as well.

This is standard encapsulation practice, nothing fancy. But there was one problem with it: the deletion process was way too slow. We aborted it after more than three and a half hours, which is clearly unacceptable.

So we turned to the code, and found two loops: one iterating over the A objects, and within the delete() of A, one iterating over the B objects. Since there can be many, many B objects to search through, this inner loop really hurts when executed repeatedly. We say that this algorithm is O(n×m), where n is the number of A objects and m the number of B objects. By first deleting all B objects related to A objects that match the criteria, and only then deleting the A objects, we could potentially change the algorithm to O(n+m), which of course is much faster.
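In pseudo-model terms, the two-pass rewrite looks like this; the A and B entities are represented by plain strings and string pairs, purely for illustration (our real code worked against the product’s API, not collections):

```java
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Sketch of the O(n*m) -> O(n+m) rewrite: instead of deleting each
// matching A and letting its delete() scan all Bs for children, make
// one pass over the Bs and one pass over the As. Entities are
// stand-ins: each B is a {id, parentAId} pair.
public class TwoPassDelete {
    public static void deleteMatching(List<String> as, List<String[]> bs,
                                      Set<String> matching) {
        // Pass 1: delete every B whose parent A is about to go.
        for (Iterator<String[]> it = bs.iterator(); it.hasNext();) {
            String[] b = it.next(); // b[0] is the B's id, b[1] its parent A
            if (matching.contains(b[1])) {
                it.remove();
            }
        }
        // Pass 2: delete the A objects themselves, without re-scanning Bs.
        as.removeAll(matching);
    }
}
```

With `matching` held in a hash set, both passes are linear, which is the whole point of the rewrite.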

That didn’t work out, though, since the delete() method in class A still contained the loop over B objects, even though we now knew for sure that none of the B objects would match (since we deleted them previously). So we broke encapsulation by extracting a doDelete() method that just deletes the A object, nothing more.

We had a similar problem with B’s delete() method. This code sends a notification to its A object and performs other housekeeping. In our situation, this is clearly unnecessary, since that A object is about to be deleted as well. So we again broke encapsulation and extracted a doDelete() method for class B as well.

Now we had the performance we required: the deletion process was down to two minutes. But we lost encapsulation. Being well-experienced in object oriented techniques, we knew that would open the door to all sorts of trouble. But we also knew that this change was absolutely necessary to get the required performance.

So we went into damage control mode. We made the doDelete() methods protected, and moved the upgrade code to the same package as the API implementation code, to still be able to call the doDelete()s. Still not optimal, but sometimes a man’s got to do what a man’s got to do…

Importing large data sets

For performance testing, it is often necessary to import a large data set to test against. However, importing large data sets presents its own challenges. Below I want to give some tips on how to deal with those.

  1. Begin with making backups. Not just of your current data, but also of the large data set you want to import. You might just want to transform the data to import, and then it is useful to be able to go back to the original.
  2. Start with a representative subset of the large data set. This will allow you to test the import process without having to wait hours for feedback. Only when you’re convinced that everything works as expected, do you import the whole large data set.
  3. Test the limited data set end-to-end. For instance, the product I’m currently working on consists of a Content Management System (CMS, where people author content) and a Delivery System (DS, where people use the content). Data is imported into the CMS, edited, and finally published to the DS. In this situation, it is not enough to have a successful import into CMS. The publication to DS must also succeed.
  4. Automate the import. When things go wrong, you need to perform the import multiple times. It saves time to be able to run the import with a single command. Even if the import succeeds on the first try (one can dream), you might want to redo the import later, e.g. for performance testing against a new release, or when a new, even larger, data set becomes available.
  5. If you need to transform the data to make the import work, make sure to put the transformation scripts under version control, like your regular code (you do use a version control system, don’t you?). The build scripts that automate the import should also be put under version control.
  6. If you cannot get your hands on real-world data, you may still be able to do performance testing using generated data. The downside of this approach is that the generated data will probably not contain the exotic border cases that are usually present in real-life data.