Root Cause Analysis

I’ve moved on to a new project recently. It’s quite different from the previous one. Before I worked on a monolythic web application, now we’re using OSGi. As a result, our project consists of a lot of sub-projects (OSGi bundles) which makes it very inconvenient to use Ant. So we’ve switched to Gradle. Our company has also standardized on Perforce, where we used Subversion before.

To summarize, a lot has changed and I’m not really up to speed yet. This is evidenced by the fact that I broke our build 4 times in 5 days. As an aspiring software craftsman, I feel really bad about that. So why did this happen so often? And can I do anything to prevent it from happening again? Enter Root Cause Analysis.

Root cause analysis is performed to not just treat the symptoms, but cure the disease:

The key to effective problem solving is to first make sure you understand the problem that you are
trying to solve – why it needs to be solved, how you will know when you’ve solved it, and what the
root cause is.
Often symptoms show up in one place while the actual cause of the problem is somewhere
completely different. If you just “solve” the symptom without digging deeper it is highly likely that
problem will just reappear later in a different shape.

Henrik Kniberg has written about one way of doing root cause analysis: using Cause-Effect diagrams. Using this method, I ended up with the following:

Build Failure Cause Effect Diagram

I started out with the problem: Build Failure, in the orange rectangle. I then repeatedly asked myself why this is a bad thing and added the effects in the red rectangles. Just repeat this until you find something that conflicts with your goal. Finally, I repeatedly added causes in blue rectangles. You can stop when you’ve found something you can fix, but in general it’s good to keep asking a bit deeper than feels comfortable. This is where the Five Whys technique comes in handy.

As you can see, my main problem is impatience. For those who know me, that won’t come as a surprise 😉 However, in this case this personal flaw of mine gets in the way of my goal of making customers happy.

With the causes identified, it’s time to think up some countermeasures. Pick some causes that you can fix, and think of a way to treat them. The ones I’m going to work on are marked with a star in the figure above:

  • Use a checklist when submitting code to the source code repository. This will prevent me from making silly mistakes, such as forgetting to add a new file
  • Take the time to learn the tools better, in particular Gradle and Groovy
  • In general, try to be more patient

The last one is the hard one, of course. Wish me luck!

Update 2010-08-01: I have created a checklist with things to consider before submitting code to our Perforce repository and used this checklist all week. I broke the build twice this week, both times because of special circumstances that were not accounted for on the checklist. So I think I’m improving, but I’m not there yet.

Advertisements

JavaFX plugin for Eclipse patched

A while ago I wrote about how the JavaFX Eclipse plugin has some shortcomings. Luckily, the plugin is released under an Open Source license (BSD). Therefore, the source is available, and anyone can fix problems and supply patches.

So I decided to do just that, and checked out the code from the Subversion repository. I followed the steps described in the Wiki to get the project compiled.

The first thing I ran into, is that the default target didn’t exist. That was easy enough to fix.

Next, it bothered me that I needed to provide several properties on the command line. For instance I needed to specify -DeclipseDir=/opt/eclipse every time. So I patched the build to get this location from the environment variable ECLIPSE_HOME which I set in my .profile.

The same goes for the location of the JavaFX SDK. I introduced a JAVAFX_HOME environment variable to store that location. With these three modifications I could finally issue a build with just a simple ant.

With the build working it was time to tackle one of the problems I encountered during my usage of the plug-in. I figured it would be easy to fix the issue where compilation errors use up three lines in the Problems view, so that’s where I started. Based on information in the book Eclipse, Building Commercial-Quality Plug-ins, I knew I had to look for IMarker. There was only one such place in the code, so the problem was easily fixed. Progress at last!

I took a look at the other issues that were reported against the plug-in. There might be easier ones than the ones I experienced 😉 Indeed, someone noticed that the JavaFX perspective missed an icon. At first I couldn’t reproduce it, but that was because I was starting the plug-in from Eclipse. The official distribution did show the problem. Luckily, the bug reporter also found the cause: the icon used wasn’t part of the jar. A simple addition to build.properties fixed that.

Five patches already! I was feeling pretty good about myself 🙂

Next, someone wanted help with import statements. That was a lot trickier, but probably also a lot more valuable. Having looked at the IMarker code before, I naturally wanted to add a marker resolution.

This turned out to be a lot more work than I anticipated, but I managed to get something working. There were some glitches, and it probably needed a lot more testing for corner cases, but I could add a missing import statement based on the class name. Because I felt that this code wasn’t ready for prime time yet, I didn’t supply a patch, though.

This patching frenzy took place during the holidays. I got no response from Sun during that time, but I guess that was to be expected, given the time of year. So I decided to wait until some time into the new year to see what would happen.

Monday January 5 went by without news, but this morning I started to receive a stream of email notifications about issue tracker updates. Several of my patches were being accepted, and then I even received notification that I was given commit access! Such are the wonders of Open Source…

Automated distribution creation (4)

In this series of posts, I talked about my continuing quest for the fully automated creation of a distribution for our product. I talked about downloading release notes from our issue tracker and how to add those to our NEWS file. Last time, I had the build automatically update some text files. Now, my journey comes to an end.

The final manual step in creating a distribution for our product, is about incorporating the latest version of the manual. Since our product is an extension of the Component Content Management System Docato, we naturally use Docato itfself for writing our manual. This means the source code to the manual is not in Subversion with the rest of the code, and is not easily available to the build system. So what to do?

Well, Docato is pretty versatile. One can have a publication’s output sent to a well-known location on a server, for example. From there, we can download it using Ant’ scp task:

<scp file="${remote.manual.html.files}"
    todir="${local.manual.html.dir}"/>

where ${remote.manual.html.files} can use wildcards, like ${download.dir}/html/AMDS-CMS/*. But the download directory is on a different machine, so you need to provide the username and password for logging into that machine:

<property name="download.dir"
    value="${username}:${password}@${host}:${remote.dir}"/>

Of course, having a username and password in an Ant build file is a security risk. But in this case, the build file is not available outside our firewall, so we’re good. Otherwise, you’d have to provide the user credentials on the command line when running the Ant target:

ant create-dist -Dusername=foo -Dpassword=bar

scp also allows you to rename a file when downloading:

<scp file="${remote.manual.pdf.file}"
    localtofile="${local.manual.pdf.file}"/>

So now we can download the manual and incorporate it in our distribution. But how do we know we have the latest version of the manual?

Again, Docato comes to the rescue. It has the concept of scheduled tasks, actions that automatically run in the background from time to time. These are ideal for making backups, for instance.

So I created a scheduled task that builds a publication, and installed the task code on our manual server. Now every time a tech writer edits something, the edit will be automatically published at most a day later.

And so my journey ends.

But every end is a new beginning. Now that our distribution can be built by running a single Ant target, a whole new world opens up to us. My plan is to create a distribution automatically as part of our CruiseControl build. And then install it automatically, and run some tests against the installed version. Also, the distribution could be made available on some well-known server, so that interested people could always use the latest version for giving demos, for instance. But only when all the tests pass, of course 😉