Supporting multiple versions of a data model

As an application evolves, its data model often does too. If you control both, this usually isn’t a problem. However, sometimes your power to change the data model is restricted. This happens, for instance, when the data model is published, and others may depend on it. An extreme case of this is when the data model is defined by another organization as, for example, with S1000D.

Having no absolute control over the data model isn’t much of a problem if you can leave one version behind completely, and move on to the next. But often you won’t be so lucky. I know I’m not: we need to support both S1000D 3.0 and 4.0.

There’s different ways in which you can support multiple data model versions. The one I’m concerned with here, is when your application needs to support multiple data models at the same time with the same code. That leaves out alternatives like having multiple branches of your code for the different data model versions.

One trick that can come to the rescue here is the Once And Only Once rule (also called the DRY principle). When applied to creating instances, this leads to the Factory pattern. If you have all your instances created by a factory, then there’s only one place where you need to decide which class (e.g. the 3.0 or 4.0 version) to instantiate. If those decisions are similar for all the classes in your model, then you could even extract them into a common base class for your factories.

Most of the time, the different versions of the data model will share a lot of similarities. It is tempting to extract those into a common base class. For example, in S1000D there is a type called descriptive data module, and you could derive DescriptiveDataModule30 and DescriptiveDataModule40 from DecriptiveDataModule.

But when the objects in your data model have inheritance relationships themselves, that can get ugly very fast. For instance, a descriptive data module is one of many kinds of data modules, and these data modules share a lot of characteristics. So in code, DescriptiveDataModule would descend from DataModule, and both would have aspects that differ in the 3.0 and 4.0 versions. This spells trouble.

Therefore, it is usually better to use composition instead. So DataModule would have a reference to a DataModuleIssue (where “issue” is used in the sense of the various issues of the S1000D specification, i.e. what I’ve been calling “versions” so far), which the DescriptiveDataModule would inherit. The factory would inject either a DescriptiveDataModuleIssue30 or a DescriptiveDataModuleIssue40 into the DescriptiveDataModule, where DescriptiveDataModuleIssue30 would descend from DataModuleIssue30, and DescriptiveDataModuleIssue40 from DataModuleIssue40.

The idea is to make the Issue classes very bare, dealing only with the stuff that differs between issues, so there is no need for a common base class (although both do implement the same interface). The things that are the same in all issues, go into the core model objects (DescriptiveDataModule and DataModule in our example).

Unit testing a user interface

So I’m on this new cool Google Web Toolkit (GWT) project of ours now. Part of the UI consists of a series of HTML labels that is the view to our model. Since our model has a tree structure, we use the visitor pattern to create this UI.

This all works beautifully, except when it doesn’t, i.e. when there is a bug. With all the recursion and the sometimes convoluted HTML that is required to achieve the desired effect, hunting down bugs isn’t always easy.

So it was about time to write some unit tests. Normally, I write the tests first. But I was added to this project, and there weren’t any tests yet. So I decided to introduce them. Now, it is often easier to start with tests than add them later. If you don’t start out by writing tests, you usually end up with code that is not easily testable. That was also the case here.

Making the code testable

So my first job was to make sure the code became testable. I started out by separating the creation of the HTML labels from the visitor code, since I didn’t want my tests to depend on GWT. So I introduced a simple TextStream:

public interface TextStream {
  void add(String text);
}

The visitor code is injected with this text stream and calls the add() method with some HTML markup. Normally, this TextStream is a HtmlLabelStream:

public class HtmlLabelStream implements TextStream {

  private final FlowPanel parent;

  public HtmlLabelStream(final FlowPanel parent) {
    this.parent = parent;
  }

  public void add(final String text) {
    parent.add(new HTML(text));
  }

}

But my test case also implements the TextStream interface, and it injects itself into the visitor. That way, it can collect the output from the visitor and compare it with the expected output.

Simplifying testing

Now I got a whole lot of HTML code that I needed to compare to the desired output. It was hard to find the deviations in this, since the required HTML is rather verbose.

So I decided to invest some time in transforming the incoming HTML into something more readable. For instance, to indent some piece of text, the HTML code contains something like this:

<div style="padding-left: 20px">...</div>

For each indentation level, 10px were used. So I decided to strip the div, and replace it with the number of spaces that corresponds to the indentation level. I repeated the same trick for a couple of other styles. In the end, the output became much easier to read, and thus much easier to spot errors in. In fact, it now looks remarkably similar to what is rendered by the browser.

This certainly cost me time to build. But it also won me time when comparing the actual and expected outputs. And adding a new test is now a breeze. Which is just the sort of incentive I need for my fellow team mates to also start writing tests…