Supporting multiple versions of a data model

As an application evolves, its data model often does too. If you control both, this usually isn’t a problem. However, sometimes your power to change the data model is restricted. This happens, for instance, when the data model is published, and others may depend on it. An extreme case of this is when the data model is defined by another organization as, for example, with S1000D.

Having no absolute control over the data model isn’t much of a problem if you can leave one version behind completely, and move on to the next. But often you won’t be so lucky. I know I’m not: we need to support both S1000D 3.0 and 4.0.

There’s different ways in which you can support multiple data model versions. The one I’m concerned with here, is when your application needs to support multiple data models at the same time with the same code. That leaves out alternatives like having multiple branches of your code for the different data model versions.

One trick that can come to the rescue here is the Once And Only Once rule (also called the DRY principle). When applied to creating instances, this leads to the Factory pattern. If you have all your instances created by a factory, then there’s only one place where you need to decide which class (e.g. the 3.0 or 4.0 version) to instantiate. If those decisions are similar for all the classes in your model, then you could even extract them into a common base class for your factories.

Most of the time, the different versions of the data model will share a lot of similarities. It is tempting to extract those into a common base class. For example, in S1000D there is a type called descriptive data module, and you could derive DescriptiveDataModule30 and DescriptiveDataModule40 from DecriptiveDataModule.

But when the objects in your data model have inheritance relationships themselves, that can get ugly very fast. For instance, a descriptive data module is one of many kinds of data modules, and these data modules share a lot of characteristics. So in code, DescriptiveDataModule would descend from DataModule, and both would have aspects that differ in the 3.0 and 4.0 versions. This spells trouble.

Therefore, it is usually better to use composition instead. So DataModule would have a reference to a DataModuleIssue (where “issue” is used in the sense of the various issues of the S1000D specification, i.e. what I’ve been calling “versions” so far), which the DescriptiveDataModule would inherit. The factory would inject either a DescriptiveDataModuleIssue30 or a DescriptiveDataModuleIssue40 into the DescriptiveDataModule, where DescriptiveDataModuleIssue30 would descend from DataModuleIssue30, and DescriptiveDataModuleIssue40 from DataModuleIssue40.

The idea is to make the Issue classes very bare, dealing only with the stuff that differs between issues, so there is no need for a common base class (although both do implement the same interface). The things that are the same in all issues, go into the core model objects (DescriptiveDataModule and DataModule in our example).


Please Join the Discussion

Please log in using one of these methods to post your comment: Logo

You are commenting using your account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s