Writing Maintainable and Secure Java Applications using an XQuery Builder

So you’re developing this cool Java application where you access XML data using XQuery. Easy enough with a powerful XML database like xDB, right? Well, yes and no ;) This document addresses some of the issues you may encounter.

The naive approach

The easiest way to execute XQuery statements is to embed them into your Java code:

executeXQuery("for $a in document('/content/repository')"
    + " where $a//html/head/title = 'Using XQuery'"
    + " return $a");

where executeXQuery() executes the XQuery against your XML database.

Most of your XQuery statements won’t be static like this example. Rather, you’d get some input from your end user:

    final String title = getInputFromEndUser();
    final String xquery
        = "for $a in document('/content/repository')"
        + " where $a//html/head/title = '"
        + title
        + "' return $a";
    executeXQuery(xquery);

Problems with the naive approach

This approach has some problems, though. First of all, the last XQuery is vulnerable to an XQuery injection attack. This is the XQuery counterpart of a SQL injection attack: input that contains a single quote can end the string literal early and change the meaning of the statement. As with SQL programming, you can use variables to work around this issue:

final String title = getInputFromEndUser();
final String xquery
    = "declare variable $title external;"
    + "for $a in document('/content/repository')"
    + " where $a//html/head/title = $title"
    + " return $a";
executeXQuery(xquery, title);

where executeXQuery() now accepts a variable number of arguments after the XQuery statement that are values for the externally declared variables.
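To make the injection risk concrete, here is a small sketch that only builds the query strings and never touches a database. The class name InjectionDemo and the sample input are made up for illustration; the two queries are the ones shown above.

```java
/** Illustration only: builds query text, never executes it. */
class InjectionDemo {

  /** The naive approach: user input is pasted into the query text. */
  static String naiveQuery(final String title) {
    return "for $a in document('/content/repository')"
        + " where $a//html/head/title = '" + title + "'"
        + " return $a";
  }

  /** The safer approach: the query text never contains user input. */
  static String parameterizedQuery() {
    return "declare variable $title external;"
        + "for $a in document('/content/repository')"
        + " where $a//html/head/title = $title"
        + " return $a";
  }

  public static void main(final String[] args) {
    // Hypothetical malicious input that closes the string literal early.
    final String input = "x' or '1'='1";
    // The where clause now carries an extra "or" condition.
    System.out.println(naiveQuery(input));
    // Here the input would travel separately, as the value of $title.
    System.out.println(parameterizedQuery());
  }
}
```

With the naive version, the where clause ends in = 'x' or '1'='1', which holds for every document. With the external variable, the same input is only ever compared as a value.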

But there are still some maintainability issues with this code. For starters, see the argument to the document() function. This depends on the particular database layout for your application. If you ever need to change it, you’ll likely need to update hundreds of XQuery statements. You could, of course, extract this into a constant.

But there is more. Your XQueries are likely to go beyond the basic XQuery specification, for instance to search on meta-data. In xDB, that would read something like this:

final String title = getInputFromEndUser();
final String xquery
    = "declare variable $title external;"
    + "for $a in document('/content/repository')"
    + " where xhive:metadata($a, 'Title') = $title"
    + " return $a";
executeXQuery(xquery, title);

You’ve now added a dependency on a specific implementation, which is never a good idea, since it basically generates a self-inflicted vendor lock-in.

Of course, you could extract the vendor-specific parts as well, but by now I hope you begin to see the mess you’ll end up with.
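For the sake of argument, here is roughly what that extraction might look like. The names Queries, REPOSITORY and metaData() are invented for this sketch and are not part of any library.

```java
/** Hypothetical holder for the pieces extracted from the query strings. */
class Queries {

  /** The database layout, now defined in a single place. */
  static final String REPOSITORY = "document('/content/repository')";

  /** The vendor-specific meta-data lookup, here in xDB syntax. */
  static String metaData(final String variable, final String name) {
    return "xhive:metadata(" + variable + ", '" + name + "')";
  }

  /** The meta-data query from above, assembled from the pieces. */
  static String byTitle() {
    return "declare variable $title external;"
        + "for $a in " + REPOSITORY
        + " where " + metaData("$a", "Title") + " = $title"
        + " return $a";
  }

  public static void main(final String[] args) {
    System.out.println(byTitle());
  }
}
```

Every query still has to remember to use these pieces, and the concatenation is no easier to read than before.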

Worse, since you embed the XQuery statement as a String in your Java code, any typos you make in this unreadable statement can only be found at runtime, since the Java compiler doesn’t understand XQuery.

XQuery Builder to the rescue

Let’s take a step back here and look at what we’re trying to achieve. We want to construct an object (an XQuery statement), that we want to use later on (execute it against our XML database). This is a recurring pattern, called the Builder Pattern. So we need an XQuery Builder.

Now, the XQuery standard is complex enough that I don’t recommend spending a lot of time coming up with the perfect XQuery Builder. Instead, you should take it slow, and only implement what you really need.

The best way to do that is using Test-Driven Development (TDD). I like to think that’s always the case, but even if you disagree, there are good reasons why it is the best approach in this scenario.

You’ll evolve the XQuery Builder over time, adding capabilities as needed, so you need a good suite of unit tests to ensure you didn’t break anything. Also, TDD focuses first and foremost on the API that you want to realize, making it easier to come up with a clean design.

Speaking of a clean design, the Builder Pattern lends itself very much to the use of a fluent interface, since you want to be able to express the XQuery in code as much as possible as you would in a string. Here’s an example of the sort of thing we’re trying to achieve:

    final String xquery = builder
        .where().metaData("Title").isEqualTo(title)
        .and().uri().startsWith(prefix)
        .orderBy().uri()
        .returns().id()
        .build();

Let’s take a look at how the XQuery Builder approach solves the problems we identified earlier.

First the security issue. The example above doesn’t explicitly mention external variables, but that doesn’t mean that they aren’t used. If your code needs security, the XQuery Builder can provide it. If you’re absolutely sure that your application only runs in a trusted environment, you can leave it out. If you later discover that your environment isn’t as secure as you thought, you can add support for external variables in the XQuery Builder and be done with it. No need to change hundreds of XQuery statements!
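As a rough sketch of how a builder could keep user input out of the query text, the class below generates an external variable for every value it is given. The class name, the $p1, $p2 naming scheme and the values() accessor are all assumptions made for this example.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: conditions whose values never enter the query text. */
class ParameterizedConditions {

  private final List<String> conditions = new ArrayList<>();
  private final List<String> values = new ArrayList<>();

  /** Adds "meta-data field equals value" using a generated variable. */
  ParameterizedConditions metaDataIsEqualTo(final String field,
      final String value) {
    values.add(value);
    // The field name comes from the developer, the value from the user.
    conditions.add("xhive:metadata($a, '" + field + "') = $p"
        + values.size());
    return this;
  }

  /** The query text: variable declarations first, then the FLWOR part. */
  String build() {
    final StringBuilder result = new StringBuilder();
    for (int i = 1; i <= values.size(); i++) {
      result.append("declare variable $p").append(i)
          .append(" external;\n");
    }
    result.append("for $a in document('/content/repository')\n");
    if (!conditions.isEmpty()) {
      result.append("where ")
          .append(String.join(" and ", conditions)).append('\n');
    }
    result.append("return $a");
    return result.toString();
  }

  /** The values to hand to executeXQuery(), in declaration order. */
  List<String> values() {
    return new ArrayList<>(values);
  }

  public static void main(final String[] args) {
    final ParameterizedConditions query = new ParameterizedConditions()
        .metaDataIsEqualTo("Title", "x' or '1'='1");
    System.out.println(query.build());
    System.out.println(query.values());
  }
}
```

The calling code passes a plain Java string either way, which is why the switch to external variables can happen inside the builder without touching the callers.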

Next, notice that the example didn’t mention where to look for documents. The XQuery Builder is the only place where the repository layout is specified, so that it is easy to update.

There is also nothing vendor specific in the example above. The metaData() clause handles that, again in one place.
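One possible way to arrange that "one place" is sketched below. QueryDialect, MetaDataSyntax and XdbMetaDataSyntax are hypothetical names; the article's own builder simply hard-codes the xDB syntax, which is a perfectly fine starting point.

```java
/** Stand-in for the part of the builder that knows layout and vendor. */
class QueryDialect {

  private final String repository;
  private final MetaDataSyntax metaDataSyntax;

  QueryDialect(final String repository,
      final MetaDataSyntax metaDataSyntax) {
    this.repository = repository;
    this.metaDataSyntax = metaDataSyntax;
  }

  /** The only place that mentions the repository layout. */
  String forClause(final String variable) {
    return "for " + variable + " in document('" + repository + "')";
  }

  /** The only place that delegates to vendor-specific syntax. */
  String metaData(final String variable, final String name) {
    return metaDataSyntax.lookup(variable, name);
  }

  public static void main(final String[] args) {
    final QueryDialect dialect = new QueryDialect(
        "/content/repository", new XdbMetaDataSyntax());
    System.out.println(dialect.forClause("$a"));
    System.out.println("return " + dialect.metaData("$a", "Title"));
  }
}

/** Hypothetical seam: how one product spells a meta-data lookup. */
interface MetaDataSyntax {
  String lookup(String variable, String name);
}

/** The xDB spelling used in this article. */
class XdbMetaDataSyntax implements MetaDataSyntax {
  @Override
  public String lookup(final String variable, final String name) {
    return "xhive:metadata(" + variable + ",'" + name + "')";
  }
}
```

Moving to another product or another repository layout would then mean changing the arguments passed to this one class rather than every query.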

Arguably the biggest benefit of the XQuery Builder, however, is that it gives you (some) compile time checking of your XQuery statements. For example, if you were to write builder.hwere(), the Java compiler would tell you about it right away.

You can take this as far as you think is useful. For instance, notice the uri() method in the example. Apparently, this application uses URIs on objects a lot, so it made sense to make it easy to use them. The same apparently didn’t hold for the Title meta-data field. By developing your own XQuery Builder, you get to decide the API that makes sense for your application.

Creating an XQuery Builder

So, how hard is it to create such an XQuery Builder? That depends on how far you want to go. But the beginnings are simple.

Start out with this JUnit 4 test:

import static org.junit.Assert.assertEquals;

import org.junit.Before;
import org.junit.Test;


public class XQueryBuilderTest {

  private XQueryBuilder builder;

  @Before
  public void init() {
    builder = new XQueryBuilder();
  }

  @Test
  public void all() {
    assertEquals("XQuery",
        "for $a in document('/content/repository')\n"
            + "return $a",
        builder.build());
  }

}

which forces us to write this code to make it compile:

public class XQueryBuilder {

  public String build() {
    return null;
  }

}

The test obviously fails. For now just fake it by returning "for $a in document('/content/repository')\nreturn $a".
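Spelled out, the faked version could look like this. The main() method is not part of the design; it is only there so the sketch can be run on its own.

```java
class XQueryBuilder {

  public String build() {
    // Hard-coded on purpose: just enough to turn the first test green.
    return "for $a in document('/content/repository')\n"
        + "return $a";
  }

  // Only here so the sketch can be run without JUnit.
  public static void main(final String[] args) {
    System.out.println(new XQueryBuilder().build());
  }

}
```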

This first step may seem a bit silly to those not used to TDD, but it is essentially just a way to get set up. In TDD, you don’t want to write code without a failing test, so always try to get a failing test as fast as possible.

Now, for something a bit more interesting. Let’s test that the XQuery can return IDs of documents, since we’ll need that very often:

@Test
public void returnId() {
  assertEquals("XQuery", 
      "for $a in document('/content/repository')\n"
          + "return xhive:metadata($a,'id')", 
      builder.returns().id().build());
}

In fact, that’s a special case of returning some meta-data, so we’ll tackle the simpler case first:

@Test
public void returnMetaData() {
  assertEquals("XQuery", 
      "for $a in document('/content/repository')\n"
          + "return xhive:metadata($a,'foo')", 
      builder.returns().metaData("foo").build());
}

For this to compile, we need a returns() method in XQueryBuilder:

public class XQueryBuilder {

  private final Return returns = new Return(this);

  public String build() {
    final StringBuilder result = new StringBuilder();
    result.append(
        "for $a in document('/content/repository')\n");
    result.append(returns);
    return result.toString();
  }

  public Return returns() {
    return returns;
  }

}

Note that we can’t use the more natural term return, since that is a reserved word in Java. Here’s the Return class:

public class Return {

  private final XQueryBuilder builder;
  private MetaDataReturnClause clause;

  public Return(final XQueryBuilder builder) {
    this.builder = builder;
  }

  public Return metaData(final String name) {
    return setClause(new MetaDataReturnClause(name));
  }

  private Return setClause(
      final MetaDataReturnClause clause) {
    this.clause = clause;
    return this;
  }

  @Override
  public String toString() {
    final StringBuilder result = new StringBuilder(
        "return ");
    if (clause == null) {
      result.append("$a");
    } else {
      result.append(clause);
    }
    return result.toString();
  }

  public String build() {
    return builder.build();
  }

}

And here’s the MetaDataReturnClause:

public class MetaDataReturnClause {

  private final String name;

  public MetaDataReturnClause(final String name) {
    this.name = name;
  }

  @Override
  public String toString() {
    return "xhive:metadata($a,'" + name + "')";
  }

}

So implementing the ID is easy:

public class Return {

  public Return id() {
    return setClause(new IdReturnClause());
  }

  // ...
}

public class IdReturnClause 
    extends MetaDataReturnClause {

  public IdReturnClause() {
    super("id");
  }

}

By now you’ve probably spotted some duplication. First, the tests:

  @Test
  public void all() {
    assertXQuery("return $a", builder.build());
  }

  @Test
  public void returnMetaData() {
    assertXQuery("return xhive:metadata($a,'foo')",
        builder.returns().metaData("foo").build());
  }

  @Test
  public void returnId() {
    assertXQuery("return xhive:metadata($a,'id')",
        builder.returns().id().build());
  }

  private void assertXQuery(final String expected, 
      final String actual) {
    assertEquals("XQuery", 
        "for $a in document('/content/repository')\n" 
        + expected, actual);
  }

Yes, it’s just as important to keep your tests clean as it is for your code! Speaking of which, there are a lot of places where this $a thingie comes up. Let’s extract it:

public class XQueryBuilder {

  public String build() {
    final StringBuilder result = new StringBuilder();
    result.append("for ").append(getContext())
       .append(" in document('/content/repository')\n");
    result.append(returns);
    return result.toString();
  }

  public String getContext() {
    return "$a";
  }

  // ...
}

So that the Return class can use it:

public class Return {

  private final XQueryBuilder builder;
  private MetaDataReturnClause clause;

  public Return(final XQueryBuilder builder) {
    this.builder = builder;
  }

  public Return metaData(final String name) {
    return setClause(new MetaDataReturnClause(this, 
        name));
  }

  public Return id() {
    return setClause(new IdReturnClause(this));
  }

  private Return setClause(
      final MetaDataReturnClause clause) {
    this.clause = clause;
    return this;
  }

  @Override
  public String toString() {
    final StringBuilder result = new StringBuilder();
    result.append("return ");
    if (clause == null) {
      result.append(builder.getContext());
    } else {
      result.append(clause);
    }
    return result.toString();
  }

  public String build() {
    return builder.build();
  }

  public XQueryBuilder getBuilder() {
    return builder;
  }

}

And the MetaDataReturnClause as well:

public class MetaDataReturnClause {

  private final String name;
  private final Return returns;

  public MetaDataReturnClause(final Return returns, 
      final String name) {
    this.returns = returns;
    this.name = name;
  }

  @Override
  public String toString() {
    return "xhive:metadata(" 
        + returns.getBuilder().getContext() 
        + ",'" + name + "')";
  }

}

You can probably see the getContext() method gaining traction when considering recursive XQueries. As always, keeping your design clean makes it easier to enhance later.
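To hint at where this could go, here is a speculative variation in which each builder instance is handed its own context variable, so that a nested query could use $b while the outer one keeps $a. ContextualBuilder and returnsMetaData() are names invented for this sketch and are not part of the builder developed above.

```java
/** Hypothetical variation: the context variable is configurable. */
class ContextualBuilder {

  private final String context;
  private String returnedMetaData;

  ContextualBuilder(final String context) {
    this.context = context;
  }

  String getContext() {
    return context;
  }

  /** Simplified stand-in for returns().metaData(name). */
  ContextualBuilder returnsMetaData(final String name) {
    returnedMetaData = name;
    return this;
  }

  String build() {
    final StringBuilder result = new StringBuilder();
    // Every clause asks getContext(), so the variable stays consistent.
    result.append("for ").append(getContext())
        .append(" in document('/content/repository')\n");
    result.append("return ");
    if (returnedMetaData == null) {
      result.append(getContext());
    } else {
      result.append("xhive:metadata(").append(getContext())
          .append(",'").append(returnedMetaData).append("')");
    }
    return result.toString();
  }

  public static void main(final String[] args) {
    System.out.println(new ContextualBuilder("$a").build());
    System.out.println(
        new ContextualBuilder("$b").returnsMetaData("id").build());
  }
}
```

Because no clause spells out $a itself, changing the variable is a one-line affair.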

So there you have your basic XQuery Builder. From these humble beginnings, it’s easy to add more functionality. For example, suppose we want to return not just the ID, but also the URI of an object. First we add support for URIs in the return clause, since we anticipate we’ll need it often:

  @Test
  public void returnUri() {
    assertXQuery("return xhive:metadata($a,'uri')",
        builder.returns().uri().build());
  }

Which is implemented along the same lines as before:

public class Return {

  public Return uri() {
    return setClause(new UriReturnClause(this));
  }

  // ...

}

With a new class UriReturnClause:

public class UriReturnClause
    extends MetaDataReturnClause {

  public UriReturnClause(final Return returns) {
    super(returns, "uri");
  }

}

Next, we need to be able to return multiple items:

  @Test
  public void returnIdAndUri() {
    assertXQuery("return (xhive:metadata($a,'id'), "
        + "xhive:metadata($a,'uri'))",
        builder.returns().id().and().uri().build());
  }

The and() method is just syntactic sugar to make the code easy to read:

  public Return and() {
    return this;
  }

To pass the test, we need to change the clause instance variable to a list:

import java.util.ArrayList;
import java.util.List;

public class Return {

  private final List<MetaDataReturnClause> clauses =
      new ArrayList<MetaDataReturnClause>();

  public Return metaData(final String name) {
    return addClause(new MetaDataReturnClause(this,
        name));
  }

  private Return addClause(
      final MetaDataReturnClause clause) {
    clauses.add(clause);
    return this;
  }

  @Override
  public String toString() {
    final StringBuilder result = new StringBuilder();
    result.append("return ");
    if (clauses.isEmpty()) {
      result.append(builder.getContext());
    } else {
      if (clauses.size() > 1) {
        result.append('(');
      }
      String prefix = "";
      for (final MetaDataReturnClause clause : clauses) {
        result.append(prefix).append(clause);
        prefix = ", ";
      }
      if (clauses.size() > 1) {
        result.append(')');
      }
    }
    return result.toString();
  }

  // ...

}
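Since the snippets above each show only part of the picture, here is one way the pieces might fit together in a single file. It is my consolidation of the classes from this article, not a definitive version: the top-level classes are package-private so they can share a file, and BuilderDemo exists only to print a result.

```java
import java.util.ArrayList;
import java.util.List;

/** Prints the query that the returnIdAndUri test expects. */
class BuilderDemo {
  public static void main(final String[] args) {
    System.out.println(
        new XQueryBuilder().returns().id().and().uri().build());
  }
}

class XQueryBuilder {
  private final Return returns = new Return(this);

  public String build() {
    return "for " + getContext()
        + " in document('/content/repository')\n" + returns;
  }

  public String getContext() { return "$a"; }

  public Return returns() { return returns; }
}

class Return {
  private final XQueryBuilder builder;
  private final List<MetaDataReturnClause> clauses = new ArrayList<>();

  Return(final XQueryBuilder builder) { this.builder = builder; }

  public Return metaData(final String name) {
    return addClause(new MetaDataReturnClause(this, name));
  }

  public Return id() { return addClause(new IdReturnClause(this)); }

  public Return uri() { return addClause(new UriReturnClause(this)); }

  /** Syntactic sugar only. */
  public Return and() { return this; }

  private Return addClause(final MetaDataReturnClause clause) {
    clauses.add(clause);
    return this;
  }

  public XQueryBuilder getBuilder() { return builder; }

  public String build() { return builder.build(); }

  @Override
  public String toString() {
    final StringBuilder result = new StringBuilder("return ");
    if (clauses.isEmpty()) {
      return result.append(builder.getContext()).toString();
    }
    // Parentheses are only needed when returning a sequence.
    final boolean sequence = clauses.size() > 1;
    String prefix = sequence ? "(" : "";
    for (final MetaDataReturnClause clause : clauses) {
      result.append(prefix).append(clause);
      prefix = ", ";
    }
    return result.append(sequence ? ")" : "").toString();
  }
}

class MetaDataReturnClause {
  private final Return returns;
  private final String name;

  MetaDataReturnClause(final Return returns, final String name) {
    this.returns = returns;
    this.name = name;
  }

  @Override
  public String toString() {
    return "xhive:metadata(" + returns.getBuilder().getContext()
        + ",'" + name + "')";
  }
}

class IdReturnClause extends MetaDataReturnClause {
  IdReturnClause(final Return returns) { super(returns, "id"); }
}

class UriReturnClause extends MetaDataReturnClause {
  UriReturnClause(final Return returns) { super(returns, "uri"); }
}
```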

Adding support for the where and orderBy clauses follows the same approach as for return and is left as an exercise for the reader ;)

In doing so, you will probably encounter some duplication between the where, orderBy and return clauses, for example in meta-data handling. You can extract that into a shared method such as XQueryBuilder.getMetaDataClause().
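Without spoiling the exercise entirely, here is one possible starting point for a where clause that shares its meta-data handling through the builder. Everything in it is an assumption made for this sketch: the class names, declareVariable(), and especially isEqualToVariable(), which compares against a named external variable instead of taking the value directly.

```java
import java.util.ArrayList;
import java.util.List;

/** Prints a query with a where clause; all names are illustrative. */
class WhereDemo {
  public static void main(final String[] args) {
    System.out.println(new WhereQueryBuilder()
        .where().metaData("Title").isEqualToVariable("title")
        .build());
  }
}

class WhereQueryBuilder {
  private final Where where = new Where(this);
  private final List<String> variables = new ArrayList<>();

  Where where() { return where; }

  String getContext() { return "$a"; }

  /** One place for the vendor syntax, shared by all clauses. */
  String getMetaDataClause(final String name) {
    return "xhive:metadata(" + getContext() + ",'" + name + "')";
  }

  /** Remembers an external variable so build() can declare it. */
  void declareVariable(final String name) { variables.add(name); }

  String build() {
    final StringBuilder result = new StringBuilder();
    for (final String variable : variables) {
      result.append("declare variable $").append(variable)
          .append(" external;\n");
    }
    result.append("for ").append(getContext())
        .append(" in document('/content/repository')\n");
    result.append(where);
    result.append("return ").append(getContext());
    return result.toString();
  }
}

class Where {
  private final WhereQueryBuilder builder;
  private final List<String> conditions = new ArrayList<>();
  private String subject;

  Where(final WhereQueryBuilder builder) { this.builder = builder; }

  Where metaData(final String name) {
    subject = builder.getMetaDataClause(name);
    return this;
  }

  Where isEqualToVariable(final String variable) {
    if (subject == null) {
      throw new IllegalStateException("Nothing to compare yet");
    }
    builder.declareVariable(variable);
    conditions.add(subject + " = $" + variable);
    subject = null;
    return this;
  }

  /** Syntactic sugar, as in the return clause. */
  Where and() { return this; }

  String build() { return builder.build(); }

  @Override
  public String toString() {
    // An empty where clause contributes nothing to the query.
    return conditions.isEmpty()
        ? "" : "where " + String.join(" and ", conditions) + "\n";
  }
}
```

An orderBy clause could call the same getMetaDataClause() method, which is where the shared helper starts to pay off.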

Have fun writing your XQueryBuilder-based Java applications!
