REST Messages And Data Transfer Objects

In Patterns of Enterprise Application Architecture, Martin Fowler defines a Data Transfer Object (DTO) as

An object that carries data between processes in order to reduce the number of method calls.

Note that a Data Transfer Object is not the same as a Data Access Object (DAO), although they have some similarities. A Data Access Object is used to hide details from the underlying persistence layer.

REST Messages Are Serialized DTOs

message-transferIn a RESTful architecture, the messages sent across the wire are serializations of DTOs.

This means all the best practices around DTOs are important to follow when building RESTful systems.

For instance, Fowler writes

…encapsulate the serialization mechanism for transferring data over the wire. By encapsulating the serialization like this, the DTOs keep this logic out of the rest of the code and also provide a clear point to change serialization should you wish.

In other words, you should follow the DRY principle and have exactly one place where you convert your internal DTO to a message that is sent over the wire.

In JAX-RS, that one place should be in an entity provider. In Spring, the mechanism to use is the message converter. Note that both frameworks have support for several often-used serialization formats.

Following this advice not only makes it easier to change media types (e.g. from plain JSON or HAL to a more mature media type like Siren, Mason, or UBER). It also makes it easy to support multiple media types.

mediaThis in turn enables you to switch media types without breaking clients.

You can continue to serve old clients with the old media type, while new clients can take advantage of the new media type.

Introducing new media types is one way to evolve your REST API when you must make backwards incompatible changes.

DTOs Are Not Domain Objects

Domain objects implement the ubiquitous language used by subject matter experts and thus are discovered. DTOs, on the other hand, are designed to meet certain non-functional characteristics, like performance, and are subject to trade-offs.

This means the two have very different reasons to change and, following the Single Responsibility Principle, should be separate objects. Blindly serializing domain objects should thus be considered an anti-pattern.

That doesn’t mean you must blindly add DTOs, either. It’s perfectly fine to start with exposing domain objects, e.g. using Spring Data REST, and introducing DTOs as needed. As always, premature optimization is the root of all evil, and you should decide based on measurements.

The point is to keep the difference in mind. Don’t change your domain objects to get better performance, but rather introduce DTOs.

DTOs Have No Behavior

big-dataA DTO should not have any behavior; it’s purpose in life is to transfer data between remote systems.

This is very different from domain objects.

There are two basic approaches for dealing with the data in a DTO.

The first is to make them immutable objects, where all the input is provided in the constructor and the data can only be read.

This doesn’t work well for large objects, and doesn’t play nice with serialization frameworks.

The better approach is to make all the properties writable. Since a DTO must not have logic, this is one of the few occasions where you can safely make the fields public and omit the getters and setters.

Of course, that means some other part of the code is responsible for filling the DTO with combinations of properties that together make sense.

Conversely, you should validate DTOs that come back in from the client.

Communicate Through Stories Rather Than Tasks

cooperationLast time I talked about interfaces between pieces of code.

Today I want to discuss the interface between groups of people involved in developing software.

There are two basic groups: those who develop the software, and those who coordinate that development.

In Agile terms, those groups are the Development Team on the one hand, and the Product Owner and other Stakeholders on the other.

Speaking the Same Language

The two groups need to communicate, so they do best when everybody speaks the same language.

This begins with speaking the same “natural” language, e.g. English. For most teams that will be a given, but teams that are distributed over multiple locations in different countries need to be a bit careful.

tower-of-babelOnce the language is determined, the team should look at the jargon they will be using.

Since the Development Team needs to understand what they must build, they need to know the business terms.

The Product Owner and Stakeholders don’t necessarily need to understand the technical terms, however.

Therefore, it makes sense that the Ubiquitous Language is the language of the business.

Speaking About Work: Stories and Tasks

But the two groups need to talk about more than the business problem to be solved. For any non-trivial amount of work, they also need to talk about how to organize that work.

In most Agile methods, work is organized into Sprints or Iterations. These time-boxed periods of development are an explicit interface between Product Owner and Development Team.

user-storyThe Product Owner is the one steering the Development Team: she decides which User Stories will be built in a given Iteration.

The Development Team implements the requested Stories during the Iteration. They do this by breaking the Stories down into Tasks, having people sign up for the Tasks, and implementing them.

Tasks describe how development time is organized, whereas Stories describe functionality. So Tasks may refer to technical terms like relational databases, while Stories should only talk about functionality, like data persistence.

Stories Are the Interface

Since we value working software, we talk about Stories most of the time. Tasks only exist to make implementing Stories easier. They are internal to the Development Team, not part of the interface the Development Team shares with the Product Owner.

task-boardMany Development Teams do, in fact, expose Tasks to their Product Owners and other Stakeholders.

Sometimes they do this to explain why an Estimate for a Story is higher than the Product Owner expected.

Or they let the Product Owner attend Standup Meetings where Tasks are discussed.

This is fine, as long as both sides understand that Tasks are owned by the Development Team, just as Stories are owned by the Product Owner.

The Development Team may propose Stories, but the Product Owner decides what gets added to the Backlog and what gets scheduled in the Iteration.

Similarly, the Product Owner may propose, question, or inquire about Tasks, but the Development Team decides which Tasks make up a Story and in which order and by who they are implemented.

Always Honor the Interface

This well-defined interface between Product Owner and Development Team allows both sides to do their job well.

burn-up-chartIt’s important to understand that this has implications for how the software development process is organized.

For instance, the metrics we report up should be defined in terms of Stories, not Tasks.

Outside the Development Team, people shouldn’t care about how development time was divided, only about what the result was.

If we stick to the interface, both sides become decoupled and therefore free to innovate and optimize their own processes without jeopardizing the whole.

This is the primary benefit of any well-defined interface and the basis for a successful divide-and-conquer strategy.

What Do You Think?

feedbackWhat problems have you seen in the communication between the two groups?

Are you consciously restricting the communication to stories, or are you letting tasks slip in?

Please leave a comment.

How To Implement Input Validation For REST resources

rest-validationThe SaaS platform I’m working on has a RESTful interface that accepts XML payloads.

Implementing REST Resources

For a Java shop like us, it makes sense to use JAX-B to generate JavaBean classes from an XML Schema.

Working with XML (and JSON) payloads using JAX-B is very easy in a JAX-RS environment like Jersey:

@Path("orders")
public class OrdersResource {
  @POST
  @Consumes({ "application/xml", "application/json" })
  public void place(Order order) {
    // Jersey marshalls the XML payload into the Order 
    // JavaBean, allowing us to write type-safe code 
    // using Order's getters and setters.
    int quantity = order.getQuantity();
    // ...
  }
}

(Note that you shouldn’t use these generic media types, but that’s a discussion for another day.)

The remainder of this post assumes JAX-B, but its main point is valid for other technologies as well. Whatever you do, please don’t use XMLDecoder, since that is open to a host of vulnerabilities.

Securing REST Resources

Let’s suppose the order’s quantity is used for billing, and we want to prevent people from stealing our money by entering a negative amount.

We can do that with input validation, one of the most important tools in the AppSec toolkit. Let’s look at some ways to implement it.

Input Validation With XML Schema

xml-schemaWe could rely on XML Schema for validation, but XML Schema can only validate so much.

Validating individual properties will probably work fine, but things get hairy when we want to validate relations between properties. For maximum flexibility, we’d like to use Java to express constraints.

More importantly, schema validation is generally not a good idea in a REST service.

A major goal of REST is to decouple client and server so that they can evolve separately.

If we validate against a schema, then a new client that sends a new property would break against an old server that doesn’t understand the new property. It’s usually better to silently ignore properties you don’t understand.

JAX-B does this right, and also the other way around: properties that are not sent by an old client end up as null. Consequently, the new server must be careful to handle null values properly.

Input Validation With Bean Validation

bean-validationIf we can’t use schema validation, then what about using JSR 303 Bean Validation?

Jersey supports Bean Validation by adding the jersey-bean-validation jar to your classpath.

There is an unofficial Maven plugin to add Bean Validation annotations to the JAX-B generated classes, but I’d rather use something better supported and that works with Gradle.

So let’s turn things around. We’ll handcraft our JavaBean and generate the XML Schema from the bean for documentation:

@XmlRootElement(name = "order")
public class Order {
  @XmlElement
  @Min(1)
  public int quantity;
}
@Path("orders")
public class OrdersResource {
  @POST
  @Consumes({ "application/xml", "application/json" })
  public void place(@Valid Order order) {
    // Jersey recognizes the @Valid annotation and
    // returns 400 when the JavaBean is not valid
  }
}

Any attempt to POST an order with a non-positive quantity will now give a 400 Bad Request status.

Now suppose we want to allow clients to change their pending orders. We’d use PATCH or PUT to update individual order properties, like quantity:

@Path("orders")
public class OrdersResource {
  @Path("{id}")
  @PUT
  @Consumes("application/x-www-form-urlencoded")
  public Order update(@PathParam("id") String id, 
      @Min(1) @FormParam("quantity") int quantity) {
    // ...
  }
}

We need to add the @Min annotation here too, which is duplication. To make this DRY, we can turn quantity into a class that is responsible for validation:

@Path("orders")
public class OrdersResource {
  @Path("{id}")
  @PUT
  @Consumes("application/x-www-form-urlencoded")
  public Order update(@PathParam("id") String id, 
      @FormParam("quantity")
      Quantity quantity) {
    // ...
  }
}
@XmlRootElement(name = "order")
public class Order {
  @XmlElement
  public Quantity quantity;
}
public class Quantity {
  private int value;

  public Quantity() { }

  public Quantity(String value) {
    try {
      setValue(Integer.parseInt(value));
    } catch (ValidationException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public int getValue() {
    return value;
  }

  @XmlValue
  public void setValue(int value) 
      throws ValidationException {
    if (value < 1) {
      throw new ValidationException(
          "Quantity value must be positive, but is: " 
          + value);
    }
    this.value = value;
  }
}

We need a public no-arg constructor for JAX-B to be able to unmarshall the payload into a JavaBean and another constructor that takes a String for the @FormParam to work.

setValue() throws javax.xml.bind.ValidationException so that JAX-B will stop unmarshalling. However, Jersey returns a 500 Internal Server Error when it sees an exception.

We can fix that by mapping validation exceptions onto 400 status codes using an exception mapper. While we’re at it, let’s do the same for IllegalArgumentException:

@Provider
public class DefaultExceptionMapper 
    implements ExceptionMapper<Throwable> {

  @Override
  public Response toResponse(Throwable exception) {
    Throwable badRequestException 
        = getBadRequestException(exception);
    if (badRequestException != null) {
      return Response.status(Status.BAD_REQUEST)
          .entity(badRequestException.getMessage())
          .build();
    }
    if (exception instanceof WebApplicationException) {
      return ((WebApplicationException)exception)
          .getResponse();
    }
    return Response.serverError()
        .entity(exception.getMessage())
        .build();
  }

  private Throwable getBadRequestException(
      Throwable exception) {
    if (exception instanceof ValidationException) {
      return exception;
    }
    Throwable cause = exception.getCause();
    if (cause != null && cause != exception) {
      Throwable result = getBadRequestException(cause);
      if (result != null) {
        return result;
      }
    }
    if (exception instanceof IllegalArgumentException) {
      return exception;
    }
    if (exception instanceof BadRequestException) {
      return exception;
    }
    return null;
  }

}

Input Validation By Domain Objects

dddEven though the approach outlined above will work quite well for many applications, it is fundamentally flawed.

At first sight, proponents of Domain-Driven Design (DDD) might like the idea of creating the Quantity class.

But the Order and Quantity classes do not model domain concepts; they model REST representations. This distinction may be subtle, but it is important.

DDD deals with domain concepts, while REST deals with representations of those concepts. Domain concepts are discovered, but representations are designed and are subject to all kinds of trade-offs.

For instance, a collection REST resource may use paging to prevent sending too much data over the wire. Another REST resource may combine several domain concepts to make the client-server protocol less chatty.

A REST resource may even have no corresponding domain concept at all. For example, a POST may return 202 Accepted and point to a REST resource that represents the progress of an asynchronous transaction.

ubiquitous-languageDomain objects need to capture the ubiquitous language as closely as possible, and must be free from trade-offs to make the functionality work.

When designing REST resources, on the other hand, one needs to make trade-offs to meet non-functional requirements like performance, scalability, and evolvability.

That’s why I don’t think an approach like RESTful Objects will work. (For similar reasons, I don’t believe in Naked Objects for the UI.)

Adding validation to the JavaBeans that are our resource representations means that those beans now have two reasons to change, which is a clear violation of the Single Responsibility Principle.

We get a much cleaner architecture when we use JAX-B JavaBeans only for our REST representations and create separate domain objects that handle validation.

Putting validation in domain objects is what Dan Bergh Johnsson refers to as Domain-Driven Security.

cave-artIn this approach, primitive types are replaced with value objects. (Some people even argue against using any Strings at all.)

At first it may seem overkill to create a whole new class to hold a single integer, but I urge you to give it a try. You may find that getting rid of primitive obsession provides value even beyond validation.

What do you think?

How do you handle input validation in your RESTful services? What do you think of Domain-Driven Security? Please leave a comment.

Building Both Security and Quality In

One of the important things in a Security Development Lifecycle (SDL) is to feed back information about vulnerabilities to developers.

This post relates that practice to the Agile practice of No Bugs.

The Security Incident Response

Even though we work hard to ship our software without security vulnerabilities, we never succeed 100%.

When an incident is reported (hopefully responsibly), we execute our security response plan. We must be careful to fix the issue without introducing new problems.

Next, we should also look for similar issues to the one reported. It’s not unlikely that there are issues in other parts of the application that are similar to the reported one. We should find and fix those as part of the same security update.

Finally, we should do a root cause analysis to determine why this weakness slipped through the cracks in the first place. Armed with that knowledge, we can adapt our process to make sure that similar issues will not occur in the future.

From Security To Quality

The process outlined above works well for making our software ever more secure.

But security weaknesses are essentially just bugs. Security issues may have more severe consequences than regular bugs, but most regular bugs are expensive to fix once the software is deployed as well.

So it actually makes sense to treat all bugs, security or otherwise, the same way.

As the saying goes, an ounce of prevention is worth a pound of cure. Just as we need to build security in, we also need to build quality in general in.

Building Quality In Using Agile Methods

This has been known in the Agile and Lean communities for a long time. For instance, James Shore wrote about it in his excellent book The Art Of Agile Development and Elisabeth Hendrickson thinks that there should be so little bugs that they don’t need triaging.

Some people object to the Zero Defects mentality, claiming that it’s unrealistic.

There is, however, clear evidence of much lower defect rates for Agile development teams. Many Lean implementations also report successes in their quest for Zero Defects.

So there is at least anecdotal evidence that a very significant reduction of defects is possible.

This will require change, of course. Testers need to change and so do developers. And then everybody on the team needs to speak the same language and work together as a single team instead of in silos.

If we do this well, we’ll become bug exterminators that delight our customers with software that actually works.

The Nature of Software Development

I’ve been developing software for a while now. Over fifteen years professionally, and quite some time before that too. One would think that when you’ve been doing something for a long time, you would develop a good understanding of what it is that you do. Well, maybe. But every once in a while I come across something that deepens my insight, that takes it to the next level.

I recently came across an article (from 1985!) that does just that, by providing a theory of what software development really is. If you have anything to do with developing software, please read the article. It’s a bit dry and terse at times, but please persevere.

The article promotes the view that, in essence, software development is a theory building activity. This means that it is first and foremost about building a mental model in the developer’s mind about how the world works and how the software being developed handles and supports that world.

This contrasts with the more dominant manufacturing view that sees software development as an activity where some artifacts are to be produced, like code, tests, and documentation. That’s not to say that these artifacts are not produced, since obviously they are, but rather that that’s not the real issue. If we want to be able to develop software that works well, and is maintainable, we’re better off focusing on building the right theories. The right artifacts will then follow.

This view has huge implications for how we organize software development. Below, I will discuss some things that we need to do differently from what you might expect based on the manufacturing view. The funny thing is that we’ve known that for some time now, but for different reasons.

We need strong customer collaboration to build good theories
Mental models can never be externalized completely, some subtleties always get lost when we try that. That’s why requirements as documents don’t work. It’s not just that they will change (Agile), or that hand-offs are a waste (Lean), no, they fundamentally cannot work! So we need the customer around to clarify the subtle details, and we need to go see how the end users work in their own environment to build the best possible theories.

Since this is expensive, and since a customer would not like having to explain the same concept over and over again to different developers, it makes sense to try and capture some of the domain knowledge, even though we know we can never capture every subtle detail. This is where automated acceptance tests, for example as produced in Behavior Driven Development (BDD), can come in handy.

Software developers should share the same theory
It’s of paramount importance that all the developers on the team share the same theory, or else they will develop things that make no sense to their team members and that will not integrate well with their team members’ work. Having the same theory is helped by using the same terminology. Developers should be careful not to let the meaning of the terms drift away from how the end users use them.

The need for a shared theory means that strong code ownership is a bad idea. Weak ownership could work, but we’d better take measures to prevent silos from forming. Code reviews are one candidate for this. The best approach, however, is collective code ownership, especially when combined with pair programming.

Note that eXtreme Programming (XP) makes the shared theory explicit through it’s system metaphor concept. This is the XP practice that I’ve always struggled most with, but now it’s finally starting to make sense to me.

Theory building is an essential skill for software developers
If software development is first and foremost about building theories, then software developers better be good at it. So it makes sense to teach them this. I’m not yet aware of how to best do this, but one obvious place to start is the scientific method, since building theories is precisely what scientist do (even though some dispute the scientific method is actually the way scientists build theories). This reminds me of the relationship between the scientific method and Test-Driven Development (TDD).

Another promising approach is Domain Driven Design, since that places the domain model at the center of software development. Please let me know if you have more ideas on this subject.

Software developers are not interchangeable resources
If the most crucial part of software development is the building of a theory, than you can’t just simply replace one developer with another, since the new girl needs to start building her theory from scratch and that takes time. This is another reason why you can’t add people to a late project without making it later still, as (I wish!) we all know.

Also, developers with a better initial theory are better suited to work on a new project, since they require less theory building. That’s why it makes sense to use developers with pre-existing domain knowledge, if possible. Conversely, that’s also why it makes sense for a developer to specialize in a given domain.

We should expect designs to undergo significant changes
Taking a hint from science, we see that theories sometimes go through radical changes. Now, if the software design is a manifestation of the theory in the developers’ mind, then we can expect the design to undergo radical changes as well. This argues against too much design up front, and for incremental design.

Maintenance should be done by the original developers
Some organizations hand off software to a maintenance team once its initial development is done. From the perspective of software development as theory building, that doesn’t make a whole lot of sense. The maintenance team doesn’t have the theories that the original developers built, and will likely make modifications that don’t fit that theory and are therefore not optimal.

If you do insist on separate maintenance teams, then the maintainers should be phased into the team before the initial stage ends, so they have access to people that already have well formed theories and can learn from them.

Software developers should not be shared between teams
For productivity reasons, software developers shouldn’t divide their attention between multiple projects. But the theory building view of software development gives another perspective. It’s hard enough to build one theory at a time, especially if one is also learning other stuff, like a new technology. Developers really shouldn’t need to learn too much in parallel, or they may feel like their head is going to explode.