How To Implement Input Validation For REST resources

rest-validationThe SaaS platform I’m working on has a RESTful interface that accepts XML payloads.

Implementing REST Resources

For a Java shop like us, it makes sense to use JAX-B to generate JavaBean classes from an XML Schema.

Working with XML (and JSON) payloads using JAX-B is very easy in a JAX-RS environment like Jersey:

@Path("orders")
public class OrdersResource {
  @POST
  @Consumes({ "application/xml", "application/json" })
  public void place(Order order) {
    // Jersey marshalls the XML payload into the Order 
    // JavaBean, allowing us to write type-safe code 
    // using Order's getters and setters.
    int quantity = order.getQuantity();
    // ...
  }
}

(Note that you shouldn’t use these generic media types, but that’s a discussion for another day.)

The remainder of this post assumes JAX-B, but its main point is valid for other technologies as well. Whatever you do, please don’t use XMLDecoder, since that is open to a host of vulnerabilities.

Securing REST Resources

Let’s suppose the order’s quantity is used for billing, and we want to prevent people from stealing our money by entering a negative amount.

We can do that with input validation, one of the most important tools in the AppSec toolkit. Let’s look at some ways to implement it.

Input Validation With XML Schema

xml-schemaWe could rely on XML Schema for validation, but XML Schema can only validate so much.

Validating individual properties will probably work fine, but things get hairy when we want to validate relations between properties. For maximum flexibility, we’d like to use Java to express constraints.

More importantly, schema validation is generally not a good idea in a REST service.

A major goal of REST is to decouple client and server so that they can evolve separately.

If we validate against a schema, then a new client that sends a new property would break against an old server that doesn’t understand the new property. It’s usually better to silently ignore properties you don’t understand.

JAX-B does this right, and also the other way around: properties that are not sent by an old client end up as null. Consequently, the new server must be careful to handle null values properly.

Input Validation With Bean Validation

bean-validationIf we can’t use schema validation, then what about using JSR 303 Bean Validation?

Jersey supports Bean Validation by adding the jersey-bean-validation jar to your classpath.

There is an unofficial Maven plugin to add Bean Validation annotations to the JAX-B generated classes, but I’d rather use something better supported and that works with Gradle.

So let’s turn things around. We’ll handcraft our JavaBean and generate the XML Schema from the bean for documentation:

@XmlRootElement(name = "order")
public class Order {
  @XmlElement
  @Min(1)
  public int quantity;
}
@Path("orders")
public class OrdersResource {
  @POST
  @Consumes({ "application/xml", "application/json" })
  public void place(@Valid Order order) {
    // Jersey recognizes the @Valid annotation and
    // returns 400 when the JavaBean is not valid
  }
}

Any attempt to POST an order with a non-positive quantity will now give a 400 Bad Request status.

Now suppose we want to allow clients to change their pending orders. We’d use PATCH or PUT to update individual order properties, like quantity:

@Path("orders")
public class OrdersResource {
  @Path("{id}")
  @PUT
  @Consumes("application/x-www-form-urlencoded")
  public Order update(@PathParam("id") String id, 
      @Min(1) @FormParam("quantity") int quantity) {
    // ...
  }
}

We need to add the @Min annotation here too, which is duplication. To make this DRY, we can turn quantity into a class that is responsible for validation:

@Path("orders")
public class OrdersResource {
  @Path("{id}")
  @PUT
  @Consumes("application/x-www-form-urlencoded")
  public Order update(@PathParam("id") String id, 
      @FormParam("quantity")
      Quantity quantity) {
    // ...
  }
}
@XmlRootElement(name = "order")
public class Order {
  @XmlElement
  public Quantity quantity;
}
public class Quantity {
  private int value;

  public Quantity() { }

  public Quantity(String value) {
    try {
      setValue(Integer.parseInt(value));
    } catch (ValidationException e) {
      throw new IllegalArgumentException(e);
    }
  }

  public int getValue() {
    return value;
  }

  @XmlValue
  public void setValue(int value) 
      throws ValidationException {
    if (value < 1) {
      throw new ValidationException(
          "Quantity value must be positive, but is: " 
          + value);
    }
    this.value = value;
  }
}

We need a public no-arg constructor for JAX-B to be able to unmarshall the payload into a JavaBean and another constructor that takes a String for the @FormParam to work.

setValue() throws javax.xml.bind.ValidationException so that JAX-B will stop unmarshalling. However, Jersey returns a 500 Internal Server Error when it sees an exception.

We can fix that by mapping validation exceptions onto 400 status codes using an exception mapper. While we’re at it, let’s do the same for IllegalArgumentException:

@Provider
public class DefaultExceptionMapper 
    implements ExceptionMapper<Throwable> {

  @Override
  public Response toResponse(Throwable exception) {
    Throwable badRequestException 
        = getBadRequestException(exception);
    if (badRequestException != null) {
      return Response.status(Status.BAD_REQUEST)
          .entity(badRequestException.getMessage())
          .build();
    }
    if (exception instanceof WebApplicationException) {
      return ((WebApplicationException)exception)
          .getResponse();
    }
    return Response.serverError()
        .entity(exception.getMessage())
        .build();
  }

  private Throwable getBadRequestException(
      Throwable exception) {
    if (exception instanceof ValidationException) {
      return exception;
    }
    Throwable cause = exception.getCause();
    if (cause != null && cause != exception) {
      Throwable result = getBadRequestException(cause);
      if (result != null) {
        return result;
      }
    }
    if (exception instanceof IllegalArgumentException) {
      return exception;
    }
    if (exception instanceof BadRequestException) {
      return exception;
    }
    return null;
  }

}

Input Validation By Domain Objects

dddEven though the approach outlined above will work quite well for many applications, it is fundamentally flawed.

At first sight, proponents of Domain-Driven Design (DDD) might like the idea of creating the Quantity class.

But the Order and Quantity classes do not model domain concepts; they model REST representations. This distinction may be subtle, but it is important.

DDD deals with domain concepts, while REST deals with representations of those concepts. Domain concepts are discovered, but representations are designed and are subject to all kinds of trade-offs.

For instance, a collection REST resource may use paging to prevent sending too much data over the wire. Another REST resource may combine several domain concepts to make the client-server protocol less chatty.

A REST resource may even have no corresponding domain concept at all. For example, a POST may return 202 Accepted and point to a REST resource that represents the progress of an asynchronous transaction.

ubiquitous-languageDomain objects need to capture the ubiquitous language as closely as possible, and must be free from trade-offs to make the functionality work.

When designing REST resources, on the other hand, one needs to make trade-offs to meet non-functional requirements like performance, scalability, and evolvability.

That’s why I don’t think an approach like RESTful Objects will work. (For similar reasons, I don’t believe in Naked Objects for the UI.)

Adding validation to the JavaBeans that are our resource representations means that those beans now have two reasons to change, which is a clear violation of the Single Responsibility Principle.

We get a much cleaner architecture when we use JAX-B JavaBeans only for our REST representations and create separate domain objects that handle validation.

Putting validation in domain objects is what Dan Bergh Johnsson refers to as Domain-Driven Security.

cave-artIn this approach, primitive types are replaced with value objects. (Some people even argue against using any Strings at all.)

At first it may seem overkill to create a whole new class to hold a single integer, but I urge you to give it a try. You may find that getting rid of primitive obsession provides value even beyond validation.

What do you think?

How do you handle input validation in your RESTful services? What do you think of Domain-Driven Security? Please leave a comment.

Securing HTTP-based APIs With Signatures

CloudSecurityI work at EMC on a platform on top of which SaaS solutions can be built.

This platform has a RESTful HTTP-based API, just like a growing number of other applications.

With development frameworks like JAX-RS, it’s relatively easy to build such APIs.

It is not, however, easy to build them right.

Issues With Building HTTP-based APIs

The problem isn’t so much in getting the functionality out there. We know how to develop software and the available REST/HTTP frameworks and libraries make it easy to expose the functionality.

That’s only half the story, however. There are many more -ilities to consider.

rest-easyThe REST architectural style addresses some of those, like scalability and evolvability.

Many HTTP-based APIs today claim to be RESTful, but in fact are not. This means that they are not reaping all of the benefits that REST can bring.

I’ll be talking more about how to help developers meet all the constraints of the REST architectural style in future posts.

Today I want to focus on another non-functional aspect of APIs: security.

Security of HTTP-based APIs

In security, we care about the CIA-triad: Confidentiality, Integrity, and availability.

Availability of web services is not dramatically different from that of web applications, which is relatively well understood. We have our clusters, load balancers, and what not, and usually we are in good shape.

Confidentiality and integrity, on the other hand, both require proper authentication, and here matters get more interesting.

Authentication of HTTP-based APIs

authenticationFor authentication in an HTTP world, it makes sense to look at HTTP Authentication.

This RFC describes Basic and Digest authentication. Both have their weaknesses, which is why you see many APIs use alternatives.

Luckily, these alternatives can use the same basic machinery defined in the RFC. This machinery includes status code 401 Unauthorized, and the WWW-Authenticate, Authentication-Info, and Authorization headers. Note that the Authorization header is unfortunately misnamed, since it’s used for authentication, not authorization.

The final piece of the puzzle is the custom authentication scheme. For example, Amazon S3 authentication uses the AWS custom scheme.

Authentication of HTTP-based APIs Using Signatures

The AWS scheme relies on signatures. Other services, like EMC Atmos, use the same approach.

It is therefore good to see that a new IETF draft has been proposed to standardize the use of signatures in HTTP-based APIs.

Standardization enables the construction of frameworks and libraries, which will drive down the cost of implementing authentication and will make it easier to build more secure APIs.

What do you think?

what-do-you-thinkIf you’re in the HTTP API building and/or consuming business –and who isn’t these days– then please go ahead and read the draft and provide feedback.

I’m also interested in your experiences with building or consuming secure HTTP APIs. Please leave a comment on this post.

Data Classification In the Cloud

Whenever a bug report comes in, I subconsciously classify it according to how it impacts the customer’s ability to derive value from the product.

Many software development companies have policies that formalize such classifications, e.g. into critical, high, medium, and low priority.

One can take that very far, like the Common Weakness Scoring System (CWSS) for classifying security vulnerabilities.

Data classification

Classifications are useful, because they compress a vast set of possibilities into a small set of categories. This makes it easier to decide what to do.

Classification applied to data stored in computer systems is called data classification. There are different reasons for classifying data.

One is to determine appropriate access control policies. It is wasteful to protect all your information at the highest level, so you want to divide up your data into a small number of buckets and take measures that are appropriate for each bucket.

Another important use case of data classification is to drive compliance efforts. If you process health care data, for instance, you may have to comply with the Health Insurance Portability and Accountability Act (HIPAA). This data requires different controls to be put in place than credit card data that is covered by PCI DSS.

Data in the Cloud

Things get more interesting in the cloud.

As a cloud user, you are still subject to the same laws and regulations as before, but now you’ve given away part of the control to your cloud provider. This means you have to make sure that they implement the required controls.

If the regulations you must comply with come with assessments, then those must extend to the cloud provider. Many cloud providers will not allow you to come in and do such assessments yourself, but they may allow assessments from third parties, like TRUSTe for a Safe Harbor assessment.

As a cloud provider, you will want to implement as many controls as possible, to support the maximum number of laws and regulations that your customers must comply with.

Both parties benefit from clear contracts. Part of such a contract may be a Data Protection Agreement that lists the duties of both parties in classifying and properly protecting data to meet security requirements and regulations.

If you’re unsure how to do all of this right, then you may want to look for guidance from the Cloud Security Alliance (CSA).

XACML In The Cloud

The eXtensible Access Control Markup Language (XACML) is the de facto standard for authorization.

The specification defines an architecture (see image on the right) that relates the different components that make up an XACML-based system.

This post explores a variation on the standard architecture that is better suitable for use in the cloud.

Authorization in the Cloud

In cloud computing, multiple tenants share the same resources that they reach over a network. The entry point into the cloud must, of course, be protected using a Policy Enforcement Point (PEP).

Since XACML implements Attribute-Based Access Control (ABAC), we can use an attribute to indicate the tenant, and use that attribute in our policies.

We could, for instance, use the following standard attribute, which is defined in the core XACML specification: urn:oasis:names:tc:xacml:1.0:subject:subject-id-qualifier.

This identifier indicates the security domain of the subject. It identifies the administrator and policy that manages the name-space in which the subject id is administered.

Using this attribute, we can target policies to the right tenant.

Keeping Policies For Different Tenants Separate

We don’t want to mix policies for different tenants.

First of all, we don’t want a change in policy for one tenant to ever be able to affect a different tenant. Keeping those policies separate is one way to ensure that can never happen.

We can achieve the same goal by keeping all policies together and carefully writing top-level policy sets. But we are better off employing the security best practice of segmentation and keeping policies for different tenants separate in case there was a problem with those top-level policies or with the Policy Decision Point (PDP) evaluating them (defense in depth).

Multi-tenant XACML Architecture

We can use the composite pattern to implement a PDP that our cloud PEP can call.

This composite PDP will extract the tenant attribute from the request, and forward the request to a tenant-specific Context Handler/PDP/PIP/PAP system based on the value of the tenant attribute.

In the figure on the right, the composite PDP is called Multi-tenant PDP. It uses a component called Tenant-PDP Provider that is responsible for looking up the correct PDP based on the tenant attribute.

LinkedIn Incident Shows Need for SecaaS

Security is a negative feature.

What I mean by that is that you will never get kudos for implementing a secure system, but you certainly will get a lot of flak for an insecure system, as the recent LinkedIn incident shows. Therefore, security is a distraction for most developers; they’d rather focus on their core business of implementing features.

Where have we heard that before?

Cloud Computing to the Rescue: SecaaS

Cloud computing promises to free businesses from having to buy, install, and maintain their own software and hardware, so they can focus on their core business.

We can apply this idea to security as well. The Cloud Security Alliance (CSA) calls this Security as a Service, or SecaaS.

For instance, instead of learning how to store passwords securely, we could just use an authentication service and not store passwords ourselves at all.

The alternative of using libraries, while infinitely better than rolling our own, is less attractive than the utility model. We’d still have to update the libraries ourselves. Also, with libraries, we depend on a language level API, which segments the market, making it less efficient.

There are some hurdles to tackle before SecaaS will go mainstream. Let’s take a look at one issue close to my heart.

Security as a Service Requires Standards

The utility model of cloud computing requires standards. This is also true for SecaaS.

Each type of security service should have one or at most a few standards, to level the playing field for vendors and to allow for easy switching between offerings of different vendors.

For authorization, I think we already have such a standard: XACML.

We also need a more general standard that applies to all security services, so that we have a simple programming model for integrating different security services into our applications. I believe that standard should be REST.
Update: The market seems to agree with this. Just see the figures in this presentation.

This is one of the reasons why we’re working on a REST profile for XACML. Stay tuned for more information on that.

So what do you think about SecaaS? Let me know in the comments.