Too Many Open Files, Or Too Few Bounded Contexts?

Server software needs to run unsupervised for long periods of time to be practical.

Release It! is full of horror stories about programming mistakes that get in the way of that lofty goal.

One example is opening files and forgetting to close them.

On some operating systems this will eventually lead to a Too many open files error when the number of open files passes a certain limit.

Of course we want to make sure that doesn’t happen to us. IDEs like Eclipse can give you warnings in certain cases to help with that, but if you’re test infected like me, you will want to write a test to make sure.

This is one of those situations where people less committed to (unit) testing tend to give up. There is nothing in the Java file system API that tells you how many files are open, so it simply cannot be done, right?

Not so fast, my friend!

First, with some proprietary code you can, in fact, count the number of open files in Java.
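As an illustration, here is a minimal sketch of one possible way to do that. It assumes a JVM that ships the com.sun.management extensions and a Unix-like operating system; that extension is not part of the standard Java API, which is what makes the code proprietary. The OpenFiles class name is made up for this example, and the method falls back to -1 when the count isn't available.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper; relies on a vendor-specific MXBean, not on standard Java.
final class OpenFiles {
    private OpenFiles() {
    }

    /** Returns the number of open file descriptors, or -1 if the platform doesn't expose it. */
    static long count() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            return ((com.sun.management.UnixOperatingSystemMXBean) os)
                    .getOpenFileDescriptorCount();
        }
        return -1; // e.g. on Windows, or on a JVM without this extension
    }
}
```

A test could compare the count before and after exercising the code under test, although other threads opening files at the same time can make such a test flaky.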

Second, we can do even better if we step back for a moment and take a look at the bigger picture. Chances are that our application isn’t really about files; that using files is simply a convenient implementation choice.

In the lingo of Eric Evans’ classic Domain-Driven Design, we have (at least) two bounded contexts: your core domain (what people buy/use your application for) and the file system.

A bounded context delimits the applicability of a particular model so that team members have a clear and shared understanding of what has to be consistent and how it relates to other contexts.

Evans describes a number of strategies for dealing with different bounded contexts. In our example, the file system API is not under our control, which rules out the majority of them. The remaining strategies are:

  • Separate Ways, i.e. use something other than the file system. That probably doesn’t make sense in this example
  • Conformist, i.e. follow the Java model slavishly. This is what most developers do without giving it much thought
  • Anti-Corruption Layer, i.e. create an isolating layer to provide clients with functionality in terms of their own domain model. This layer translates in both directions as necessary between the two models

This last strategy gives us more than “just” the opportunity to keep our models clear and to the point. By introducing interfaces that we control in our anti-corruption layer, we also gain the opportunity to mock those interfaces in our tests, making it very easy to verify that we indeed close all the files we open.
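To make this concrete, here is a small sketch of what such a layer and its test double could look like. All names (DocumentStore, DocumentHandle, ReportGenerator, TrackingStore) are invented for this example; a real layer would use the terms of your own domain, and a mocking library could replace the hand-rolled double.

```java
// Anti-corruption layer: interfaces we control, expressed in domain terms.
interface DocumentStore {
    DocumentHandle open(String name);
}

interface DocumentHandle extends AutoCloseable {
    String read();

    @Override
    void close(); // narrowed: no checked exception
}

// Domain code that depends only on the layer, never on java.io directly.
final class ReportGenerator {
    private final DocumentStore store;

    ReportGenerator(DocumentStore store) {
        this.store = store;
    }

    String summarize(String name) {
        // try-with-resources guarantees the handle is closed
        try (DocumentHandle doc = store.open(name)) {
            return doc.read().trim();
        }
    }
}

// Test double that counts how many handles were opened and closed.
final class TrackingStore implements DocumentStore {
    private int opened;
    private int closed;

    @Override
    public DocumentHandle open(String name) {
        opened++;
        return new DocumentHandle() {
            private boolean isClosed;

            @Override
            public String read() {
                return "  contents of " + name + "  ";
            }

            @Override
            public void close() {
                if (!isClosed) {
                    isClosed = true;
                    closed++;
                }
            }
        };
    }

    int leaked() {
        return opened - closed;
    }
}
```

A test then runs the domain code against a TrackingStore and asserts that leaked() is zero. The production implementation of DocumentStore is the only place that touches the file system API.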

This is yet another example where difficulty in unit testing a piece of code points to an opportunity to improve the design. If we consistently act on such opportunities, we will end up with a clean architecture that is a joy to work with.

Communicate Through Stories Rather Than Tasks

Last time I talked about interfaces between pieces of code.

Today I want to discuss the interface between groups of people involved in developing software.

There are two basic groups: those who develop the software, and those who coordinate that development.

In Agile terms, those groups are the Development Team on the one hand, and the Product Owner and other Stakeholders on the other.

Speaking the Same Language

The two groups need to communicate, so they do best when everybody speaks the same language.

This begins with speaking the same “natural” language, e.g. English. For most teams that will be a given, but teams that are distributed over multiple locations in different countries need to be a bit careful.

Once the language is determined, the team should look at the jargon they will be using.

Since the Development Team needs to understand what they must build, they need to know the business terms.

The Product Owner and Stakeholders don’t necessarily need to understand the technical terms, however.

Therefore, it makes sense that the Ubiquitous Language is the language of the business.

Speaking About Work: Stories and Tasks

But the two groups need to talk about more than the business problem to be solved. For any non-trivial amount of work, they also need to talk about how to organize that work.

In most Agile methods, work is organized into Sprints or Iterations. These time-boxed periods of development are an explicit interface between Product Owner and Development Team.

The Product Owner is the one steering the Development Team: she decides which User Stories will be built in a given Iteration.

The Development Team implements the requested Stories during the Iteration. They do this by breaking the Stories down into Tasks, having people sign up for the Tasks, and implementing them.

Tasks describe how development time is organized, whereas Stories describe functionality. So Tasks may refer to technical terms like relational databases, while Stories should only talk about functionality, like data persistence.

Stories Are the Interface

Since we value working software, we talk about Stories most of the time. Tasks only exist to make implementing Stories easier. They are internal to the Development Team, not part of the interface the Development Team shares with the Product Owner.

Many Development Teams do, in fact, expose Tasks to their Product Owners and other Stakeholders.

Sometimes they do this to explain why an Estimate for a Story is higher than the Product Owner expected.

Or they let the Product Owner attend Standup Meetings where Tasks are discussed.

This is fine, as long as both sides understand that Tasks are owned by the Development Team, just as Stories are owned by the Product Owner.

The Development Team may propose Stories, but the Product Owner decides what gets added to the Backlog and what gets scheduled in the Iteration.

Similarly, the Product Owner may propose, question, or inquire about Tasks, but the Development Team decides which Tasks make up a Story, in which order they are implemented, and by whom.

Always Honor the Interface

This well-defined interface between Product Owner and Development Team allows both sides to do their job well.

It’s important to understand that this has implications for how the software development process is organized.

For instance, the metrics we report up should be defined in terms of Stories, not Tasks.

Outside the Development Team, people shouldn’t care about how development time was divided, only about what the result was.

If we stick to the interface, both sides become decoupled and therefore free to innovate and optimize their own processes without jeopardizing the whole.

This is the primary benefit of any well-defined interface and the basis for a successful divide-and-conquer strategy.

What Do You Think?

What problems have you seen in the communication between the two groups?

Are you consciously restricting the communication to stories, or are you letting tasks slip in?

Please leave a comment.

REST 101 For Developers

Local Code Execution

Functions in high-level languages like C are compiled into procedures in assembly. They add a level of indirection that frees us from having to think about memory addresses.

Methods and polymorphism in object-oriented languages like Java add another level of indirection that frees us from having to think about the specific variant of a set of similar functions.

Despite these indirections, methods are basically still procedure calls, telling the computer to switch execution flow from one memory location to another. All of this happens in the same process running on the same computer.

Remote Code Execution

This is fundamentally different from switching execution to another process or another computer. The latter is especially different, as the other computer may not even have the same operating system through which programs access memory.

It is therefore no surprise that mechanisms of remote code execution that try to hide this difference as much as possible, like RMI or SOAP, have largely failed. Such technologies employ what are known as Remote Procedure Calls (RPCs).

One reason we must distinguish between local and remote procedure calls is that RPCs are a lot slower.

For most practical applications, this changes the nature of the calls you make: you’ll want to make fewer remote calls that are more coarsely grained.
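The sketch below illustrates the difference in granularity. The customer example, the interface names, and the counting stub are all invented for illustration; the stub merely counts calls where a real implementation would make a network round trip.

```java
// Value object returned by the coarse-grained call.
final class CustomerSummary {
    final String name;
    final String email;
    final String city;

    CustomerSummary(String name, String email, String city) {
        this.name = name;
        this.email = email;
        this.city = city;
    }
}

// Fine-grained: natural for local objects, chatty across a network.
interface FineGrainedCustomers {
    String name(long id);

    String email(long id);

    String city(long id);
}

// Coarse-grained: one call returns everything the client needs.
interface CoarseGrainedCustomers {
    CustomerSummary summary(long id);
}

// Stub standing in for a remote service; every call counts as one round trip.
final class CountingRemote implements FineGrainedCustomers, CoarseGrainedCustomers {
    int roundTrips;

    @Override
    public String name(long id) {
        roundTrips++;
        return "Ada";
    }

    @Override
    public String email(long id) {
        roundTrips++;
        return "ada@example.com";
    }

    @Override
    public String city(long id) {
        roundTrips++;
        return "London";
    }

    @Override
    public CustomerSummary summary(long id) {
        roundTrips++;
        return new CustomerSummary("Ada", "ada@example.com", "London");
    }
}
```

Fetching the three fields through the fine-grained interface costs three round trips, while the coarse-grained call costs one.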

Another reason is more organizational than technical in nature.

When the code you’re calling lives in another process on another computer, chances are that the other process is written and deployed by someone else. For the two pieces of code to cooperate well, some form of coordination is required. That’s the price we pay for coupling.

Coordinating Change With Interfaces

We can also see this problem in a single process, for instance when code is deployed in different jar files. If you upgrade a third party jar file that your code depends on, you may need to change your code to keep everything working.

Such coordination is annoying. It would be much nicer if we could simply deploy the latest security patch of that jar without having to worry about breaking our code. Fortunately, we can if we’re careful.

Interfaces in languages like Java separate the public and private parts of code.

The public part is what clients depend on, so you must evolve interfaces in careful ways to avoid breaking clients.

The private part, in contrast, can be changed at will.

From Interfaces to Services

In OSGi, interfaces are the basis for what are called micro-services. By publishing services in a registry, we can remove the need for clients to know what object implements a given interface. In other words, clients can discover the identity of the object that provides the service. The service registry becomes our entry point for accessing functionality.
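The idea can be sketched in plain Java. Note that this is not the OSGi API: ServiceRegistry, Greeter, and PoliteGreeter are hypothetical names, and the registry is a deliberately simplified stand-in that keeps one implementation per interface.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Simplified stand-in for a service registry; not the real OSGi API.
final class ServiceRegistry {
    private final Map<Class<?>, Object> services = new ConcurrentHashMap<>();

    <T> void register(Class<T> type, T implementation) {
        services.put(type, implementation);
    }

    <T> Optional<T> lookup(Class<T> type) {
        // Class.cast passes null through, so a missing service yields an empty Optional
        return Optional.ofNullable(type.cast(services.get(type)));
    }
}

// The public part: the only thing clients depend on.
interface Greeter {
    String greet(String name);
}

// The private part: free to change or be replaced without clients noticing.
final class PoliteGreeter implements Greeter {
    @Override
    public String greet(String name) {
        return "Good day, " + name;
    }
}
```

A client asks the registry for a Greeter and never learns that PoliteGreeter exists, so the provider can swap implementations without touching client code.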

There is a reason these interfaces are referred to as micro-services: they are miniature versions of the services that make up a Service Oriented Architecture (SOA).

A straightforward extrapolation of micro-services to “SOA services” leads to RPC-style implementations, for instance with SOAP. However, we’ve established earlier that RPCs are not the best way to invoke remote code.

Enter REST.

RESTful Services

Representational State Transfer (REST) is an architectural style that brings the advantages of the Web to the world of programs.

There is no denying the scalability of the Web, so this is an interesting angle.

Instead of explaining REST as it’s usually done by exploring its architectural constraints, let’s compare it to micro-services.

A well-designed RESTful service has a single entry point, like the micro-services registry. This entry point may take the form of a home resource.

We access the home resource like any other resource: through a representation. A representation is a series of bytes that we need to interpret. The rules for this interpretation are given by the media type.

Most RESTful services these days serve representations based on JSON or XML. The media type of a resource compares to the interface of an object.

Some interfaces contain methods that give us access to other interfaces. Similarly, a representation of a resource may contain hyperlinks to other resources.

Code-Based vs Data-Based Services

The difference between REST and SOAP is now becoming apparent.

In SOAP, as in micro-services, the interface is made up of methods. In other words, it’s code based.

In REST, on the other hand, the interface is made up of code and data. We’ve already seen the data: the representation described by the media type. The code is the uniform interface, which means that it’s the same (uniform) for all resources.

In practice, the uniform interface consists of the HTTP methods GET, POST, PUT, and DELETE.

Since the uniform interface is fixed for all resources, the real juice in any RESTful service is not in the code, but in the data: the media type.

Just as there are rules for evolving a Java interface, there are rules for evolving a media type, for example for XML-based media types. (From this it follows that you can’t use XML Schema validation for XML-based media types.)

Uniform Resource Identifiers

So far I haven’t mentioned Uniform Resource Identifiers (URIs). The documentation of many so-called RESTful services may give you the impression that they are important.

However, since URIs identify resources, their equivalents in micro-services are the identities of the objects implementing the interfaces.

Hopefully this shows that clients shouldn’t care about URIs. Only the URI of the home resource is important.

The representation of the home resource contains links to other resources. The meaning of those links is indicated by link relations.

Through its understanding of link relations, a client can decide which links it wants to follow and discover their URIs from the representation.
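Here is a rough sketch of such a client. It assumes representations have already been parsed into a map from link relation to URI; the class names, the ResourceFetcher abstraction, and any link relation you pass in are hypothetical, and a real client would do the HTTP calls and media type parsing behind ResourceFetcher.

```java
import java.net.URI;
import java.util.Map;
import java.util.Optional;

// A parsed representation, reduced to its links: link relation -> target URI.
final class Representation {
    private final Map<String, URI> links;

    Representation(Map<String, URI> links) {
        this.links = Map.copyOf(links);
    }

    Optional<URI> link(String relation) {
        return Optional.ofNullable(links.get(relation));
    }
}

// Hypothetical abstraction over "GET this URI and parse the response".
interface ResourceFetcher {
    Representation get(URI uri);
}

// The client knows only the home URI and the link relations it understands.
final class HypermediaClient {
    private final ResourceFetcher fetcher;
    private final URI home;

    HypermediaClient(ResourceFetcher fetcher, URI home) {
        this.fetcher = fetcher;
        this.home = home;
    }

    /** Fetches the home resource and follows the link with the given relation, if present. */
    Optional<Representation> follow(String relation) {
        return fetcher.get(home).link(relation).map(fetcher::get);
    }
}
```

The only URI that appears in client configuration is the home URI. Every other URI is read from a representation at run time, so the server remains free to change them.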

Versions of Services

As much as possible, we should follow the rules for evolving media types and not introduce any breaking changes.

However, sometimes that might be unavoidable. We should then create a new version of the service.

Since URIs are not part of the public interface of a RESTful API, they are not the right vehicle for relaying version information. The correct way to indicate major (i.e. non-compatible) versions of an API can be derived by comparison with micro-services.

Whenever a service introduces a breaking change, it should change its interface. In a RESTful API, this means changing the media type. The client can then use content negotiation to request a media type it understands.
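On the client side, content negotiation comes down to sending an Accept header. The sketch below builds such a request with the java.net.http API from Java 11 without sending it. The VersionedRequests class and any media type name you pass in (for instance a vendor type like application/vnd.example.shop.v2+json) are assumptions made for this example.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Hypothetical helper that asks for a specific version of a media type.
final class VersionedRequests {
    private VersionedRequests() {
    }

    static HttpRequest homeRequest(URI home, String mediaType) {
        return HttpRequest.newBuilder(home)
                .header("Accept", mediaType) // the version lives in the media type, not in the URI
                .GET()
                .build();
    }
}
```

A client that only understands the old media type keeps asking for it on the same URI, and the server can keep serving it for as long as it chooses to support that version.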

What Do You Think?

Literature explaining how to design and document code-based interfaces is readily available.

This is not the case for data-based interfaces like media types.

With RESTful services becoming ever more popular, that is a gap that needs filling. I’ll get back to this topic in the future.

How do you design your services? How do you document them? Please share your ideas in the comments.