Celebrate Learning in Software Development

Every event is either a cause for celebration or an opportunity to learn.

celebrateI don’t remember where I came across this quote, but it has stuck with me. I like how it turns every experience into something positive.

Sometimes I need to remind myself of it, however, especially when there are a lot of, well, learning opportunities in a row.

One recent case was when I had started a long running performance test overnight. The next morning when I came back to it, there was no useful information at all. None whatsoever.

What had happened?

scdfOur system is a fast data solution built on Spring Cloud Data Flow (SCDF). SCDF allows you to compose stream processing solutions out of data microservices built with Spring Boot.

The performance test spun up a local cluster, ingested a lot of data, and spun the cluster down, all the while capturing performance metrics.

(This is early stages performance testing, so it doesn’t necessarily need to run on a production-like remote cluster.)

Part of the shutdown procedure was to destroy the SCDF stream. The stream destroy command to the SCDF shell is supposed to terminate the apps that make up the stream. It did in our functional tests.

But somehow it hadn’t this time. After the performance test ran, the supporting services were terminated, but the stream apps kept running. And that was the problem. These apps continued to try to connect to the supporting services, failed to do that, and wrote those failures to the log files. The log files had overflown and the old ones had been removed, in an effort to save disk space.

All that was left, were log files filled with nothing but connection failures. All the useful information was gone. While I was grateful that I still had space on my disk left, it was definitely not a cause for celebration.

So then what could we learn from this event?

Obviously we need to fix the stream shutdown procedure.

kubernetesCome to think of it, we had already learned that lesson. The code to shut down our Kubernetes cluster doesn’t use stream destroy, but simply deletes all the replication controllers and pods that SCDF creates.

We did it that way, because the alternative proved unreliable. And yet we had failed to update the equivalent code for a local cluster. In other words, we had previously missed an opportunity to learn!

Determined not that make that mistake again, we tried to look beyond fixing the local cluster shutdown code.

One option is to not delete old logs, so we wouldn’t have lost the useful information. However, that almost certainly would have led to a full disk and a world of hurt. So maybe, just maybe, we shouldn’t go there.

Another idea is to not log the connection failures that filled up the log files. Silently ignoring problems isn’t exactly a brilliant strategy either, however. If we don’t log problems, we have nothing to monitor and alert on.

release-itA better idea is to reduce the number of connection attempts in the face of repeated failures. Actually, resiliency features like circuit breakers were already in the backlog, since the need for it was firmly drilled into us by the likes of Nygard.

We just hadn’t worked on that story yet, because we didn’t have much experience in this area and needed to do some homework.

So why not spend a little bit of time to do that research now? It’s not like we could work on analyzing the performance test results.

It turns out that this kind of stuff is very easy to accomplish with the FailSafe library:

private final CircuitBreaker circuitBreaker = new CircuitBreaker()
    .withFailureThreshold(3, 10)
    .withSuccessThreshold(3)
    .withDelay(1, TimeUnit.SECONDS);
private final SyncFailsafe<Object> safeService = Failsafe
    .with(circuitBreaker)
    .withFallback(() -> DEFAULT_VALUE);

@PostConstruct
public void init() {
  circuitBreaker.onOpen(() -> LOG.warn("Circuit breaker opened"));
  circuitBreaker.onClose(() -> LOG.warn("Circuit breaker closed"));
}

private Object getValue() {
  return safeService.get(() -> remoteService.getValue());
}

learnI always feel better after learning something new. Taking every opportunity to learn keeps my job interesting and makes it easier to deal with the inevitable problems that come my way.

Instead of being overwhelmed with negativity, the positive experience of improving my skills keeps me motivated to keep going.

What else could we have learned from this incident? What have you learned recently? Please leave a comment below.

Removing Deployment Friction With Push-To-Deploy

appengineAt work we use CloudFoundry as our PaaS, but I also like to keep informed about what other platforms do.

Google AppEngine Introduces Push-To-Deploy

Google AppEngine recently added an interesting feature: Push-to-Deploy through Git.

With Push-To-Deploy, you can simply push your code to a Git repository to get your code deployed on AppEngine.

This Git repository is maintained by Google and tied to your cloud account. I guess this is implemented using the post-receive Git server hook.

Push-To-Deploy Removes Friction

What I like about this feature is that it removes some friction from the deployment process: you no longer need to know about how to deploy your application on AppEngine.

Push-To-Deploy inches us closer to a Frictionless Development Environment (FDE). The two most likely candidates to become the FDE of choice both support Git, so it’s easy to use Push-To-Deploy in both Orion and Cloud9.

More Friction Remains

LubricationOf course, this is only a small step and a lot more work needs to be done before we really have an FDE.

In my ideal world, for any change that I make the FDE would automatically run the tests and code checkers in the background and, when successful, push the changes to a development branch to make them available for my co-workers.

To make this efficient, only tests that could potentially have been impacted by the changes would run, and they would run in parallel in the cloud. When specified criteria are matched, changes on the development branch would propagate to master and, using Push-To-Deploy, to production.

Although this is all far far away, every step is to be applauded, and I hope other PaaS providers will follow Google’s example.

What Do You Think?

Do you use Google AppEngine? Git? Would you use Push-To-Deploy? Would you like to see a similar feature in CloudFoundry or another PaaS?

Please leave a comment.

Securing HTTP-based APIs With Signatures

CloudSecurityI work at EMC on a platform on top of which SaaS solutions can be built.

This platform has a RESTful HTTP-based API, just like a growing number of other applications.

With development frameworks like JAX-RS, it’s relatively easy to build such APIs.

It is not, however, easy to build them right.

Issues With Building HTTP-based APIs

The problem isn’t so much in getting the functionality out there. We know how to develop software and the available REST/HTTP frameworks and libraries make it easy to expose the functionality.

That’s only half the story, however. There are many more -ilities to consider.

rest-easyThe REST architectural style addresses some of those, like scalability and evolvability.

Many HTTP-based APIs today claim to be RESTful, but in fact are not. This means that they are not reaping all of the benefits that REST can bring.

I’ll be talking more about how to help developers meet all the constraints of the REST architectural style in future posts.

Today I want to focus on another non-functional aspect of APIs: security.

Security of HTTP-based APIs

In security, we care about the CIA-triad: Confidentiality, Integrity, and availability.

Availability of web services is not dramatically different from that of web applications, which is relatively well understood. We have our clusters, load balancers, and what not, and usually we are in good shape.

Confidentiality and integrity, on the other hand, both require proper authentication, and here matters get more interesting.

Authentication of HTTP-based APIs

authenticationFor authentication in an HTTP world, it makes sense to look at HTTP Authentication.

This RFC describes Basic and Digest authentication. Both have their weaknesses, which is why you see many APIs use alternatives.

Luckily, these alternatives can use the same basic machinery defined in the RFC. This machinery includes status code 401 Unauthorized, and the WWW-Authenticate, Authentication-Info, and Authorization headers. Note that the Authorization header is unfortunately misnamed, since it’s used for authentication, not authorization.

The final piece of the puzzle is the custom authentication scheme. For example, Amazon S3 authentication uses the AWS custom scheme.

Authentication of HTTP-based APIs Using Signatures

The AWS scheme relies on signatures. Other services, like EMC Atmos, use the same approach.

It is therefore good to see that a new IETF draft has been proposed to standardize the use of signatures in HTTP-based APIs.

Standardization enables the construction of frameworks and libraries, which will drive down the cost of implementing authentication and will make it easier to build more secure APIs.

What do you think?

what-do-you-thinkIf you’re in the HTTP API building and/or consuming business –and who isn’t these days– then please go ahead and read the draft and provide feedback.

I’m also interested in your experiences with building or consuming secure HTTP APIs. Please leave a comment on this post.

Bridging the Client-Server Divide

webapp-architectureMost software these days is delivered in the form of web applications, and the move towards cloud computing will only emphasize this trend.

Web apps consist of client and server parts, where the client part has been getting bigger lately to deliver a richer user experience.

This split has implications for developers, because the technologies used on the client and server parts are often different.

The client is ruled by HTML, CSS, and JavaScript, while the server is most often developed using JVM or .NET based languages like Java and C#.

Disadvantages of Different Client and Server Technologies

Developers of web applications risk becoming either specialists confined to a single part of the stack or polyglot programmers.

Polyglot programming is the practice of knowing and using many programming languages. There are both advantages and disadvantages associated with polyglot programming. I believe the overriding disadvantage is the context switching involved, which degrades productivity and opens the doors to extra bugs.

Being a specialist has advantages and disadvantages as well. A big disadvantage I see is the “us versus them”, or “not my problem” culture that can arise. In general, Agile teams prefer generalists.

Bringing Server Technologies to the Client

Many attempts have been made at bridging the gap between client and server. Most of these attempts were about bringing server-side technologies to the client.

GWTJava on the client has failed to reached widespread adoption, and now that many people advice to disable Java applets altogether because of security reasons it seems increasingly unlikely that it ever will.

Bringing .NET to the client has likewise failed as Silverlight adoption continues to drop.

Another idea is to translate from server to client technologies. Many languages can now be compiled to JavaScript. The most mature effort is Google Web Toolkit (GWT), which translates from Java. The main problem with GWT is that it supports only a small subset of Java.

All in all I don’t feel there currently is a satisfactory way of using server technologies on the client.

Bringing Client Technologies to the Server

So what about the reverse? There is really only one client-side technology worth looking at today: JavaScript. The only other rival, Flash, is losing out quickly due to lack of support from Apple and the rise of HTML5.

Node.jsJavaScript on the server is starting to make inroads, thanks to the Node.js platform.

It is used by the Cloud9 IDE, for example, and supported by Platform-as-a-Service providers like CloudFoundry and Heroku.

What do you think?

If I had to put my money on any unification approach, it would be Node.js.

Do you agree? What needs to happen to make this a common way of developing web apps? Please let me know your thoughts in the comments.

Data Classification In the Cloud

Whenever a bug report comes in, I subconsciously classify it according to how it impacts the customer’s ability to derive value from the product.

Many software development companies have policies that formalize such classifications, e.g. into critical, high, medium, and low priority.

One can take that very far, like the Common Weakness Scoring System (CWSS) for classifying security vulnerabilities.

Data classification

Classifications are useful, because they compress a vast set of possibilities into a small set of categories. This makes it easier to decide what to do.

Classification applied to data stored in computer systems is called data classification. There are different reasons for classifying data.

One is to determine appropriate access control policies. It is wasteful to protect all your information at the highest level, so you want to divide up your data into a small number of buckets and take measures that are appropriate for each bucket.

Another important use case of data classification is to drive compliance efforts. If you process health care data, for instance, you may have to comply with the Health Insurance Portability and Accountability Act (HIPAA). This data requires different controls to be put in place than credit card data that is covered by PCI DSS.

Data in the Cloud

Things get more interesting in the cloud.

As a cloud user, you are still subject to the same laws and regulations as before, but now you’ve given away part of the control to your cloud provider. This means you have to make sure that they implement the required controls.

If the regulations you must comply with come with assessments, then those must extend to the cloud provider. Many cloud providers will not allow you to come in and do such assessments yourself, but they may allow assessments from third parties, like TRUSTe for a Safe Harbor assessment.

As a cloud provider, you will want to implement as many controls as possible, to support the maximum number of laws and regulations that your customers must comply with.

Both parties benefit from clear contracts. Part of such a contract may be a Data Protection Agreement that lists the duties of both parties in classifying and properly protecting data to meet security requirements and regulations.

If you’re unsure how to do all of this right, then you may want to look for guidance from the Cloud Security Alliance (CSA).

Likely Candidates for Frictionless Development Environments

Last time I reviewed the book on Consumption Economics, which explains how technology companies and their products will have to change to survive the brave new world that we’re entering.

So what would we find if we take the lessons from the book and apply them to our own software development environment? I think the answer would be surprisingly close to what I’ve called a Frictionless Development Environment (FDE) before.

To be honest, I’ve only started thinking more systematically about FDEs after reading Consumption Economics. In Five Essential Components of a Frictionless Development Environment, I’ve laid out the major building blocks of an FDE: cloud computing, big data analytics, recommendation engines, plug-in architecture, and open source.

It may be to soon to expect existing solutions to have all of those, but let’s see where we stand. There are already some cloud development environments. Most of these are geared towards web developers, and offer limited languages (mostly JavaScript). Some offer a big enough range to be interesting to a wide range of developers.

Big data analytics and recommendation engines are big features that are probably not there yet, but could always be added later. What’s more important is to look for a plug-in architecture and particularly for open source. These are fundamental architectural and business decisions.

Using open source as a criterion reduces our list to Cloud9 and Orion. Both have a plug-in architecture. The latter is an Eclipse project, but the former seems more mature. Be sure to follow both Cloud9 and Orion.

So what do you think? Would any of these cloud IDEs work for you? What other open source cloud IDEs are out there?

Five Essential Components of a Frictionless Development Environment

One of the challenges of maintaining a consistent programming style in a team is for everyone to have the same workspace settings, especially in the area of compiler warnings.

Every time a new member joins the team, an existing member sets up a new environment, or a new version of the compiler comes along, you havebook.e to synchronize settings.

My team recently started using Workspace Mechanic, an Eclipse plug-in that allows you to save those settings in an XML file that you put under source control.

The plug-in periodically compares the workspace settings with the contents of that file. It notifies you in case of differences, and allows you to update your environment with a couple of clicks.

Towards a Frictionless Development Environment

Workspace Mechanic is a good example of a lubricant, a tool that lubricates the development process to reduce friction.

LubricationMy ideal is to take this to the extreme with a Frictionless Development Environment (FDE) in which all software development activities go very smoothly.

Let’s see what we would likely need to make such an FDE a reality.

In this post, I will look at a very small example that uncovers some of the basic components of an FDE.

Example: Creating the Class Under Test

In Test-Driven Development, we start out with a test and there is no class under test yet. Eclipse has a Quick Fix to create the class, but we still have to manually invoke it and select a source folder to store it in (assuming you have different source folders for main and test code).

It would be nicer if the IDE would understand what you’re trying to do and automatically create the skeleton for the class under test for you and save it in the right place.

Big DataThe crux is for the tool to understand what you are doing, or else it could easily draw the wrong conclusion and create all kinds of artifacts that you don’t want.

This kind of knowledge is highly user and potentially even project specific. It is therefore imperative that the tool collects usage data and uses that to optimize its assistance. We’re likely talking about big data here.

Given the fact that it’s expensive in terms of storage and computing power to collect and analyze these statistics, it makes sense to do this in a cloud environment.

That will also allow for quicker learning of usage patterns when working on different machines, like in the office and at home. More importantly, it allows building on usage patterns of other people.

What this example also shows, is that we’ll need many small, very focused lubricants. This makes it unlikely for one organization to provide all lubricants for an FDE that suits everybody, even for a specific language.

Open Source SoftwareThe only practical way of assembling an FDE is through a plug-in architecture for lubricants.

Building an FDE will be a huge effort. To realize it on the short term, we’ll probably need an open source model. No one company could put in the resource required to pull this off in even a couple of years.

The Essential Components of a Frictionless Development Environment

This small example uncovered the following building blocks for a Frictionless Development Environment:

  1. Cloud Computing will provide economies of scale and access from anywhere
  2. Big Data Analytics will discern usage patterns
  3. Recommendation Engines will convert usage patterns into context-aware lubricants
  4. A Plug-in architecture will allow different parties to contribute lubricants and usage analysis tools
  5. An Open Source model will allow many organizations and individuals to collaborate

What do you think?

Do you agree with the proposed components of an FDE? Did I miss something?

Please share your thoughts in the comments.

How Friction Slows Us Down

FrictionI once joined a project where running the “unit” tests took three and a half hours.

As you may have guessed, the developers didn’t run the tests before they checked in code, resulting in a frequently red build.

Running the tests just gave too much friction for the developers.

I define friction as anything that resist the developer while she is producing software.

Since then, I’ve spotted friction in numerous places while developing software.

Friction in Software Development

Since friction impacts productivity negatively, it’s important that we understand it. Here are some of my observations:

  • Friction can come from different sources.
    It can result from your tool set, like when you have to wait for Perforce to check out a file over the network before you can edit it.
    Friction can also result from your development process, for example when you have to wait for the QA department to test your code before it can be released.
  • Friction can operate on different time scales.
    Some friction slows you down a lot, while others are much more benign. For instance, waiting for the next set of requirements might keep you from writing valuable software for weeks.
    On the other hand, waiting for someone to review your code changes may take only a couple of minutes.
  • Friction can be more than simple delays.
    It also rears its ugly head when things are more difficult then they ought to be.
    In the vi editor, for example, you must switch between command and insert modes. Seasoned vi users are just as fast as with editors that don’t have that separation. Yet they do have to keep track of which mode they are in, which gives them a higher cognitive load.

Lubricating Software Development

LubricationThere has been a trend to decrease friction in software development.

Tools like Integrated Development Environments have eliminated many sources of friction.

For instance, Eclipse will automatically compile your code when you save it.

Automated refactorings decrease both the time and the cognitive load required to make certain code changes.

On the process side, things like Agile development methodologies and the DevOps movement have eliminated or reduced friction. For instance, continuous deployment automates the release of software into production.

These lubricants have given us a fighting chance in a world of increasing complexity.

Frictionless Software Development

It’s fun to think about how far we could take these improvements, and what the ultimate Frictionless Development Environment (FDE) might look like.

My guess is that it would call for the combination of some of the same trends we already see in consumer and enterprise software products. Cloud computing will play a big role, as will simplification of the user interaction, and access from anywhere.

What do you think?

What frictions have you encountered? Do you think frictions are the same as waste in Lean?

What have you done to lubricate the frictions away? What would your perfect FDE look like?

Please let me know your thoughts in the comments.

XACML In The Cloud

The eXtensible Access Control Markup Language (XACML) is the de facto standard for authorization.

The specification defines an architecture (see image on the right) that relates the different components that make up an XACML-based system.

This post explores a variation on the standard architecture that is better suitable for use in the cloud.

Authorization in the Cloud

In cloud computing, multiple tenants share the same resources that they reach over a network. The entry point into the cloud must, of course, be protected using a Policy Enforcement Point (PEP).

Since XACML implements Attribute-Based Access Control (ABAC), we can use an attribute to indicate the tenant, and use that attribute in our policies.

We could, for instance, use the following standard attribute, which is defined in the core XACML specification: urn:oasis:names:tc:xacml:1.0:subject:subject-id-qualifier.

This identifier indicates the security domain of the subject. It identifies the administrator and policy that manages the name-space in which the subject id is administered.

Using this attribute, we can target policies to the right tenant.

Keeping Policies For Different Tenants Separate

We don’t want to mix policies for different tenants.

First of all, we don’t want a change in policy for one tenant to ever be able to affect a different tenant. Keeping those policies separate is one way to ensure that can never happen.

We can achieve the same goal by keeping all policies together and carefully writing top-level policy sets. But we are better off employing the security best practice of segmentation and keeping policies for different tenants separate in case there was a problem with those top-level policies or with the Policy Decision Point (PDP) evaluating them (defense in depth).

Multi-tenant XACML Architecture

We can use the composite pattern to implement a PDP that our cloud PEP can call.

This composite PDP will extract the tenant attribute from the request, and forward the request to a tenant-specific Context Handler/PDP/PIP/PAP system based on the value of the tenant attribute.

In the figure on the right, the composite PDP is called Multi-tenant PDP. It uses a component called Tenant-PDP Provider that is responsible for looking up the correct PDP based on the tenant attribute.