Celebrate Learning in Software Development

Every event is either a cause for celebration or an opportunity to learn.

I don’t remember where I came across this quote, but it has stuck with me. I like how it turns every experience into something positive.

Sometimes I need to remind myself of it, however, especially when there are a lot of, well, learning opportunities in a row.

One recent case was when I had started a long-running performance test overnight. The next morning, when I came back to it, there was no useful information at all. None whatsoever.

What had happened?

Our system is a fast data solution built on Spring Cloud Data Flow (SCDF). SCDF allows you to compose stream processing solutions out of data microservices built with Spring Boot.

The performance test spun up a local cluster, ingested a lot of data, and spun the cluster down, all the while capturing performance metrics.

(This is early-stage performance testing, so it doesn’t necessarily need to run on a production-like remote cluster.)

Part of the shutdown procedure was to destroy the SCDF stream. The stream destroy command in the SCDF shell is supposed to terminate the apps that make up the stream. It did in our functional tests.

But somehow it hadn’t this time. After the performance test ran, the supporting services were terminated, but the stream apps kept running. And that was the problem. These apps kept trying to connect to the supporting services, failed, and wrote those failures to the log files. The log files had overflowed, and the oldest ones had been removed to save disk space.

All that was left were log files filled with nothing but connection failures. All the useful information was gone. While I was grateful that I still had space left on my disk, it was definitely not a cause for celebration.

So then what could we learn from this event?

Obviously we need to fix the stream shutdown procedure.

Come to think of it, we had already learned that lesson. The code to shut down our Kubernetes cluster doesn’t use stream destroy, but simply deletes all the replication controllers and pods that SCDF creates.

We did it that way because the alternative proved unreliable. And yet we had failed to update the equivalent code for a local cluster. In other words, we had previously missed an opportunity to learn!

Determined not to make that mistake again, we tried to look beyond fixing the local cluster shutdown code.

One option would have been not to delete old logs; then we wouldn’t have lost the useful information. However, that almost certainly would have led to a full disk and a world of hurt. So maybe, just maybe, we shouldn’t go there.

Another idea is to not log the connection failures that filled up the log files. Silently ignoring problems isn’t exactly a brilliant strategy either, however. If we don’t log problems, we have nothing to monitor and alert on.

A better idea is to reduce the number of connection attempts in the face of repeated failures. Actually, resiliency features like circuit breakers were already in the backlog, since the need for them was firmly drilled into us by the likes of Nygard.

We just hadn’t worked on that story yet, because we didn’t have much experience in this area and needed to do some homework.

So why not spend a little time doing that research now? It’s not as if we could work on analyzing the performance test results anyway.

It turns out that this kind of thing is easy to accomplish with the Failsafe library:

// Assumes the Failsafe 1.x API (net.jodah.failsafe); LOG, DEFAULT_VALUE, and
// remoteService are existing fields of the surrounding Spring bean.
import java.util.concurrent.TimeUnit;

import javax.annotation.PostConstruct;

import net.jodah.failsafe.CircuitBreaker;
import net.jodah.failsafe.Failsafe;
import net.jodah.failsafe.SyncFailsafe;

// Open the circuit after 3 failures out of the last 10 attempts, close it
// again after 3 consecutive successes, and stay open for 1 second before
// allowing trial calls through.
private final CircuitBreaker circuitBreaker = new CircuitBreaker()
    .withFailureThreshold(3, 10)
    .withSuccessThreshold(3)
    .withDelay(1, TimeUnit.SECONDS);
private final SyncFailsafe<Object> safeService = Failsafe
    .with(circuitBreaker)
    .withFallback(() -> DEFAULT_VALUE);

@PostConstruct
public void init() {
  // Log state transitions so we can monitor and alert on them.
  circuitBreaker.onOpen(() -> LOG.warn("Circuit breaker opened"));
  circuitBreaker.onClose(() -> LOG.warn("Circuit breaker closed"));
}

private Object getValue() {
  // Calls to the remote service go through the circuit breaker; when it is
  // open, the fallback value is returned without hitting the service.
  return safeService.get(() -> remoteService.getValue());
}

I always feel better after learning something new. Taking every opportunity to learn keeps my job interesting and makes it easier to deal with the inevitable problems that come my way.

Instead of being overwhelmed with negativity, the positive experience of improving my skills keeps me motivated to keep going.

What else could we have learned from this incident? What have you learned recently? Please leave a comment below.

Removing Deployment Friction With Push-To-Deploy

At work we use CloudFoundry as our PaaS, but I also like to keep informed about what other platforms do.

Google AppEngine Introduces Push-To-Deploy

Google AppEngine recently added an interesting feature: Push-to-Deploy through Git.

With Push-To-Deploy, you can simply push your code to a Git repository to get your code deployed on AppEngine.

This Git repository is maintained by Google and tied to your cloud account. I guess this is implemented using the post-receive Git server hook.

Push-To-Deploy Removes Friction

What I like about this feature is that it removes some friction from the deployment process: you no longer need to know how to deploy your application on AppEngine.

Push-To-Deploy inches us closer to a Frictionless Development Environment (FDE). The two most likely candidates to become the FDE of choice both support Git, so it’s easy to use Push-To-Deploy in both Orion and Cloud9.

More Friction Remains

Of course, this is only a small step, and a lot more work needs to be done before we really have an FDE.

In my ideal world, for any change that I make, the FDE would automatically run the tests and code checkers in the background and, when successful, push the changes to a development branch to make them available to my co-workers.

To make this efficient, only tests that could potentially have been impacted by the changes would run, and they would run in parallel in the cloud. When specified criteria are met, changes on the development branch would propagate to master and, using Push-To-Deploy, to production.

Although this is all far, far away, every step is to be applauded, and I hope other PaaS providers will follow Google’s example.

What Do You Think?

Do you use Google AppEngine? Git? Would you use Push-To-Deploy? Would you like to see a similar feature in CloudFoundry or another PaaS?

Please leave a comment.

Securing HTTP-based APIs With Signatures

I work at EMC on a platform on top of which SaaS solutions can be built.

This platform has a RESTful HTTP-based API, just like a growing number of other applications.

With development frameworks like JAX-RS, it’s relatively easy to build such APIs.
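
For instance, a bare-bones JAX-RS resource is all it takes to expose some functionality over HTTP; the class name and path below are made up for illustration:

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

@Path("/orders")
public class OrdersResource {

  @GET
  @Produces(MediaType.APPLICATION_JSON)
  public String list() {
    // A real API would return a proper representation of the orders.
    return "[]";
  }
}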

It is not, however, easy to build them right.

Issues With Building HTTP-based APIs

The problem isn’t so much in getting the functionality out there. We know how to develop software and the available REST/HTTP frameworks and libraries make it easy to expose the functionality.

That’s only half the story, however. There are many more -ilities to consider.

The REST architectural style addresses some of those, like scalability and evolvability.

Many HTTP-based APIs today claim to be RESTful, but in fact are not. This means that they are not reaping all of the benefits that REST can bring.

I’ll be talking more about how to help developers meet all the constraints of the REST architectural style in future posts.

Today I want to focus on another non-functional aspect of APIs: security.

Security of HTTP-based APIs

In security, we care about the CIA triad: Confidentiality, Integrity, and Availability.

Availability of web services is not dramatically different from that of web applications, which is relatively well understood. We have our clusters, load balancers, and what not, and usually we are in good shape.

Confidentiality and integrity, on the other hand, both require proper authentication, and here matters get more interesting.

Authentication of HTTP-based APIs

For authentication in an HTTP world, it makes sense to look at HTTP Authentication (RFC 2617).

This RFC describes Basic and Digest authentication. Both have their weaknesses, which is why you see many APIs use alternatives.

Luckily, these alternatives can use the same basic machinery defined in the RFC. This machinery includes status code 401 Unauthorized, and the WWW-Authenticate, Authentication-Info, and Authorization headers. Note that the Authorization header is unfortunately misnamed, since it’s used for authentication, not authorization.
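
To make that machinery concrete, here is a minimal sketch of a JAX-RS filter that challenges clients lacking credentials; the scheme name and realm are placeholders, not a recommendation for a particular scheme:

import java.io.IOException;

import javax.ws.rs.container.ContainerRequestContext;
import javax.ws.rs.container.ContainerRequestFilter;
import javax.ws.rs.core.HttpHeaders;
import javax.ws.rs.core.Response;
import javax.ws.rs.ext.Provider;

@Provider
public class AuthenticationFilter implements ContainerRequestFilter {

  @Override
  public void filter(ContainerRequestContext request) throws IOException {
    String authorization = request.getHeaderString(HttpHeaders.AUTHORIZATION);
    if (authorization == null) {
      // No credentials: challenge the client with 401 and the scheme we expect.
      request.abortWith(Response.status(Response.Status.UNAUTHORIZED)
          .header(HttpHeaders.WWW_AUTHENTICATE, "MyScheme realm=\"api\"")
          .build());
    }
    // Otherwise, validating the supplied credentials would happen here.
  }
}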

The final piece of the puzzle is the custom authentication scheme. For example, Amazon S3 authentication uses the AWS custom scheme.

Authentication of HTTP-based APIs Using Signatures

The AWS scheme relies on signatures. Other services, like EMC Atmos, use the same approach.
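
The general idea behind signing is straightforward, even though each scheme has its own canonicalization rules. The following rough sketch glosses over the exact string-to-sign that AWS or Atmos prescribe, but shows the HMAC mechanics:

import java.nio.charset.StandardCharsets;
import java.util.Base64;

import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

public final class RequestSigner {

  // Sign a canonical representation of the request (method, path, date,
  // selected headers) with a shared secret. The server recomputes the
  // signature from the same inputs and compares, which establishes both the
  // identity of the caller and the integrity of the signed parts.
  public static String sign(String stringToSign, byte[] secretKey) throws Exception {
    Mac mac = Mac.getInstance("HmacSHA256");
    mac.init(new SecretKeySpec(secretKey, "HmacSHA256"));
    byte[] signature = mac.doFinal(stringToSign.getBytes(StandardCharsets.UTF_8));
    return Base64.getEncoder().encodeToString(signature);
  }
}

The client would then send the result in the Authorization header under the custom scheme, and the server would verify it before processing the request.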

It is therefore good to see that a new IETF draft has been proposed to standardize the use of signatures in HTTP-based APIs.

Standardization enables the construction of frameworks and libraries, which will drive down the cost of implementing authentication and will make it easier to build more secure APIs.

What do you think?

If you’re in the HTTP API building and/or consuming business (and who isn’t these days), then please go ahead and read the draft and provide feedback.

I’m also interested in your experiences with building or consuming secure HTTP APIs. Please leave a comment on this post.

Bridging the Client-Server Divide

Most software these days is delivered in the form of web applications, and the move towards cloud computing will only emphasize this trend.

Web apps consist of client and server parts, where the client part has been getting bigger lately to deliver a richer user experience.

This split has implications for developers, because the technologies used on the client and server parts are often different.

The client is ruled by HTML, CSS, and JavaScript, while the server is most often developed using JVM or .NET based languages like Java and C#.

Disadvantages of Different Client and Server Technologies

Developers of web applications risk becoming either specialists confined to a single part of the stack or polyglot programmers.

Polyglot programming is the practice of knowing and using many programming languages. There are both advantages and disadvantages associated with polyglot programming. I believe the overriding disadvantage is the context switching involved, which degrades productivity and opens the door to extra bugs.

Being a specialist has advantages and disadvantages as well. A big disadvantage I see is the “us versus them”, or “not my problem” culture that can arise. In general, Agile teams prefer generalists.

Bringing Server Technologies to the Client

Many attempts have been made at bridging the gap between client and server. Most of these attempts were about bringing server-side technologies to the client.

Java on the client has failed to reach widespread adoption, and now that many people advise disabling Java applets altogether for security reasons, it seems increasingly unlikely that it ever will.

Bringing .NET to the client has likewise failed as Silverlight adoption continues to drop.

Another idea is to translate from server to client technologies. Many languages can now be compiled to JavaScript. The most mature effort is Google Web Toolkit (GWT), which translates from Java. The main problem with GWT is that it supports only a small subset of Java.

All in all I don’t feel there currently is a satisfactory way of using server technologies on the client.

Bringing Client Technologies to the Server

So what about the reverse? There is really only one client-side technology worth looking at today: JavaScript. The only other rival, Flash, is losing out quickly due to lack of support from Apple and the rise of HTML5.

JavaScript on the server is starting to make inroads, thanks to the Node.js platform.

It is used by the Cloud9 IDE, for example, and supported by Platform-as-a-Service providers like CloudFoundry and Heroku.

What do you think?

If I had to put my money on any unification approach, it would be Node.js.

Do you agree? What needs to happen to make this a common way of developing web apps? Please let me know your thoughts in the comments.

Data Classification in the Cloud

Whenever a bug report comes in, I subconsciously classify it according to how it impacts the customer’s ability to derive value from the product.

Many software development companies have policies that formalize such classifications, e.g. into critical, high, medium, and low priority.

One can take that very far, like the Common Weakness Scoring System (CWSS) for classifying security vulnerabilities.

Data classification

Classifications are useful, because they compress a vast set of possibilities into a small set of categories. This makes it easier to decide what to do.

Classification applied to data stored in computer systems is called data classification. There are different reasons for classifying data.

One is to determine appropriate access control policies. It is wasteful to protect all your information at the highest level, so you want to divide up your data into a small number of buckets and take measures that are appropriate for each bucket.
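
In code, those buckets often end up as a simple classification attached to each piece of data, with the controls derived from it. A hypothetical sketch, where the level names and controls are just examples:

public enum DataClassification {

  PUBLIC(false, false),
  INTERNAL(false, true),
  CONFIDENTIAL(true, true),
  RESTRICTED(true, true);

  private final boolean encryptAtRest;
  private final boolean auditAccess;

  DataClassification(boolean encryptAtRest, boolean auditAccess) {
    this.encryptAtRest = encryptAtRest;
    this.auditAccess = auditAccess;
  }

  // The measures appropriate for each bucket follow from its classification.
  public boolean requiresEncryptionAtRest() {
    return encryptAtRest;
  }

  public boolean requiresAuditLogging() {
    return auditAccess;
  }
}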

Another important use case of data classification is to drive compliance efforts. If you process health care data, for instance, you may have to comply with the Health Insurance Portability and Accountability Act (HIPAA). This data requires different controls to be put in place than credit card data that is covered by PCI DSS.

Data in the Cloud

Things get more interesting in the cloud.

As a cloud user, you are still subject to the same laws and regulations as before, but now you’ve given away part of the control to your cloud provider. This means you have to make sure that they implement the required controls.

If the regulations you must comply with come with assessments, then those must extend to the cloud provider. Many cloud providers will not allow you to come in and do such assessments yourself, but they may allow assessments from third parties, like TRUSTe for a Safe Harbor assessment.

As a cloud provider, you will want to implement as many controls as possible, to support the maximum number of laws and regulations that your customers must comply with.

Both parties benefit from clear contracts. Part of such a contract may be a Data Protection Agreement that lists the duties of both parties in classifying and properly protecting data to meet security requirements and regulations.

If you’re unsure how to do all of this right, then you may want to look for guidance from the Cloud Security Alliance (CSA).

Likely Candidates for Frictionless Development Environments

Last time I reviewed the book on Consumption Economics, which explains how technology companies and their products will have to change to survive the brave new world that we’re entering.

So what would we find if we take the lessons from the book and apply them to our own software development environment? I think the answer would be surprisingly close to what I’ve called a Frictionless Development Environment (FDE) before.

To be honest, I’ve only started thinking more systematically about FDEs after reading Consumption Economics. In Five Essential Components of a Frictionless Development Environment, I’ve laid out the major building blocks of an FDE: cloud computing, big data analytics, recommendation engines, plug-in architecture, and open source.

It may be too soon to expect existing solutions to have all of those, but let’s see where we stand. There are already some cloud development environments. Most of these are geared towards web developers and offer limited languages (mostly JavaScript). Some offer a big enough range to be interesting to a wide range of developers.

Big data analytics and recommendation engines are big features that are probably not there yet, but could always be added later. What’s more important is to look for a plug-in architecture and particularly for open source. These are fundamental architectural and business decisions.

Using open source as a criterion reduces our list to Cloud9 and Orion. Both have a plug-in architecture. The latter is an Eclipse project, but the former seems more mature. Be sure to follow both Cloud9 and Orion.

So what do you think? Would any of these cloud IDEs work for you? What other open source cloud IDEs are out there?