How To Secure an Organization That Is Under Constant Attack

Battle of GeonosisThere have been many recent security incidents at well-respected organizations like the Federal Reserve, the US Energy Department, the New York Times, and the Wall Street Journal.


If these large organizations are incapable of keeping unwanted people off their systems, then who is?

The answer unfortunately is: not many. So we must assume our systems are compromised. Compromised is the new normal.

This has implications for our security efforts:

  1. We need to increase our detection capabilities
  2. We need to be able to respond quickly, preferably in an automated fashion, when we detect an intrusion

Increasing Intrusion Detection Capabilities with Security Analytics

There are usually many small signs that something fishy is going on when an intruder has compromised your network.

For instance, our log files might show that someone is logging in from an IP address in China instead of San Francisco. While that may be normal for our CEO, it’s very unlikely for her secretary.

Another example is when someone tries to access a system it normally doesn’t. This may be an indication of an intruder trying to escalate his privileges.

Security AnalyticsMost of us are currently unable to collect such small indicators into firm suspicions, but that is about to change with the introduction of Big Data Analytics technology.

RSA recently released a report that predicts that big data will play a big role in Security Incident Event Monitoring (SIEM), network monitoring, Identity and Access Management (IAM), fraud detection, and Governance, Risk, and Compliance (GRC) systems.

RSA is investing heavily in Security Analytics to prevent and predict attacks, and so is IBM.

Quick, Automated, Responses to Intrusion Detection with Risk-Adaptive Access Control

The information we extract from our big security data can be used to drive decisions. The next step is to automate those decisions and actions based on them.

Large organizations, with hundreds or even thousands of applications, have a large attack surface. They are also interesting targets and therefore must assume they are under attack multiple times a day.

Anything that is not automated is not going to scale.

Risk-Adaptive Access Control (RAdAC)One decision than can be automated is whether we grant someone access to a particular system or piece of data.

This dynamic access control based on risk information is what NIST calls Risk-Adaptive Access Control (RAdAC).

As I’ve shown before, RAdAC can be implemented using eXtensible Access Control Markup Language (XACML).

What do you think?

Is your organization ready to look at security analytics? What do you see as the major road blocks for implementing RAdAC?


Automated distribution creation (4)

In this series of posts, I talked about my continuing quest for the fully automated creation of a distribution for our product. I talked about downloading release notes from our issue tracker and how to add those to our NEWS file. Last time, I had the build automatically update some text files. Now, my journey comes to an end.

The final manual step in creating a distribution for our product, is about incorporating the latest version of the manual. Since our product is an extension of the Component Content Management System Docato, we naturally use Docato itfself for writing our manual. This means the source code to the manual is not in Subversion with the rest of the code, and is not easily available to the build system. So what to do?

Well, Docato is pretty versatile. One can have a publication’s output sent to a well-known location on a server, for example. From there, we can download it using Ant’ scp task:

<scp file="${remote.manual.html.files}"

where ${remote.manual.html.files} can use wildcards, like ${download.dir}/html/AMDS-CMS/*. But the download directory is on a different machine, so you need to provide the username and password for logging into that machine:

<property name="download.dir"

Of course, having a username and password in an Ant build file is a security risk. But in this case, the build file is not available outside our firewall, so we’re good. Otherwise, you’d have to provide the user credentials on the command line when running the Ant target:

ant create-dist -Dusername=foo -Dpassword=bar

scp also allows you to rename a file when downloading:

<scp file="${remote.manual.pdf.file}"

So now we can download the manual and incorporate it in our distribution. But how do we know we have the latest version of the manual?

Again, Docato comes to the rescue. It has the concept of scheduled tasks, actions that automatically run in the background from time to time. These are ideal for making backups, for instance.

So I created a scheduled task that builds a publication, and installed the task code on our manual server. Now every time a tech writer edits something, the edit will be automatically published at most a day later.

And so my journey ends.

But every end is a new beginning. Now that our distribution can be built by running a single Ant target, a whole new world opens up to us. My plan is to create a distribution automatically as part of our CruiseControl build. And then install it automatically, and run some tests against the installed version. Also, the distribution could be made available on some well-known server, so that interested people could always use the latest version for giving demos, for instance. But only when all the tests pass, of course 😉

Automated distribution creation (3)

In previous posts, I talked about my continuing quest for the fully automated creation of a distribution for our product. First, I talked about downloading release notes from our issue tracker. Then I showed how to add those to our NEWS file. This time, it’s time for updating some text files.

Our product is parametrized by some properties in property files. For instance, the number of cache pages to use for our embedded database is stored in the xhive.server.cache property in the file In a typical installation on a production server, you’d want to set this to some high number, since you would have lots of RAM. However, on our development machines, we are not so fortunate. So the file contains a value that works for us developers. But when we build a distribution, we need to increase the value.

Enter the replaceregexp Ant task. This task lets you replace the occurrence of a given regular expression with a substitution pattern in a file. So the following piece of Ant script sets the property to 150000:

<replaceregexp file="${property.file}"
    replace="\1 150000" byline="true"/>

I use a backreference in the regular expression to prevent duplication of xhive.server.cache. Also note the use of \s* to match any amount of whitespace.

But once the value is set this high, I can no longer start up our web application on my local machine, since I don’t have the required amount of RAM for that many cache pages (they’re 4K a piece). So I use a similar piece of code to set it back to the development value once the distribution is built.

I also need to update some shell scripts that start Java programs. These shell scripts specify the maximum amount of RAM available to the JVM using the -Xmx and -Xms command line options to java:

<replaceregexp file="${shell.script.file}"
    match="(-Xmx)[0-9]+(m -Xms)[0-9]+(m)"
    replace="\1512\2128\3" byline="true"/>

The replace pattern isn’t as readable as I would like with the backreferences \1, \2, and \3, but I still prefer that to the duplication of the command line options.

Note however, that this messes up the file permissions on *nix systems. So I use the chmod Ant task to fix that:

<chmod file="${shell.script.file}" perm="755"/>

The above code snippets show how to change property values. But I also need to add a value:

<replaceregexp file="${property.file}"
    replace="" byline="true"/>
<echo file="${property.file}"
    message="docato-server = ${}/docato-composer${line.separator}"

I first use a replaceregexp to remove any occurrence of the property, and then add it using the echo task. The replaceregexp is necessary to be able to run the Ant script multiple times without the property being added multiple times. Note the use of ${line.separator} to add a newline in a cross-platform manner.

Automated distribution creation (2)

In my previous post I talked about how I managed to automatically download the release notes from our issue tracker web site. These notes still needed adding to our NEWs file, which describes the changes between releases.

There are really two scenarios to deal with here: the release notes for the current release either are already in the NEWS file, or they are not. They are already there when you rebuild the distribution for a release, for example when you’ve found something wrong with it and fixed that. For a human, this is pretty simple to detect, but how does an Ant script know?

Enter the Ant filter chain. This construct resembles a Unix pipe in that you can use it to feed output of one as input to the other. Here’s how I retrieve the version that is currently in the NEWS file:

<loadfile property="current.version"
    <headfilter lines="1"/>
      <replaceregex pattern="[a-zA-Z\s]*([1-9]+\.[0-9]+).*"
      <replacestring from="." to="\."/>

The loadfile task loads the srcFile into the current.version property. But not just as is, no there is a filterchain applied first. The first item in the chain is headfilter, which works just like the Unix head command: in this case it gives the first line of the NEWS file. I don’t want a line, but a string, so next I remove the line ending with the striplinebreaks filter.

Then it’s time for some good old regular expression to extract the version number from the string. The first line of the NEWS file looks like this: Changes in 1.4.0. So I match the text with [a-zA-Z\s]* and then the actual version number with ([1-9]+\.[0-9]+).*.

Note that I use a group to capture only the major and minor version (1.4 in the previous example). The reason for that is that whenever we deliver patch releases, we don’t add a whole new section to the NEWS file, but just expand the current section with the few cases that were fixed by the patch. Since we sort the cases in descending order of reporting, the patch cases will always be at the top.

Following the regular expression there is a replacestring filter that inserts backslashes before points. The reason for that becomes clear when we look at how the Ant script actually uses the current.version property:

<condition property="same.release">
  <matches string="${full.version}"
<antcall target="--remove-current-release-from-news"/>
<antcall target="--add-current-release-to-news"/>

The --remove-current-release-from-news target is only executed when the same.release property is true:

<target name="--remove-current-release-from-news"
  <property name="previous.version.file"
  <echo message="${previous.version}"
  <loadfile srcFile="${previous.version.file}"
        <replacestring from="." to="\."/>          
  <delete file="${previous.version.file}"/>
  <replaceregexp file="${news.file}"
      match=".*(Changes in ${escaped.previous.version}.*)"
      replace="\1" flags="s"/>          

The bulk of the work is done in the final replaceregexp task, where everything before the text Changes in <x>.<y>.<z> is deleted. The code before that is just a convoluted way to escape points in the previous version number. Unfortunately, I’m not aware of any Ant task that can execute a regular expression against a property, so I first put the property into a temporary file and then operate on that file.

Finally, all that is left, is to add the release notes for the current version to the NEWS file:

<target name="--add-current-release-to-news">
  <property name=""
  <concat destfile="${}">
      <pathelement location="${}"/>
      <pathelement location="${news.file}"/>
  <move file="${}"
  <delete file="${}"/>

The only tricky part here is that the concat task doesn’t allow one of its input files to also be the output file. So I have to introduce a temporary file. Then when all is done, the file containing the NEWS section for this release,, is no longer needed.

Automated distribution creation

So we have this automated build with CruiseControl. It generates code, compiles, deploys, and tests. It’s saved my skin a gazillion times. It’s really great.

But it could be even better. It could also build a complete distribution, making the whole software release process a non-event. That’s one of my goals for the coming weeks. So stay tuned. 😉

Currently, the process to build a distribution of our product requires a couple of manual steps. One of these steps is to update the NEWS file, which describes the changes between releases. Of course, everything that changes between releases, is documented in the issue tracking system, in our case FogBugz. (FogBugz is OK to work with most of the time, although I think there are better alternatives, like Jira.)

FogBugz lets you add release notes to each issue (which it calls case), and it provides a standard report to show the release notes for all cases scheduled for a specific release. You can even download this report in XML.

The only problem is that this functionality doesn’t work most of the time. The only time when it is guaranteed to work, is when you try it on the server that hosts FogBugz. Since this machine is in the server room, this is inconvenient to say the least. But even if this functionality worked flawlessly every time, everywhere, it would still be a manual step to collect the XML file.

So I turned to HtmlUnit, a “browser for Java programs. It models HTML documents and provides an API that allows you to invoke pages, fill out forms, click links, etc… just like you do in your normal browser.” We use this great tool a lot to write our acceptance tests.

This time, I used HtmlUnit’s WebClient from within an Ant task to log in into FogBugz, generate the release notes report, extract cases with release notes (some cases have none, since they are too trivial to bother the end user with), and write them to an XML file. This allows me to transform the XML file to plain text using XSLT, giving a NEWS file section for the current release. The next step is to automagically add this to the existing NEWS file. This should be easy enough using Ant’s concat task. I will let you know how this works out.

Importing large data sets

For performance testing, it is often necessary to import a large data set to test against. However, importing large data sets presents its own challenges. Below I want to give some tips on how to deal with those.

  1. Begin with making backups. Not just of your current data, but also of the large data set you want to import. You might just want to transform the data to import, and then it is useful to be able to go back to the original.
  2. Start with a representative subset of the large data set. This will allow you to test the import process without having to wait hours for feedback. Only when you’re convinced that everything works as expected, do you import the whole large data set.
  3. Test the limited data set end-to-end. For instance, the product I’m currently working on consists of a Content Management System (CMS, where people author content) and a Delivery System (DS, where people use the content). Data is imported into the CMS, edited, and finally published to the DS. In this situation, it is not enough to have a successful import into CMS. The publication to DS must also succeed.
  4. Automate the import. When things go wrong, you need to perform the import multiple times. It saves time to be able to run the import with a single command. Even if the import succeeds on the first try (one can dream), you might want to redo the import later, e.g. for performance testing against a new release, or when a new, even larger, data set becomes available.
  5. If you need to transform the data to make the import work, make sure to put the transformation scripts under version control, like your regular code (you do use a version control system, do you?). The build scripts that automate the import should also be put under version control.
  6. If you cannot get your hands on real-world data, you may still be able to do performance testing using generated data. The downside of this approach is that the generated data will probably not contain the exotic border cases that are usually present in real-life data.