At scale, everything breaks.
Urs Hölzle
Since consumers can ask for and get resources at any time and in any quantity, the cloud must be able to scale up and down as load demands. Note that scaling down is just as important as scaling up, to conserve resources and thereby reduce cost.
Different applications running in the cloud will have different workload patterns, be they seasonal, batch, transient, hockey stick, or more complex. Because of these differences, high workloads in some applications will coincide with low workloads in others. This is why resource pooling leads to higher resource utilization rates and economies of scale.
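A small sketch can make the pooling argument concrete. The two hourly demand series below are invented for illustration (they are not measurements), and the function names are hypothetical. The sketch compares the capacity needed when each application is sized for its own peak with the capacity needed when both share one pool sized for the combined peak.

```python
# Hypothetical hourly demand, in servers, for two applications (illustrative numbers only).
daytime_app = [2, 2, 8, 10, 8, 2]  # busy in the middle of the period
batch_app = [9, 8, 2, 1, 2, 9]     # busy at the start and end of the period


def dedicated_capacity(workloads):
    """Capacity needed when every application has its own hardware sized for its own peak."""
    return sum(max(demand) for demand in workloads)


def pooled_capacity(workloads):
    """Capacity needed when all applications share hardware sized for the combined peak."""
    return max(sum(hour) for hour in zip(*workloads))


def utilization(workloads, capacity):
    """Average fraction of the given capacity that is actually in use."""
    hours = len(workloads[0])
    total_demand = sum(sum(demand) for demand in workloads)
    return total_demand / (capacity * hours)


workloads = [daytime_app, batch_app]
dedicated = dedicated_capacity(workloads)  # 10 + 9 = 19 servers
pooled = pooled_capacity(workloads)        # combined peak is 11 servers

print(f"dedicated: {dedicated} servers, utilization {utilization(workloads, dedicated):.0%}")
print(f"pooled:    {pooled} servers, utilization {utilization(workloads, pooled):.0%}")
```

Because the peaks of the two made-up workloads do not coincide, the shared pool needs fewer servers for the same demand, which is where the higher utilization comes from.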
Scalability
To achieve these economies of scale, the cloud infrastructure must be able to scale quickly. Scalability is the ability of a system to improve performance proportionally after adding hardware. In a scalable cloud, one can just add hardware whenever the demand rises, and the applications keep performing at the required level.
Since resources in a system typically carry some overhead, it’s important to understand what percentage of each resource you can actually use. The additional output gained by adding a unit of resource, compared to the output gained from the previously added unit, is called the scalability factor. Based on this concept, we can distinguish the following types of scalability:
- Linear scalability: The scalability factor stays constant when capacity is added.
- Sub-linear scalability: The scalability factor decreases when capacity is added.
- Supra-linear scalability: The scalability factor increases when capacity is added. This is rare, but can happen. For instance, I/O across multiple disk spindles in a RAID gets better with more spindles.
- Negative scalability: The performance of the system gets worse, instead of better, when capacity is added.
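As a rough illustration, the four types can be told apart from a series of throughput measurements. The sketch below is one possible reading of the definition, not a standard formula: it treats the scalability factor as the extra output gained from each added unit and compares the first and last of those gains. The function names, the 5% tolerance, and the sample numbers are all assumptions made for the example.

```python
def marginal_gains(throughput):
    """Extra output contributed by each added unit of resource.

    throughput[i] is the measured output with i + 1 units of resource.
    """
    return [after - before for before, after in zip(throughput, throughput[1:])]


def classify_scalability(throughput, tolerance=0.05):
    """Classify a measurement series as linear, sub-linear, supra-linear, or negative.

    The tolerance (an assumed 5%) absorbs measurement noise when deciding
    whether the gain per added unit stayed constant.
    """
    gains = marginal_gains(throughput)
    if len(gains) < 2:
        raise ValueError("need at least three measurements")
    if any(gain < 0 for gain in gains):
        return "negative"  # adding capacity made performance worse
    first, last = gains[0], gains[-1]
    if abs(last - first) <= tolerance * max(first, last):
        return "linear"
    return "supra-linear" if last > first else "sub-linear"


# Made-up requests-per-second figures for 1, 2, 3, and 4 nodes.
print(classify_scalability([100, 200, 300, 400]))  # linear
print(classify_scalability([100, 190, 270, 340]))  # sub-linear
print(classify_scalability([100, 210, 330, 460]))  # supra-linear
print(classify_scalability([100, 180, 170, 150]))  # negative
```

Comparing only the first and last gain keeps the sketch short; a real analysis would more likely fit a curve through all the measurements.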
There is another way of looking at scalability:
- To scale vertically (or scale up) means to add resources to a single node in the system, for instance adding memory to a single computer. There is a limit to how far one can scale vertically. For instance, 32-bit operating systems can only address 2^32 bytes, or 4 GB, so adding more memory than that to those systems is pointless.
- To scale horizontally (or scale out) means to add more nodes to the system. Because of the limitation to scale vertically, it’s very important to be able to scale horizontally. Horizontal scalability also allows the use of commodity hardware in large numbers, which is cheaper than specialized hardware.
Achieving (near) linear horizontal scalability is not easy, but there are some guidelines that help.
Dynamic Provisioning
Cloud systems must not only be able to scale, but scale at will, since cloud consumers should get the resources they want whenever they want them. It is, therefore, important to be able to dynamically provision new computing resources. Dynamic provisioning relies heavily on demand monitoring.
Dynamic provisioning can be manual or automated. In the manual case, the cloud provider’s employees watch the load and start up virtual machines or provision other resources as needed. This is expensive and error-prone, and it doesn’t scale well. For cloud systems, it makes more sense to automate the provisioning process: a software agent continuously watches the measured load and takes action based on policies that describe when to provision or deprovision resources.
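The decision such an agent makes on each measurement can be sketched as a threshold policy. This is a minimal illustration and does not describe any particular cloud provider's autoscaler: the `ScalingPolicy` fields, their default values, and the `desired_nodes` function are all hypothetical. Utilization is expressed in whole percent so the arithmetic stays in integers.

```python
from dataclasses import dataclass


@dataclass
class ScalingPolicy:
    """Hypothetical policy; every threshold is in percent of node capacity."""
    scale_up_above: int = 80    # provision when average utilization exceeds this
    scale_down_below: int = 30  # deprovision when average utilization drops below this
    target: int = 60            # utilization to aim for after a scaling action
    min_nodes: int = 1
    max_nodes: int = 20


def desired_nodes(current_nodes, avg_utilization, policy):
    """Return how many nodes the policy wants for the measured load."""
    if policy.scale_down_below <= avg_utilization <= policy.scale_up_above:
        return current_nodes  # load is inside the comfortable band: do nothing
    total_load = current_nodes * avg_utilization
    # Integer ceiling division: the smallest node count that brings
    # average utilization down to the target or below.
    wanted = -(-total_load // policy.target)
    return max(policy.min_nodes, min(policy.max_nodes, wanted))


policy = ScalingPolicy()
print(desired_nodes(4, 90, policy))  # overloaded: scale out to 6 nodes
print(desired_nodes(6, 20, policy))  # mostly idle: scale in to 2 nodes
print(desired_nodes(4, 50, policy))  # within the band: stay at 4 nodes
```

Keeping a gap between the two thresholds is a deliberate choice: with a single threshold the agent would keep adding and removing nodes as the load hovers around it. A real agent would call a function like this in a loop, fed by the demand monitoring, and then start or stop resources to match the result.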