Elasticity of Cloud Services
The great promise of cloud computing is the closer matching of compute resources to compute needs. This leads to significant cost savings and allows the creation of novel services that previously would have been cost prohibitive.
The driver of all this is the elasticity of cloud services – the ability to scale services up and down as needed. Much of the emphasis is on the ability to scale up, but the ability to scale down is no less significant a factor in reducing costs. Elasticity is a better descriptor of this than scalability, since the latter traditionally refers to scaling up, not scaling down. However, the term auto-elasticity has never taken off so autoscaling it is.
Workload Patterns for the Cloud
In a PDC09 presentation, Dianne O’Brien described four workload patterns that were optimal for the cloud:
- On and off
- Growing fast
- Unpredictable bursting
- Predictable bursting
An on and off workload is one used periodically or occasionally. An example is a batch process performed once a month.
A growing fast workload is one trending up with time. A more pessimistic name for this is grow fast or fail fast. An example is a rapidly growing website.
An unpredictable bursting workload is one where a steady load is occasionally and unpredictably subject to sudden spikes. An example is a magazine website where interest can spike if an article suddenly becomes popular.
A predictable bursting workload is one that varies periodically but predictably. An example is a business-focused service where the usage drops off outside work hours.
Other than the on and off workload, these workloads can all be managed automatically by having the service detect how busy it is and adjust the compute resources dedicated to it. An on and off workload can also be scaled automatically while the service is running, but once the service is completely off there is nothing left running to detect demand and scale back up.
The basic idea of autoscaling is to use performance measures of the service to determine the amount of compute resources devoted to it. If the instances of a web role are constantly CPU bound serving up pages then additional instances could be added. Another example is when the Azure Queue used to drive a worker role is growing faster than the worker role instances are able to handle the messages in the queue. The performance measures used to drive autoscaling depend on the performance requirements of the Azure service.
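As a sketch of the kind of decision logic involved, the following computes a target instance count from the queue length; the thresholds, limits, and function name are illustrative assumptions, not part of any Azure API:

```python
# Illustrative autoscaling decision driven by queue length.
# All names and thresholds here are hypothetical.

def target_instance_count(queue_length, messages_per_instance=100,
                          min_instances=1, max_instances=8):
    """Pick an instance count so each worker instance handles roughly
    messages_per_instance queued messages, clamped to sensible bounds."""
    desired = -(-queue_length // messages_per_instance)  # ceiling division
    return max(min_instances, min(max_instances, desired))

print(target_instance_count(450))  # → 5
```

The same shape of function works for a CPU-bound web role: replace queue length with an averaged CPU utilisation figure and pick thresholds accordingly.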
There are practical limitations on autoscaling in Azure. It takes about 10 minutes to start an additional instance of a running service and 1-2 minutes to tear an instance down. Furthermore, Azure instances are charged by the hour. Consequently, it does not make sense to autoscale an Azure service at timescales much less than an hour.
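One way to respect that granularity is a simple cooldown guard around any scaling action. This is a minimal sketch with assumed names, not an Azure feature:

```python
import time

# Hypothetical cooldown guard: since instances are billed by the hour,
# suppress scaling actions until at least an hour has passed since the last.
SCALE_COOLDOWN_SECONDS = 3600

def may_scale(last_scale_time, now=None):
    """Return True if enough time has elapsed to justify another scaling action."""
    if now is None:
        now = time.time()
    return (now - last_scale_time) >= SCALE_COOLDOWN_SECONDS
```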
Another limitation is that it is not possible to scale down to 0 instances of a role. This could be useful in a scenario where a role is needed only occasionally.
Windows Azure Service Management REST API
The Windows Azure Service Management REST API supports the programmatic management of Azure Services. It can be used to develop applications to manage Azure services and Azure Storage accounts. It can also be invoked inside a service to manage the service itself and implement autoscaling.
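Roughly, the relevant operation (Change Deployment Configuration) is a POST carrying a base64-encoded Service Configuration file in its body. The sketch below only builds the request; the exact x-ms-version value, URL shape, and authentication (a management certificate) should be verified against the API documentation:

```python
import base64

MGMT_BASE = "https://management.core.windows.net"

def change_configuration_request(subscription_id, service_name,
                                 deployment_name, cscfg_xml):
    """Build the URL, headers, and body for a Change Deployment
    Configuration call (sketch only; nothing is sent)."""
    url = (f"{MGMT_BASE}/{subscription_id}/services/hostedservices/"
           f"{service_name}/deployments/{deployment_name}/?comp=config")
    encoded = base64.b64encode(cscfg_xml.encode("utf-8")).decode("ascii")
    body = (
        '<?xml version="1.0" encoding="utf-8"?>'
        '<ChangeConfiguration xmlns="http://schemas.microsoft.com/windowsazure">'
        f'<Configuration>{encoded}</Configuration>'
        '</ChangeConfiguration>'
    )
    headers = {"x-ms-version": "2009-10-01",
               "Content-Type": "application/xml"}
    return url, headers, body
```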
The number of instances for a role is specified in the Instances element of the Service Configuration file. This number can be changed manually using the Azure Services portal. Alternatively, the Azure Service Management API can be used to modify the Service Configuration file and, specifically, the number of running instances. This is how autoscaling of an Azure role is implemented.
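Concretely, the autoscaling step amounts to rewriting the count attribute of the Instances element for a role. A minimal sketch using the Python standard library (the namespace URI here is that of the 2008 Service Configuration schema; verify it against your own .cscfg file):

```python
import xml.etree.ElementTree as ET

# Namespace used by the Service Configuration schema (check your .cscfg).
NS = "http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration"

def set_instance_count(cscfg_xml, role_name, count):
    """Return a copy of the Service Configuration XML with the Instances
    count of the named role changed to the given value."""
    root = ET.fromstring(cscfg_xml)
    for role in root.findall(f"{{{NS}}}Role"):
        if role.get("name") == role_name:
            role.find(f"{{{NS}}}Instances").set("count", str(count))
    return ET.tostring(root, encoding="unicode")
```

The rewritten document is what gets base64-encoded and posted back to the Service Management API.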
The Azure Service Management API works only in the cloud, so it is not possible to test autoscaling in the development fabric. It is possible to test manual scaling in the development fabric using csrun. However, this is limited to increasing the number of instances of a service started without debugging. csrun cannot be used to reduce the number of instances.
Examples of Autoscaling
Steven Nagy has an article on autoscaling in Windows Azure Platform: Articles from the Trenches Volume 1.
I described the Azure Service Management API in an earlier post in which I walked through the process of using it to modify the number of running instances.