The desire for cost-efficient hosting of services provides the impetus for moving services to the cloud. An important driver of that cost-efficiency is the ease with which hosted services can be scaled elastically – up and down – so that service capacity more closely matches service demand.
Windows Azure hosted services are scaled by modifying the instance count of a role (or roles) in the service configuration file. This can be done manually on the Windows Azure Portal, which supports both in-place editing and the upload of a new service configuration file. The Windows Azure Service Management REST API also exposes operations that allow the service configuration to be replaced with a new version containing different instance counts for the roles.
The Service Management REST API uses X.509 certificates for authentication. This requires that a (self-signed) X.509 certificate be created and uploaded as a management certificate to the Windows Azure Portal. Unlike the service certificates used to provide SSL capability for a hosted service, management certificates are associated with the subscription not an individual hosted service. They are not usually deployed to the hosted service. The reason for this is that the Service Management REST API has visibility across all the hosted services and storage accounts associated with the subscription.
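As a concrete illustration, the following Python sketch builds the Change Deployment Configuration request described above. The subscription ID, service name and certificate paths are placeholders, and the request-building shape is an assumption based on the documented XML envelope, not AzureWatch's or Microsoft's own code:

```python
# Sketch of the Service Management REST API's Change Deployment
# Configuration operation, authenticated with a management certificate.
# SUBSCRIPTION_ID, service name and cert/key paths are placeholders.
import base64

MANAGEMENT_ENDPOINT = "https://management.core.windows.net"

def change_configuration_request(subscription_id, service_name, slot, cscfg_xml):
    """Build the URL, body and headers for a Change Deployment Configuration call.

    The service configuration (.cscfg) is sent base64-encoded inside a
    ChangeConfiguration XML envelope.
    """
    url = (f"{MANAGEMENT_ENDPOINT}/{subscription_id}/services/hostedservices/"
           f"{service_name}/deploymentslots/{slot}/?comp=config")
    encoded = base64.b64encode(cscfg_xml.encode("utf-8")).decode("ascii")
    body = (
        '<ChangeConfiguration xmlns="http://schemas.microsoft.com/windowsazure">'
        f"<Configuration>{encoded}</Configuration>"
        "</ChangeConfiguration>"
    )
    headers = {"x-ms-version": "2010-10-28", "Content-Type": "application/xml"}
    return url, body, headers

# The request would then be POSTed with the management certificate
# presented as the client certificate, e.g. with the requests library:
#   requests.post(url, data=body, headers=headers,
#                 cert=("management.pem", "management.key"))
```

The actual POST is left as a comment since it requires a real subscription and certificate; the point is that the management certificate authenticates the caller at the subscription level, exactly as described above.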
The Windows Azure team created a set of Windows Azure Platform PowerShell cmdlets which allow various Service Management REST API operations to be invoked directly from PowerShell. Cerebrata has released its Azure Management Cmdlets which implement a more extensive set of Service Management REST API operations. Since both of these cmdlet sets use the Service Management REST API, they also authenticate using a management certificate, which must be provided with every PowerShell cmdlet invoked.
The ability to modify the instance count of a role is necessary but not sufficient for achieving cost-efficiency for a hosted service. It is also important to ensure that the number of deployed instances matches the number required to satisfy demand. Consequently, it is important to monitor that demand so that the appropriate number of instances can be deployed. Furthermore, since it takes about 10 minutes to add instances, likely future demand should be taken into account when choosing the appropriate number of instances.
The reality is that it is much easier to modify an instance count than it is to know what that instance count should actually be. There are many variables affecting demand, including time of the day or the day of the week. The hosted service may be undergoing rapid growth or even, alas, slow decay.
The Windows Azure Diagnostics API supports the capture of performance counter data from each instance and its persistence to Windows Azure Storage. This data can be analyzed to provide a retrospective quantification of service demand. The Windows Azure Diagnostics API supports the remote management of Windows Azure Diagnostics, so that the performance counters being captured can be modified to provide additional visibility into a hosted service.
It is not particularly difficult to capture performance counter data, and it is not particularly difficult to modify the service configuration. However, it requires some analysis to use historic performance counter data along with predictions of future service demand to choose the appropriate number of role instances to run at any particular time. Since this is not likely to be a core feature of any hosted service it makes sense to outsource this to a third party service focused specifically on autoscaling a hosted service.
Paraleap Technologies has released AzureWatch which it promotes as elasticity-as-a-service for Windows Azure. AzureWatch supports the elastic scaling of a Windows Azure hosted service entirely through configuration with no code change required – as long as Windows Azure Diagnostics has been configured for the hosted service. Brian Prince (@brianhprince) has a nice demonstration of AzureWatch in his Tech Ed 11 presentation on Ten Must-Have Tools for Windows Azure.
AzureWatch is billed at 1.375 cents per instance hour, which comes to $0.33 per instance per day. It also has a free introductory offer of 14 days or 500 hours, whichever lasts longer.
AzureWatch uses the Windows Azure Diagnostics API to manage the performance counters captured and persisted to Windows Azure Storage. It monitors that data periodically and uses it to select an appropriate instance count for each role. AzureWatch then uses the Service Management REST API to set the instance counts to the appropriate values. Consequently, the AzureWatch monitoring service must be configured with the subscription ID containing the hosted service, a management certificate for that subscription, and the storage account to which the performance counter data is persisted. AzureWatch can create a management certificate as part of its initial configuration, but this certificate must be uploaded manually through the Windows Azure Portal. Note that AzureWatch monitors and scales only those roles which have been configured to use Windows Azure Diagnostics.
AzureWatch comprises the AzureWatch Monitoring Service and the AzureWatch Control Panel. The Monitoring Service can be run locally, though Paraleap recommends letting it host the service remotely. The Control Panel is used to control the Monitoring Service and to configure the rules used to scale the hosted service.
The Control Panel is used to configure a set of Raw Metrics for each hosted service role to be monitored. These raw metrics can include any performance counter – including custom counters – as well as the message count for a queue. AzureWatch also creates raw metrics out of various system metrics such as the instance count in various states (e.g. ready, stopped and busy).
The raw metrics are used to configure a set of Aggregated Metrics, each of which represents some value computed from a raw metric over a period of time, such as the average, minimum or maximum.
The time period is expressed in minutes, up to a maximum of 30 days. For example, an aggregated value can be calculated for the average CPU usage over 30 minutes. Each aggregated value has a unique name. The same raw metric can be used multiple times with different computations so that, for example, there can be aggregated values for both the minimum and maximum CPU use over the same or even different periods of time.
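The idea of an aggregated metric can be sketched in a few lines of Python. AzureWatch's own implementation is not public; this simply illustrates computing an average, minimum or maximum of raw samples over a time window, with the sample data invented for the example:

```python
# Minimal sketch: an aggregated metric is a computation (average, minimum
# or maximum) applied to the raw samples that fall inside a time window.
# Samples are (timestamp, value) pairs; the window length is in minutes.
from datetime import datetime, timedelta

def aggregate(samples, computation, window_minutes, now=None):
    now = now or datetime.utcnow()
    cutoff = now - timedelta(minutes=window_minutes)
    values = [v for t, v in samples if t >= cutoff]
    return {
        "Average": lambda: sum(values) / len(values),
        "Minimum": lambda: min(values),
        "Maximum": lambda: max(values),
    }[computation]()

# The same raw metric can feed several aggregated values:
now = datetime.utcnow()
cpu = [(now - timedelta(minutes=m), v)
       for m, v in [(25, 40.0), (15, 80.0), (5, 60.0)]]
avg30 = aggregate(cpu, "Average", 30, now)   # average CPU over 30 minutes
max30 = aggregate(cpu, "Maximum", 30, now)   # maximum CPU over 30 minutes
```

Note how the same raw CPU samples produce two differently named aggregated values, mirroring the behaviour described above.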
The aggregated values are then used to define a set of Rules used to configure the autoscaling. A rule comprises a Boolean formula over the aggregated values. For example:
CPUTime > 70
where CPUTime might be an aggregated value for the average CPU time over the last 30 minutes.
The rules are evaluated sequentially each minute and if a rule evaluates to true the configured action is invoked. These actions are:
- scale the instance count up or down by a specified number of instances
- set the instance count to a specified number of instances
- do nothing
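The evaluation loop described above can be sketched as follows. The rule names, the structure of a rule, and the action encoding are all assumptions made for the illustration; only the CPUTime > 70 formula comes from the text:

```python
# Hedged sketch of sequential rule evaluation: each cycle the rules are
# checked in order against the current aggregated values, and the first
# rule whose formula is true determines the action taken on the role's
# instance count. Not AzureWatch's actual implementation.

def evaluate_rules(rules, metrics, instance_count):
    for rule in rules:
        if rule["formula"](metrics):
            action, amount = rule["action"]
            if action == "scale_by":      # scale up or down by a delta
                return instance_count + amount
            elif action == "set":         # set to a specified count
                return amount
            elif action == "nothing":     # do nothing
                return instance_count
    return instance_count                 # no rule fired

rules = [
    {"name": "ScaleUpOnCpu",   "formula": lambda m: m["CPUTime"] > 70,
     "action": ("scale_by", 1)},
    {"name": "ScaleDownOnCpu", "formula": lambda m: m["CPUTime"] < 30,
     "action": ("scale_by", -1)},
]

high = evaluate_rules(rules, {"CPUTime": 85.0}, 1)   # scales up
idle = evaluate_rules(rules, {"CPUTime": 50.0}, 2)   # no rule fires
```

Here CPUTime plays the role of the aggregated value from the earlier example: above 70 the count grows by one, below 30 it shrinks by one, and in between nothing happens.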
Furthermore, the Monitoring Service can be configured to send an email notification when a rule evaluates to true. This is useful when trying out AzureWatch since an email notification can be sent instead of actually increasing the instance count.
A rule can be configured to be invoked only during part of the day, and/or part of an hour. Additionally, to prevent a rule from firing too often, it can be disabled for some period of time after it has evaluated to true. AzureWatch also supports hard upper and lower instance counts for each role, which is useful for avoiding surprises if a rule is misconfigured. In fact, Windows Azure also implements a soft quota of 20 instances per subscription.
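The hard upper and lower bounds amount to a simple clamp on whatever instance count a rule requests. A one-line sketch, with the bound values invented for the example:

```python
# Whatever a rule asks for is clamped into the configured [lower, upper]
# range, so a misconfigured rule cannot scale a role without limit.
def clamp_instance_count(requested, lower, upper):
    return max(lower, min(upper, requested))

capped = clamp_instance_count(25, 2, 20)   # a runaway rule is capped at 20
```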
AzureWatch contains some beta functionality in which it accesses a specified web page in a hosted service and retrieves a payload in a simple XML format. This can be used to specify metrics other than those in performance counters. Once an External Feed has been configured, the metrics specified in it are added to the list of raw metrics where they can be used similarly to the other raw metrics.
Once the configuration of raw metrics, aggregated metrics and rules has been completed, it can be published to the Monitoring Service so that monitoring can commence. The Monitoring Service ensures that any required changes are made to the Windows Azure Diagnostics configuration. It then initiates the periodic evaluation of the rules – and autoscaling is ready to go.
Having done all that, I thought it was pretty cool to receive an email with the following:
Scaling action was successful
Rule CPUTime Rule triggered myservice\Staging\WorkerRole1 to perform scale action: ‘Scale up by’
From instance count of 1 to 2
Rule formula: CPUTime > 70
All known parameter values:
And all this after doing nothing more than configuring AzureWatch. No working out how to download the performance counters from Windows Azure Storage. No working out averages over time of the CPU performance counter. No working out how to download the service configuration. No working out how to modify the instance count for a role. No working out how to upload the service configuration to Windows Azure. Autoscaling is not something you need to implement yourself.
If you are interested in autoscaling hosted services in Windows Azure you should look at AzureWatch.