Service Runtime in Windows Azure

Roles and Instances

Windows Azure implements a Platform as a Service model through the concept of roles. There are two types of role: a web role, which is deployed with IIS, and a worker role, which is similar to a Windows service. Azure implements horizontal scaling of a service through the deployment of multiple instances of roles. Each instance of a role is allocated exclusive use of a VM selected from one of several sizes, ranging from a small instance with 1 core to an extra-large instance with 8 cores. Memory and local disk space also increase in going from a small instance to an extra-large instance.

All inbound network traffic to a role passes through a stateless load balancer which uses an unspecified algorithm to distribute inbound calls to the role among instances of the role. Individual instances do not have public IP addresses and are not directly addressable from the Internet. Instances are able to connect to other instances in the service using TCP and HTTP.

Azure provides two deployment slots: staging, for testing in a live environment, and production, for the production service. There is no real difference between the two slots.

It is important to remember that Azure charges for every deployed instance of every role in both production and staging slots regardless of the status of the instance. This means that an instance must be deleted, not merely suspended, to avoid being charged for it.

Fault Domains and Upgrade Domains

There are two ways to upgrade an Azure service: in-place upgrade and Virtual IP (VIP) swap. An in-place upgrade replaces the contents of either the production or staging slot with a new Azure application package and configuration file. A VIP swap literally swaps the virtual IP addresses associated with roles in the production and staging slots. Note that it is not possible to do an in-place upgrade where the new application package has a modified Service Definition file. Instead, any existing service in one of the slots must be deleted before the new version is uploaded. A VIP swap does support modifications to the Service Definition file.

The Windows Azure SLA comes into force only when a service uses at least two instances per role.  Azure uses fault domains and upgrade domains to facilitate adherence to the SLA.

When Azure instances are deployed, the Azure fabric spreads them among different fault domains which means they are deployed so that a single hardware failure does not bring down all the instances. For example, multiple instances from one role are not deployed to the same physical server. The Azure fabric completely controls the allocation of instances to fault domains but an Azure service can view the fault domain for each of its instances through the RoleInstance.FaultDomain property.

Similarly, the Azure fabric spreads deployed instances among several upgrade domains. The Azure fabric implements an in-place upgrade by bringing down all the instances in a single upgrade domain, upgrading them, and then restarting them before moving on to the next upgrade domain. The number of upgrade domains is configurable through the upgradeDomainCount attribute of the ServiceDefinition root element in the Service Definition file. The default number of upgrade domains is 5, but this number should be scaled with the number of instances. The Azure fabric completely controls the allocation of instances to upgrade domains, modulo the number of upgrade domains, but an Azure service can view the upgrade domain for each of its instances through the RoleInstance.UpdateDomain property. (Shame about the use of upgrade in one place and update in another.)
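To see why the number of upgrade domains should scale with the number of instances, here is a minimal sketch. It assumes, purely for illustration, that instance i is placed in upgrade domain i modulo the domain count; the fabric makes the real allocation and the class and method names are hypothetical.

```csharp
using System;

public static class UpgradeDomainSketch
{
    // Assumption for illustration only: instance i lands in domain i % upgradeDomainCount.
    public static int DomainFor(int instanceIndex, int upgradeDomainCount)
    {
        return instanceIndex % upgradeDomainCount;
    }

    // Largest number of instances taken offline together while one
    // upgrade domain is being upgraded in place.
    public static int MaxInstancesDownAtOnce(int instanceCount, int upgradeDomainCount)
    {
        int[] perDomain = new int[upgradeDomainCount];
        for (int i = 0; i < instanceCount; i++)
        {
            perDomain[DomainFor(i, upgradeDomainCount)]++;
        }

        int max = 0;
        foreach (int count in perDomain)
        {
            if (count > max) max = count;
        }
        return max;
    }
}
```

Under this assumption, 20 instances in the default 5 upgrade domains lose 4 instances at a time during an in-place upgrade, while 10 upgrade domains reduce that to 2.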

Service Definition and Service Configuration

Azure services are defined and configured through their Service Definition and Service Configuration files. Since an in-place upgrade cannot be used to implement a change to the Service Definition file, any such change must be applied through a fresh deployment to an empty deployment slot. The Service Configuration file is one of the two distinct parts of a deployment, the other being the service application package, and consequently can be modified through an in-place upgrade. Indeed, it is possible to modify the Service Configuration file directly on the Azure portal.

The Service Definition file specifies the roles contained in the service along with the instance size into which each role is to be deployed. Its principal elements and attributes are:

  • upgradeDomainCount – number of upgrade domains for the service
  • vmsize – the instance size from Small through ExtraLarge
  • ConfigurationSettings – defines the settings used to configure the service
  • LocalStorage – specifies the amount and name of disk space on the local VM
  • InputEndpoint – defines the external endpoints for a role
  • InternalEndpoint – defines the internal endpoints for a role
  • Certificates – specifies the name and location of the X.509 certificate store
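
As a rough illustration, a Service Definition file using several of these elements might look like the following. The service, role, setting and endpoint names and the port number are hypothetical, and the exact schema depends on the SDK version, so treat it as a sketch rather than a template.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- All names and the port number below are illustrative assumptions -->
<ServiceDefinition name="ChatCloudService" upgradeDomainCount="5"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceDefinition">
  <WorkerRole name="BestSupportingRole" vmsize="Small">
    <ConfigurationSettings>
      <!-- Only the setting name is declared here; its value lives in the Service Configuration file -->
      <Setting name="GreetingText" />
    </ConfigurationSettings>
    <Endpoints>
      <InputEndpoint name="ChatService" protocol="tcp" port="3030" />
      <InternalEndpoint name="Actor" protocol="tcp" />
    </Endpoints>
  </WorkerRole>
</ServiceDefinition>
```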

The Service Configuration file provides the configured values for:

  • osVersion – specifies the Azure guest OS version for the deployed service
  • Instances – specifies the number of instances of a role
  • ConfigurationSettings – specifies the role-specific configuration parameters
  • Certificates – specifies X.509 certificates for the role

Note that the names of the service configuration settings are specified in the Service Definition file so it is not possible to do an in-place upgrade when the definition of the service configuration settings is changed. Similarly, an in-place upgrade cannot be used to modify the instance size.
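
A matching Service Configuration file might look like the sketch below. The service, role and setting names are hypothetical; each setting name must already be declared in the Service Definition file, and an osVersion of * is assumed here to request the latest guest OS.

```xml
<?xml version="1.0" encoding="utf-8"?>
<!-- All names and values below are illustrative assumptions -->
<ServiceConfiguration serviceName="ChatCloudService" osVersion="*"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2008/10/ServiceConfiguration">
  <Role name="BestSupportingRole">
    <!-- At least two instances per role are needed for the SLA to apply -->
    <Instances count="2" />
    <ConfigurationSettings>
      <Setting name="GreetingText" value="Hello" />
    </ConfigurationSettings>
  </Role>
</ServiceConfiguration>
```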


RoleEntryPoint

RoleEntryPoint is the base class providing the Azure fabric an entry point to a role. All worker roles must contain a class derived from RoleEntryPoint but web roles can use ASP.NET lifecycle management instead. The standard Visual Studio worker role template provides a starter implementation of the necessary derived class. RoleEntryPoint is declared:

public abstract class RoleEntryPoint {
    protected RoleEntryPoint();

    public virtual Boolean OnStart();
    public virtual void OnStop();
    public virtual void Run();
}

The Azure fabric initializes the role by invoking the overridden OnStart() method. Prior to this call the status of the role is Busy. Note that a web role can put initialization code in Application_Start instead of OnStart(). The overridden Run() is invoked following successful completion of OnStart() and provides the primary working thread for the role. The role recycles automatically when Run() exits so care should be taken, through use of Thread.Sleep() for example, that the Run() method does not terminate. Azure invokes the overridden OnStop() during a normal suspension of the service. The Azure fabric stops the service automatically if OnStop() does not return within 30 seconds. Note that a web role can put shutdown code in Application_End instead of OnStop().
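
A minimal sketch of a worker role following this lifecycle is shown below. It assumes a project referencing the Microsoft.WindowsAzure.ServiceRuntime assembly; the class name and the 10-second sleep interval are illustrative choices, not taken from the Visual Studio template.

```csharp
using System.Threading;
using Microsoft.WindowsAzure.ServiceRuntime;

// Hypothetical worker role; the name and timings are illustrative.
public class WorkerRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        // One-time initialization goes here; the instance is Busy until this returns.
        return base.OnStart();
    }

    public override void Run()
    {
        // Never return from Run(): the instance is recycled when it exits.
        while (true)
        {
            // ... perform one unit of work here ...
            Thread.Sleep(10000);
        }
    }

    public override void OnStop()
    {
        // Keep cleanup short: OnStop() must return within 30 seconds.
        base.OnStop();
    }
}
```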

Jim Nakashima, of the Azure Team, has an interesting post describing the precise sequence of calls made by the Azure fabric throughout the service lifecycle.


Role

The Role class represents a role in an Azure service. It is declared:

public abstract class Role {
    public abstract ReadOnlyCollection<RoleInstance> Instances { get; }
    public abstract String Name { get; }
}

Name is the name of the role and Instances is a collection of the deployed instances of the role. Instances is used in navigating the role topology when, for example, an instance endpoint is required.


RoleEnvironment

Perhaps confusingly, the RoleEnvironment class provides functionality allowing an Azure instance to interact with the Azure fabric as well as functionality providing access to the Service Configuration file and limited access to the Service Definition file.

RoleEnvironment is declared:

public sealed class RoleEnvironment {
    public static event EventHandler<RoleEnvironmentChangedEventArgs> Changed;
    public static event EventHandler<RoleEnvironmentChangingEventArgs> Changing;
    public static event EventHandler<RoleInstanceStatusCheckEventArgs> StatusCheck;
    public static event EventHandler<RoleEnvironmentStoppingEventArgs> Stopping;

    public static RoleInstance CurrentRoleInstance { get; }
    public static String DeploymentId { get; }
    public static Boolean IsAvailable { get; }
    public static IDictionary<String,Role> Roles { get; }

    public static String GetConfigurationSettingValue(String configurationSettingName);
    public static LocalResource GetLocalResource(String localResourceName);
    public static void RequestRecycle();
}

The IsAvailable property specifies whether or not the Azure environment is available. It might have been useful had there also been a property specifying the deployment slot – production, staging, or development – in which the service is running. DeploymentId identifies the current deployment, Roles specifies the roles contained in the current service, and CurrentRoleInstance is a RoleInstance object representing the current instance.

GetConfigurationSettingValue() retrieves a configuration setting for the current role from the Service Configuration file. GetLocalResource() returns a LocalResource object providing access to the root path for any local storage configured for the current role in the Service Definition file. Note that local storage is defined on a per-role basis and each instance of a role has its own local storage which is not accessible from other instances. This post contains more detailed information about local storage including an example of using it. RequestRecycle() initiates a recycle, i.e., a stop and start, of the current instance.

The RoleEnvironment class also provides four events to which a role can register a callback method to be notified about various changes to the Azure environment. A role typically registers callback methods with these events in its OnStart() method.

The StatusCheck event is raised every 15 seconds, and an instance uses the StatusCheck event notification to indicate it is busy and should be taken out of the load-balancer rotation. The Stopping event is raised when an instance is undergoing a controlled shutdown, although there is no guarantee it will be raised when an instance is shutting down due to an unhandled error. Note that the Stopping event is raised before the overridden OnStop() method is invoked.

The Changing event is raised before and the Changed event after a configuration change is applied to the role. The callback method for the Changing event has access to the old value of the configuration setting and can be used to control whether or not the instance is restarted in response to the configuration change. The callback method for the Changed event has access to the new value of the configuration setting and can be used to reconfigure the instance in response to the change. The Changing and Changed callback methods are also used to handle topology changes to the service whereby the number of instances of a role is changed.

Steven Nagy has a good post on Azure configuration demonstrating the use of the Changing and Changed events. In the comments to that post, Mike Kelly has the interesting reminder that the csrun command can be used to manually update the Service Configuration file in the development fabric, as follows:

csrun /update:1729;ServiceConfiguration.cscfg

csrun is located in the bin directory of the Windows Azure SDK. ServiceConfiguration.cscfg is the updated configuration file. 1729 is the deployment Id, in the development fabric, of this example and should be replaced by the appropriate deployment Id. This update technique works well when the service is not being debugged, successfully handling configuration changes and increases to the number of instances – although it explicitly does not handle a reduction in the number of instances. When the update is applied to a service being debugged, the change initially appears to work but then the service undergoes a tear down and stops.

An example, taken from the Visual Studio worker role template, of a callback method invoked by the Changing event is:

private void RoleEnvironmentChanging(object sender, RoleEnvironmentChangingEventArgs e)
{
    // If a configuration setting is changing
    if (e.Changes.Any(change => change is RoleEnvironmentConfigurationSettingChange))
    {
        // Set e.Cancel to true to restart this role instance
        e.Cancel = true;
    }
}

Each of the four events has its own callback method declaration, distinguished by the EventArgs-derived type of its second parameter: RoleEnvironmentChangedEventArgs, RoleEnvironmentChangingEventArgs, RoleEnvironmentStoppingEventArgs and RoleInstanceStatusCheckEventArgs for the Changed, Changing, Stopping and StatusCheck events respectively.

The RoleEnvironmentChangingEventArgs and RoleEnvironmentChangedEventArgs classes are declared:

public class RoleEnvironmentChangingEventArgs : CancelEventArgs {
    public ReadOnlyCollection<RoleEnvironmentChange> Changes { get; }
}

public class RoleEnvironmentChangedEventArgs : EventArgs {
    public ReadOnlyCollection<RoleEnvironmentChange> Changes { get; }
}

Note that RoleEnvironmentChangingEventArgs is actually derived from CancelEventArgs, rather than directly from EventArgs, and its Cancel property is used to indicate whether or not the instance should be recycled in response to a change in the Service Configuration file. Setting Cancel to true indicates the instance should be recycled. This is shown in the example above.

The Changes property of both RoleEnvironmentChangingEventArgs  and RoleEnvironmentChangedEventArgs contains a ReadOnlyCollection of RoleEnvironmentChange objects. However, RoleEnvironmentChange is merely the base class to the RoleEnvironmentConfigurationSettingChange and RoleEnvironmentTopologyChange classes which are the actual types of the objects in the ReadOnlyCollection.

RoleEnvironmentConfigurationSettingChange contains information related to changes to configuration settings other than the number of instances. The only non-trivial property exposed by the RoleEnvironmentConfigurationSettingChange class is:

public String ConfigurationSettingName { get; }

ConfigurationSettingName specifies a configuration setting changed in the Configuration File. The Changes collection passed into the Changing callback method lists the before-the-change values of any changed configuration setting while the Changes collection passed into the Changed callback method lists the after-the-change values of any changed configuration setting.

RoleEnvironmentTopologyChange contains information related to changes in the topology of the service, i.e., the number of instances. A service can use this information to garner information about new instances to connect to. The only non-trivial property exposed by the RoleEnvironmentTopologyChange class is:

public String RoleName { get; }

RoleName specifies the name of a role whose instance count is being changed. Note that other than the current role there is no way to find the number of instances of any role that does not have an internal endpoint defined in the Service Definition file.

The RoleInstanceStatusCheckEventArgs class is declared:

public class RoleInstanceStatusCheckEventArgs : EventArgs {
    public RoleInstanceStatusCheckEventArgs();

    public RoleInstanceStatus Status { get; }

    public void SetBusy();
}

Status specifies the current status of the instance as being either Busy or Ready from the RoleInstanceStatus enumeration. SetBusy() sets the status of the instance to Busy “for 10 seconds” indicating that the  load balancer should take the instance out of rotation. The documentation specifies that SetBusy() should be called at the end of the busy interval if the instance needs to retain a Busy status for a longer period. This is curious since SetBusy() is only available in a callback method to the RoleEnvironment.StatusCheck event and this event is raised every 15 seconds. I wonder if the intention is not that the Busy status lasts until the next time the StatusCheck event is raised.
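
A handler for the StatusCheck event might be sketched as follows, assuming a project referencing the Microsoft.WindowsAzure.ServiceRuntime assembly. The busy flag is a hypothetical field that the role would set while it cannot accept traffic. Because the handler is invoked again on each status check, the instance stays out of rotation for as long as the flag remains set.

```csharp
using Microsoft.WindowsAzure.ServiceRuntime;

// Hypothetical worker role showing only the StatusCheck wiring.
public class WorkerRole : RoleEntryPoint
{
    // Hypothetical flag, set elsewhere in the role while it is overloaded.
    private volatile bool busy;

    public override bool OnStart()
    {
        // Register the callback so the fabric asks this instance for its status.
        RoleEnvironment.StatusCheck += RoleEnvironmentStatusCheck;
        return base.OnStart();
    }

    private void RoleEnvironmentStatusCheck(object sender, RoleInstanceStatusCheckEventArgs e)
    {
        if (busy)
        {
            // Ask the load balancer to take this instance out of rotation.
            e.SetBusy();
        }
    }
}
```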

RoleEnvironmentStoppingEventArgs is a trivial class providing no functionality beyond that of the base EventArgs class.


RoleInstance

The RoleInstance class represents an instance of a role. It is declared:

public abstract class RoleInstance {
    public abstract Int32 FaultDomain { get; }
    public abstract String Id { get; }
    public abstract IDictionary<String,RoleInstanceEndpoint> InstanceEndpoints { get; }
    public abstract Role Role { get; }
    public abstract Int32 UpdateDomain { get; }
}

FaultDomain and UpdateDomain specify respectively the fault domain and upgrade domain for the instance. Role identifies the role and Id uniquely identifies the instance of the role. InstanceEndpoints is an IDictionary<> linking the name of each instance endpoint specified in the Service Definition file with the actual definition of the RoleInstanceEndpoint. Note that each instance of a role has a distinct actual RoleInstanceEndpoint for each specific instance endpoint defined in the Service Definition file.


Endpoints

Two types of endpoint may be associated with an Azure role through specification in the Service Definition file.

  • Input endpoint
  • Internal endpoint

An input endpoint is a public-facing endpoint. A web role may have only one HTTP input endpoint and one HTTPS input endpoint. A worker role may have up to five HTTP, HTTPS and TCP input endpoints as long as each is associated with a different port number [Updated 7/14/2010]. External services make connection requests to the virtual IP address for the service and the input endpoint port specified for the role in the Service Definition file. These connection requests are load balanced and forwarded to an Azure-allocated port on one of the role instances. On this Azure Forum thread, lopeming reports the result of tests suggesting that load balancing is random in the Azure fabric and round robin (1, 2, 3, 4, etc) in the development fabric.

The RoleInstanceEndpoint class represents an input endpoint or internal endpoint associated with an instance. RoleInstanceEndpoint is declared:

public abstract class RoleInstanceEndpoint {
    public abstract IPEndPoint IPEndpoint { get; }
    public abstract RoleInstance RoleInstance { get; }
}

RoleInstance identifies the instance associated with the endpoint and IPEndpoint contains the  local IP address of the instance and the port number for the endpoint.

Instances must request the input endpoint from the Azure fabric to identify the port to listen on, as in the following example:

RoleInstanceEndpoint chatServiceEndPoint =
    RoleEnvironment.CurrentRoleInstance.InstanceEndpoints["ChatService"];

where ChatService is the unique name of the input endpoint in the Service Definition file.
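
Once the endpoint has been retrieved, a worker role can listen on it. The following is a minimal sketch, assuming a TCP input endpoint named ChatService and a project referencing the Microsoft.WindowsAzure.ServiceRuntime assembly; the ChatListener helper class is hypothetical.

```csharp
using System.Net.Sockets;
using Microsoft.WindowsAzure.ServiceRuntime;

// Hypothetical helper; "ChatService" must match an input endpoint
// declared for the role in the Service Definition file.
public static class ChatListener
{
    public static TcpListener Start()
    {
        RoleInstanceEndpoint endpoint =
            RoleEnvironment.CurrentRoleInstance.InstanceEndpoints["ChatService"];

        // Listen on the local address and Azure-allocated port for this instance,
        // not on the public port named in the Service Definition file.
        TcpListener listener = new TcpListener(endpoint.IPEndpoint);
        listener.Start();
        return listener;
    }
}
```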

An internal endpoint is a private endpoint used for communication among instances of one or more roles. A web role may have only one HTTP internal endpoint. A worker role may have an unlimited number of HTTP and TCP internal endpoints, the only limitation being that each internal endpoint must have a unique name.

Instances must ask the Azure fabric to identify the endpoint to connect to, as in the following example:

RoleInstanceEndpoint actorEndpoint =
    RoleEnvironment.Roles["BestSupportingRole"].Instances[17].InstanceEndpoints["Actor"];

which retrieves an endpoint representing the Actor endpoint of instance 17 of the role named BestSupportingRole. The current instance can use the information contained in the RoleInstanceEndpoint to connect to the specified port on the other instance.

Note that RoleEnvironment.Roles reports all Instances as being of zero size except the current instance and any instance with an internal endpoint declared in the Service Definition file.
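
One way to see this behavior is to dump the topology visible to the current instance. The sketch below uses only the Roles, Instances, Id and InstanceEndpoints members described above and assumes a project referencing the Microsoft.WindowsAzure.ServiceRuntime assembly; the TopologyDump class is hypothetical.

```csharp
using System.Collections.Generic;
using System.Diagnostics;
using Microsoft.WindowsAzure.ServiceRuntime;

// Hypothetical helper that traces every role, instance and endpoint
// visible from the current instance.
public static class TopologyDump
{
    public static void Write()
    {
        foreach (KeyValuePair<string, Role> entry in RoleEnvironment.Roles)
        {
            Role role = entry.Value;
            Trace.WriteLine(role.Name + ": " + role.Instances.Count + " visible instance(s)");

            foreach (RoleInstance instance in role.Instances)
            {
                foreach (KeyValuePair<string, RoleInstanceEndpoint> endpoint in instance.InstanceEndpoints)
                {
                    Trace.WriteLine("  " + instance.Id + " " + endpoint.Key + " -> " + endpoint.Value.IPEndpoint);
                }
            }
        }
    }
}
```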

Steve Marx has a good presentation at PDC 2009 working through an example of using input and internal endpoints. There is an in-depth example, Windows Azure: Worker Role Communication, in a Hands-On Lab in the Windows Azure Platform Training Kit.

UPDATE 9/6/2010

In this Azure Forum thread, Ryan Dunn of the Azure Team writes:

You are limited to 5 endpoints per role.  Additionally, you can have up to 5 input endpoints (external facing ones).  So that means you have have 25 total endpoints, but only 5 of which can be input endpoints and the rest internal.

In the same thread, Frank Siegemund of the Azure Team writes:

You can have a maximum of 5 incoming endpoints per role (input or internal, doesn’t matter). You can have a maximum of 5 roles. 5 * 5 = 25.


LocalResource

LocalResource represents the local storage, on the file system of the instance, defined for the role in the Service Definition file. Each instance has its own local storage that is not accessible from other instances. As described in this post, local storage is also used for the local cache for an Azure Drive. LocalResource is declared:

public abstract class LocalResource {
    public abstract Int32 MaximumSizeInMegabytes { get; }
    public abstract String Name { get; }
    public abstract String RootPath { get; }
}

Name uniquely identifies the local storage with MaximumSizeInMegabytes specifying the maximum amount of space available. RootPath specifies the root path of the local storage in the local file system. An earlier post contains a more complete description of local storage along with examples of its use.
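
As a minimal sketch of using these properties, the following writes a file into local storage. It assumes a project referencing the Microsoft.WindowsAzure.ServiceRuntime assembly and a local storage resource named Scratch declared for the role in the Service Definition file; the resource name, the file name and the ScratchStorage class are hypothetical.

```csharp
using System.IO;
using Microsoft.WindowsAzure.ServiceRuntime;

// Hypothetical helper; "Scratch" is an assumed local storage name.
public static class ScratchStorage
{
    public static string WriteNote(string text)
    {
        LocalResource scratch = RoleEnvironment.GetLocalResource("Scratch");

        // RootPath is the directory set aside for this instance alone.
        string path = Path.Combine(scratch.RootPath, "note.txt");
        File.WriteAllText(path, text);
        return path;
    }
}
```

The file is visible only to the instance that wrote it, so anything that must be shared among instances belongs in Azure storage rather than local storage.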


About Neil Mackenzie

Cloud Solutions Architect. Microsoft