Introduction to Docker and Kubernetes on Azure

There has been much interest recently in the use of microservices to architect large-scale services. Microservices are predicated on the idea of deploying individual applications, each of which provides a single service, and linking these microservices together to create a large-scale service. The hope is that the use of microservices simplifies the creation of sophisticated services by making it easier to deploy and upgrade individual applications.

Virtualization has been a core technology that, over the last few years, has enabled both public and private clouds. Virtualization simplifies the task of deploying and configuring new servers. However, since a virtual machine contains an entire OS, it is a heavyweight deployment vehicle better suited to the deployment of full applications than of microservices.

Over the years, there have been various attempts to develop lightweight container technology in different UNIX and Linux distributions. The idea with containerization is that many containers can be deployed to a single physical host, with each container hosting an application that is securely isolated from applications hosted in other containers.

Virtualization and containerization can both be used on bare-metal systems, and it would seem that there is no need for virtualization when containerization is used. However, the economics of the public cloud is closely tied to the deployment density and flexibility provided by virtualization. Consequently, in a public cloud containerization is likely to reside on top of virtualization rather than replace it.

Docker

An increasingly fashionable solution is the use of Docker to host microservices. Docker provides an integrated application-hosting environment and a packaging mechanism that simplifies the use of Linux containers for microservice deployment. The Docker application-hosting environment is built on several Linux technologies including Linux Containers and Union File Systems.

A Linux Container is a lightweight application-hosting environment that can be used to host a microservice. A single compute node can host many containers. Even on a single host, distinct containers are isolated from each other. Each container has its own filesystem and TCP/IP ports. There are various patterns for connecting containers such as exposing TCP/IP ports and shared directories on the host system. Container technology fits in well with the idea of creating large-scale services out of simple microservices.

In a traditional filesystem hierarchy, new filesystems are mounted as distinct branches into the existing filesystem tree. In a union filesystem, new filesystems are overlaid on the existing filesystem tree so that files in a single branch may come from different filesystems. Typically, all but one filesystem are mounted read-only. Specific rules also apply when two files with the same name are mounted from different filesystems.

Docker provides a CLI allowing Docker images to be built with these images serving as templates for the creation of containers. Images can be stored locally or uploaded to the Docker Hub where they can be made available either privately or publically. An image is typically built from a configuration file, so that images are created in a declarative rather than procedural manner. This configuration file contains information such as: the base operating system for a container; deployment script for the application; TCP/IP ports to be exposed; files to be copied into the file system of the container; and directories in the host system to be made accessible to the container. Docker caches intermediate steps of the build process, which speeds up the iterative development of images by allowing subsequent builds to start from a known good point in the process. The different steps provide different layers to the union file system used by Docker.
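As an illustration, here is a minimal sketch of such a configuration file – a Dockerfile – for a hypothetical Node.js microservice (the packages, file names and port are illustrative, not taken from a real project). Each instruction produces one of the cached intermediate layers described above:

# Base operating system for the container
FROM ubuntu:14.04
# Deployment script for the application
RUN apt-get update && apt-get install -y nodejs
# Files to be copied into the file system of the container
ADD app.js /opt/app/app.js
# TCP/IP port to be exposed
EXPOSE 3000
# Command run in the container at startup
CMD ["nodejs", "/opt/app/app.js"]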

The Docker CLI allows Docker containers to be created from an image stored either locally or in the Docker Hub. Container creation is parameterized allowing different containers built from the same image to be hosted in the same server. A single image can be used to deploy containers to any environment hosting a Docker server, thereby supporting the replicable deployment of an identical application-hosting environment.
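For example (the image and container names here are placeholders):

# Build an image from the Dockerfile in the current directory
docker build -t myrepo/mymicroservice .

# Create a container from the image, mapping host port 8080 to container port 3000
docker run -d -p 8080:3000 --name mymicroservice1 myrepo/mymicroservice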

The Docker website hosts downloads of the Docker software as well as extensive documentation. James Turnbull (@kartar), of Docker, has written an excellent book, The Docker Book, providing many fully-worked examples of deploying applications in Docker. Adrian Cockcroft (@adrianco) recently posted a typically interesting take on the sudden popularity of Docker.

The Docker software comprises a daemon, which manages the Docker containers running on the host server, and a client, which provides the API to manage Docker, build images and deploy containers. The Docker client can be used to securely manage Docker running on remote hosts.

Kubernetes

It is easy to deploy one or more Docker containers into a host and configure them to create an integrated service. However, the management of a cluster of compute nodes each hosting many containers is more difficult – and some means to manage the cluster is needed.

Google is developing Kubernetes, an open-source project, written in Go, to manage clusters of Docker hosts. Kubernetes is a pre-production beta in active development, hosted in a GitHub repo. Kubernetes can be deployed into a number of environments including Google Compute Engine and Microsoft Azure. Mark Lamourine (@markllama) has a nice post showing how to use Kubernetes to deploy services.

The Kubernetes repo contains a design document. The basic idea is that a Kubernetes deployment comprises one or more Docker hosts, referred to as minions, and a master server to control them. The Kubernetes deployment unit is a pod, which is a group of containers on a single host with shared network and storage volumes. A single compute node may host several pods. Kubernetes comes with an API allowing pods to be created and deleted.
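As a sketch – the API schema is changing rapidly at the time of writing, so treat the details as illustrative of the v1beta1 format rather than definitive – a pod hosting a single nginx container could be described in a pod.json file along these lines:

{
  "id": "nginx",
  "kind": "Pod",
  "apiVersion": "v1beta1",
  "desiredState": {
    "manifest": {
      "version": "v1beta1",
      "id": "nginx",
      "containers": [{
        "name": "nginx",
        "image": "dockerfile/nginx",
        "ports": [{ "containerPort": 80, "hostPort": 8080 }]
      }]
    }
  }
}

The pod can then be created with the kubecfg tool shipped in the repo, with something like:

cluster/kubecfg.sh -c pod.json create /pods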

Azure

Docker on Azure

Azure Virtual Machines is the IaaS offering in the Microsoft Azure Platform. It supports the deployment of VMs hosting various Linux flavors including Ubuntu, CentOS, openSUSE, and Oracle Linux. Docker can be deployed easily into an Azure VM in a couple of ways:

  • installation of Docker into an existing Azure VM
  • deployment of an Azure VM with Docker pre-configured

The Azure cross-platform CLI is documented here, along with download and installation instructions. The azure vm create command can be used to deploy a new VM. Once that is running, an SSH connection can be used to access the VM and perform a standard Docker installation.
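For example, a sketch of deploying an Ubuntu VM with SSH enabled (CLOUD_SERVICE_NAME, IMAGE_NAME, USER_NAME and PASSWORD are placeholders; the image name comes from azure vm image list):

# Deploy a new Linux VM with an SSH endpoint on port 22
azure vm create CLOUD_SERVICE_NAME IMAGE_NAME USER_NAME PASSWORD -l "West US" -e 22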

The Azure cross-platform CLI can also automate the creation and deployment of a VM in which Docker is pre-installed through the azure vm docker create command. The Docker configuration is set up to require authentication, and the credentials are installed in the ~/.docker directory on the machine where the command is executed. Furthermore, a Docker endpoint on port 4243 is exposed on the Azure Load Balancer so that a Docker client can authenticate to and access the Docker server over the public internet. Ross Gardler (@rgardler), of Microsoft OpenTech, has a post going into this in detail.
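A sketch of the pre-configured deployment, with the same placeholders as above:

# Deploy a new Linux VM with Docker pre-installed and an SSH endpoint on port 22
azure vm docker create CLOUD_SERVICE_NAME IMAGE_NAME USER_NAME PASSWORD -l "West US" -e 22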

With the Docker client installed locally and appropriate credentials in the ~/.docker directory, the following command can be used to get information about a remote Docker server hosted in an Azure cloud service:

docker --tls -H tcp://CLOUD_SERVICE_NAME.cloudapp.net:4243 info

A curiosity to be aware of is that the Docker daemon requires root access by default, so that docker info fails against a local Docker server unless run with sudo, while the equivalent command succeeds against a remote Docker server.

Kubernetes on Azure

Kubernetes is an open-source project hosted in the GoogleCloudPlatform repos on GitHub. The core Kubernetes GitHub repo contains extensions allowing it to be deployed into various environments including Azure. The instructions for deploying Kubernetes and using it in Azure are here.

Note that the installation scripts currently use the subscription name to generate an Azure storage account name. This may lead to an error if the subscription name has not been changed from the default. Furthermore, although the documentation indicates the use of the West US Azure region, there is nothing intrinsic to that region: with appropriate modification to the configuration scripts, Kubernetes can be deployed in any region. The storage account and region can both be configured in kubernetes/release/azure/config.sh.

The installation script:

  • Creates a new storage account named kubeRandomString
  • Creates a new cloud service named kube-SameRandomString
  • Deploys a master VM
  • Deploys 4 minion VMs and installs Docker on them

The master VM has a public endpoint on port 443, and all the VMs have distinct public SSH endpoints exposed. The following example shows how to SSH into the first minion in a cluster:

ssh -i ~/.ssh/azure_rsa CLOUD_SERVICE_NAME.cloudapp.net -p 22001

Once the cluster is up and running, Kubernetes can be used to deploy pods, and the deployment document has examples. Azure endpoints need to be configured manually if the microservices in a pod are to be exposed to the public internet. Kubernetes provides bash scripts to perform other pod-management tasks including the tear down of an entire cluster.
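For example, at the time of writing the repo contains cluster-management scripts along the following lines (the script names may change as the project evolves):

# Tear down the entire cluster, deleting all the VMs
cluster/kube-down.sh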

Kubernetes Visualizer

Kubernetes Visualizer is an OSS demonstration from Microsoft OpenTech that deploys a simple website providing a basic visual overview of a Kubernetes cluster. It also allows pods to be created. The Kubernetes Visualizer is described here and is documented on, and downloadable from, the Azure repo on GitHub.


Affinity Groups in Azure

I see a lot of confusion about the current status of affinity groups in Microsoft Azure and thought it worthwhile to describe that status (or my view of it).

Until 2012, Azure used a north-south network built to handle traditional internet traffic. In this scenario a request comes in from the internet at the top and gets routed down to a server. The response then gets routed back up to the internet. A problem with a datacenter designed in this way is that traffic from one server to another also gets routed up and down – at the cost of increased latency between servers.

Azure applications make heavy use of Azure Storage, so to minimize latency Azure supports the colocation of compute and storage inside a datacenter. This is achieved by the deployment of a cloud service and related storage accounts into an affinity group, which is a logical name for a colocated set of compute and storage nodes. An affinity group is created in a single region and can be used, instead of the region, as the hosting location for a cloud service or a storage account. Until 2012, the use of an affinity group was the recommended way to deploy a cloud service and any related storage accounts.

In 2012, Azure was upgraded to use an east-west network built to handle the compute-to-storage traffic that was dominant in an Azure datacenter. Furthermore, the network was converted to a flat network in which network traffic was routed directly from compute to storage nodes. Additionally network capacity and speed were both significantly upgraded. Brad Calder described this upgrade in a post on the Azure blog.

This network upgrade was essential to the 2012 release of Azure Virtual Machines, the IaaS offering in the Azure Platform. The OS and Data Disks used in the VMs in Azure Virtual Machines are backed by page blobs in Azure Storage accessed across the network. However, since the upgrade there has been less emphasis in documentation on the need to use an affinity group to colocate a cloud service and related storage accounts.

In 2012, Azure Virtual Network was released. Azure Virtual Network supported the creation of a VNET but required that the VNET be created in an affinity group. This meant that any cloud services in the VNET were also in the affinity group.

So what was the problem with affinity groups? The problem was that an affinity group is tied to a particular set of hardware. This was not an issue when Azure provided only one class of hardware – the A0-A4 standard instances. In 2013, however, Azure released high-memory instances (A5-A7) and high-CPU instances (A8 and A9). Instances of these new sizes could not be deployed into a cloud service in an affinity group created before their release. The solution was to: create a new affinity group; recreate the cloud service in the new affinity group; and redeploy the VMs into the new cloud service. This problem was exacerbated by the use of a VNET since, being tied to an affinity group, no cloud service in the VNET could access the new instance types. The solution was to: create a new affinity group; recreate the VNET in the new affinity group; recreate all the cloud services; and then redeploy the VMs into the VNET and cloud services.

In 2014, Azure Virtual Network was upgraded to support regional VNETs, which span a region rather than an affinity group. This removes the issue of access to new instance types since a regional VNET has access to all the compute resources in a region. Regional VNETs essentially deprecate affinity group VNETs, and at some point every affinity group VNET will be upgraded automatically into a regional VNET.

So where does this leave affinity groups?

An affinity group serves two purposes:

  1. colocation of cloud service and storage account
  2. host for an affinity group VNET

The second of these is now deprecated. The first is still possible but has not been emphasized as the latency optimization it was prior to the upgrade of the Azure network in 2012. In fact the standard way to deploy a VM in the Azure Portal does not support the simultaneous creation of a cloud service in an affinity group and a regional VNET. This confuses people who try to satisfy the modern best practice of using a regional VNET for deployments and the historic best practice of creating a cloud service in an affinity group.

At this point there seems to be little reason to deploy a cloud service into an affinity group. However, as with other deployment choices it might be wise for people to test this out for their specific workloads.

Note that it is actually possible to use the Azure Portal to deploy VMs into a cloud service hosted in an affinity group and resident in a regional VNET. The trick is to do this in two steps:

  • create the cloud service in an affinity group
  • deploy VMs into the cloud service but host them in the VNET

The same trick can be used with Azure cmdlets.
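A minimal PowerShell sketch of the two steps ($imageName and $password are placeholders, and the service, affinity group, VNET and subnet names are all illustrative):

# Step 1: create the cloud service in an affinity group
New-AzureService -ServiceName "myService" -AffinityGroup "myAffinityGroup"

# Step 2: deploy a VM into the cloud service, hosting it in the regional VNET
New-AzureVMConfig -Name "myVM" -InstanceSize Small -ImageName $imageName |
    Add-AzureProvisioningConfig -Windows -AdminUsername "azureuser" -Password $password |
    Set-AzureSubnet -SubnetNames "FrontEnd" |
    New-AzureVM -ServiceName "myService" -VNetName "myRegionalVNET"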


Migrating a VM from EC2 to Azure at 300 Mbps

This post contains instructions for migrating a VM from Amazon Web Services EC2 to Microsoft Azure. The instructions assume a basic setup where the AWS EC2 instance is running Windows Server and comprises a single disk containing the OS. A similar technique can be used to migrate additional disks.

The general idea is as follows:

  1. Add a volume to the EC2 instance
  2. Clone the OS disk to a VHD on that volume (using the Disk2VHD utility)
  3. Upload the VHD to Azure Storage (using the Azure PowerShell cmdlets)
  4. Create a logical Azure Disk from that VHD
  5. Create an Azure VM from that Disk
  6. Remove AWS software from the Azure VM
  7. Install the Azure Agent on the Azure VM

This is a pretty smooth process. The only novelty lies in cloning the OS disk and uploading the resulting VHD to Azure Storage.

Add a Volume to the EC2 Instance

A new volume is attached to the EC2 instance to provide a disk sufficiently large to contain the cloned OS disk. The VHD generated in the cloning process is a dynamic VHD that is typically smaller than the original disk.

Use the AWS portal to create a volume and attach it to the EC2 instance. Once attached, this volume must be initialized and formatted like any other attached volume. It should be provided with some drive letter.
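On Windows Server 2012 or later the initialization can also be scripted with the Storage cmdlets; a sketch assuming the new volume is disk 1 and should become drive F:

# Initialize the newly attached volume, then create and format a partition
Initialize-Disk -Number 1 -PartitionStyle MBR
New-Partition -DiskNumber 1 -UseMaximumSize -DriveLetter F |
    Format-Volume -FileSystem NTFS -NewFileSystemLabel "VHDS" -Confirm:$false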

Clone the OS Disk to a VHD

The cloning of the OS disk is performed using the Disk2VHD utility created by Mark Russinovich (@markrussinovich), a technical fellow on the Microsoft Azure team. This utility allows the cloning of a running disk into a VHD.

While logged into an RDP session on the EC2 instance, download Disk2VHD. Unzip the Disk2VHD download and copy the directory to a convenient location on the C:\ drive.

Disk2VHD clones a disk into either a dynamic VHD or a dynamic VHDX. Azure supports only fixed format VHDs, but the VHD will be converted to fixed format during the upload process.

In Explorer,

  • Double-click Disk2VHD.exe to bring up the UI
  • Select the disks to be cloned
  • Specify the output VHD file name
  • Unselect the VHDX option
  • Click Create to start the cloning process

The Disk2VHD UI displays a progress bar. On successful completion, the VHD is in the specified location. Note that a dynamic VHD is likely to be smaller than the original disk.
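Disk2VHD also accepts command-line arguments, so the cloning can be scripted instead of driven through the UI; a sketch assuming the OS volume is C: and the new volume is F: (check the Disk2VHD documentation for the options supported by the installed version):

disk2vhd C: F:\vhds\EC2Instance.vhd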

Install the Azure PowerShell cmdlets

The Azure PowerShell cmdlets are used to copy the VHD to Azure Blob Storage in the appropriate region (datacenter). The cmdlets can be downloaded directly from the Azure GitHub repository and installed into the EC2 instance.

While logged into an RDP session on the EC2 instance:

  • Browse to the Azure PowerShell release page in GitHub
  • Click on the Windows Standalone link
  • Click Run to install the cmdlets.
  • Click Finish

The Azure PowerShell cmdlets are now installed.

Upload the VHD to Azure Storage

The Azure PowerShell cmdlets must be configured to use the appropriate Azure subscription and storage account, as shown in the example below.

Once the configuration is completed, the Add-AzureVhd cmdlet can be used to automatically convert the VHD to fixed format and upload it to Azure Storage. Add-AzureVhd is supplied with the local path to the VHD and the full URL for the VHD in Azure Storage (in a storage account and container that already exist).

For example:

Add-AzureAccount

Select-AzureSubscription -SubscriptionName YourSubscription

Set-AzureSubscription -SubscriptionName YourSubscription `
-CurrentStorageAccount YourStorageAccount

Add-AzureVhd -Destination `
http://YourStorageAccount.blob.core.windows.net/vhds/EC2InstanceTest.vhd `
-LocalFilePath d:\vhds\EC2Instance.vhd

Note that the Add-AzureVhd command in the example should be a single line.

The upload proceeds in two steps: the creation of an MD5 hash used to verify the success of the upload; and the conversion of the VHD to fixed format and its upload to the specified location in Azure Storage. Note that the uploaded VHD is expanded to the original size of the disk.

Create an Azure Disk from the VHD

Before a VHD in Azure Blob Storage can be used as a disk in an Azure VM it must be configured. This configuration comprises providing a logical name for the disk that can be used in subsequent operations. The Azure Disk is configured as follows:

  • Browse to the Azure Portal
  • Navigate to the Virtual Machines menu (on the LHS)
  • Click on the Disks tab
  • Click on the Create button
  • Provide a logical name for the Disk
  • Locate the VHD URL in Azure Blob Storage
  • Select the VHD contains an Operating System check box
  • Ensure Windows is selected for the Operating Systems Family
  • Click the tick button to create the Azure Disk

On completion, the disk is made visible in the list of Disks on the Azure Portal.
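The same configuration can also be performed with the Azure PowerShell cmdlets; a sketch using the VHD uploaded earlier (the disk name is illustrative):

Add-AzureDisk -DiskName "EC2MigratedDisk" `
    -MediaLocation "http://YourStorageAccount.blob.core.windows.net/vhds/EC2InstanceTest.vhd" `
    -OS Windows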

Create an Azure VM from the Azure Disk

This is the standard Azure process of creating a VM from the Gallery, with the only difference being that a custom Disk is used instead of an Image. The general process of using the Gallery to create an Azure VM is documented on this page.

The Azure VM is created as follows:

  • Click on the Instances tab (on the Virtual Machines section of the Azure Portal)
  • Click on the New button
  • Click on From Gallery

On the Choose an Image window

  • Click on My Disks
  • Click on the appropriate Disk
  • Click the Next arrow button

On the Virtual Machine Configuration page

  • Provide a Virtual Machine Name
  • Leave Tier at Standard
  • Select an appropriate Size
  • Click on the Next arrow button

On the next Virtual Machine Configuration page

  • Specify either a new or existing Cloud Service name
  • Select the Azure Subscription
  • Specify the Region that contains the VHD
  • Leave the Availability Set option at None
  • Leave the default Endpoints configuration
  • Click on the Next arrow button

On the third Virtual Machine Configuration page

  • Unselect the VM Agent that supports extensions is already installed checkbox
  • Click the Next arrow button

An Azure VM will now be created with the OS disk being the VHD migrated from AWS. This process takes a few minutes. Note that normally the Virtual Machine name provided would become the hostname, but following this migration the created VM uses the existing EC2 hostname instead.
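For completeness, a PowerShell sketch of the same VM creation (the names, size and location are illustrative); note that no provisioning configuration is supplied since the OS disk is already provisioned:

# Build a VM configuration from the existing Azure Disk and deploy it
$vm = New-AzureVMConfig -Name "MigratedVM" -InstanceSize Small -DiskName "EC2MigratedDisk"
New-AzureVM -ServiceName "YourService" -Location "West US" -VMs $vm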

Connect to the VM

Once the VM reaches the Running state it is possible to RDP into it. This may require a couple of attempts the first time while DNS entries are being updated.

In the Virtual Machines section of the Azure Portal:

  • Click on the Instances tab
  • Click on the VM in the instances list

On the Instance page

  • Click on the Connect button

This brings up a standard RDP Connection dialog

  • Click through the RDP dialogs
  • Sign in to the Azure VM with the original EC2 VM credentials

On the Azure VM Desktop a Shutdown Event Tracker dialog is displayed.

  • Provide a Comment
  • Press OK

You are now successfully signed in to an Azure VM that has been migrated from EC2. Note that the instance information provided on the desktop is that of the original EC2 VM rather than the current Azure VM. This display can be removed by selecting a new desktop background.

Uninstall AWS Software

The AWS software should be removed from the VM. While signed in to an RDP session, use the Control Panel / Uninstall a Program feature to remove the following software:

  • AWS Tools for Windows
  • Aws-cfn-bootstrap
  • Citrix Tools for Virtual Machines (requires reboot)

Install Azure Agent

The Azure Agent can be installed on the Azure VM.

From inside an RDP session to the Azure VM, download and install the Azure Agent.

The Azure Agent is configured using the Azure PowerShell cmdlets, which can be invoked from anywhere the Azure PowerShell cmdlets are installed and configured:

$vm = Get-AzureVM -ServiceName YourService -Name YourVM

$vm.VM.ProvisionGuestAgent = $true

Update-AzureVM -ServiceName YourService -Name YourVM -VM $vm.VM

The installation of the Azure Agent can be tested by installing the BGInfo extension which the agent uses to provide instance information on the desktop background. The BGInfo extension is installed using the following PowerShell script:

$vm = Get-AzureVM -ServiceName YourService -Name YourVM

Set-AzureVMBGInfoExtension -VM $vm

Update-AzureVM -ServiceName YourService -Name YourVM -VM $vm.VM

The next time an RDP session is opened the desktop background will contain information about the current Azure VM.

Summary

This post describes the fairly simple process to migrate a Windows Server VM from Amazon Web Services EC2 to Microsoft Azure.


Using Azure Monitoring Services API with Azure Cloud Services

This post describes how to use the Azure Monitoring Service API to access performance metrics for Azure Cloud Services, the PaaS feature in the Microsoft Azure platform. It follows on from an earlier post describing the use of the Monitoring Services API to access performance metrics for Azure Virtual Machines. That post contains code samples, while this post describes additional configuration for Azure Cloud Services that exposes additional metrics through the Monitoring Services API as well as on the Azure Portal.

The use of the Monitoring Services API is identical for Azure Virtual Machines and Azure Cloud Services. The only difference is that different resource Ids are used for Azure Virtual Machines and Azure Cloud Services. The ResourceIdBuilder class has distinct helper methods to create Virtual Machines and Cloud Service resource Ids:

  • BuildVirtualMachineResourceId()
  • BuildCloudServiceResourceId()

The actual format of the generated resource Ids is as follows in the two cases:

Cloud Services:

/hostedservices/SERVICE_NAME/deployments/DEPLOYMENT_NAME/roles/ROLE_NAME/roleinstances/ROLE_INSTANCE_ID

Virtual Machines:

/hostedservices/SERVICE_NAME/deployments/DEPLOYMENT_NAME/roles/VM_NAME

SERVICE_NAME is the name of the (PaaS or IaaS) cloud service while DEPLOYMENT_NAME identifies the current deployment. For an Azure Cloud Service, the ROLE_NAME specifies the role while the ROLE_INSTANCE_ID specifies an individual role instance. For an Azure Virtual Machine, the VM_NAME specifies the name of the deployed VM.
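For example, the Cloud Services resource Id can be built with the helper method; as a sketch, assuming the parameters follow the order of the resource Id format above:

String roleInstanceResourceId = ResourceIdBuilder.BuildCloudServiceResourceId(
    SERVICE_NAME, DEPLOYMENT_NAME, ROLE_NAME, ROLE_INSTANCE_ID);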

Minimal Metrics

The Monitoring Service API exposes the following minimal metrics set for both Azure Virtual Machines and Azure Cloud Services:

Name                  Units       Reporting
Disk Read Bytes/sec   Bytes/sec   Max, Min, Ave
Disk Write Bytes/sec  Bytes/sec   Max, Min, Ave
Network Out           Bytes       Total
Network In            Bytes       Total
Percentage CPU        Percentage  Max, Min, Ave

Unlike the case of Azure Virtual Machines, the Monitoring Service API provides a means to access additional performance counters for Azure Cloud Services, by making use of the Azure Diagnostics configuration that is supported only in Cloud Services.

Note that these metrics are also displayed on the Azure Portal.

Configuring Azure Diagnostics for Cloud Services

In an Azure Cloud Service, the Azure Diagnostics capability supports the configuration of diagnostics information that can be captured locally on a role instance and then persisted to Azure Storage on some timescale. The diagnostics information that can be captured and persisted includes:

  • Event Logs
  • Performance Counters
  • .NET Trace Logs
  • Azure infrastructure logs
  • IIS Logs

Azure Diagnostics can be configured using the Azure Diagnostics API or through the declarative specification in the diagnostics.wadcfg file that is uploaded in the deployment package. The latter is the recommended technique.

The diagnostics.wadcfg file is an XML file describing the information to be captured, as well as the frequency and conditions under which it is to be persisted. The Visual Studio tooling creates a diagnostics.wadcfg file for each role and puts it under the role in the Azure project. The file is essentially the same for web roles and worker roles. The following is the diagnostics.wadcfg created for a worker role:

<?xml version="1.0" encoding="utf-8"?>
<DiagnosticMonitorConfiguration configurationChangePollInterval="PT1M"
    overallQuotaInMB="4096"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2010/10/DiagnosticsConfiguration">
  <DiagnosticInfrastructureLogs />
  <Directories>
    <IISLogs container="wad-iis-logfiles" directoryQuotaInMB="1024" />
    <CrashDumps container="wad-crash-dumps" />
  </Directories>
  <Logs bufferQuotaInMB="1024" scheduledTransferPeriod="PT1M"
    scheduledTransferLogLevelFilter="Error" />
  <WindowsEventLog bufferQuotaInMB="1024" scheduledTransferPeriod="PT1M"
    scheduledTransferLogLevelFilter="Error">
    <DataSource name="Application!*" />
  </WindowsEventLog>
  <PerformanceCounters bufferQuotaInMB="512"
    scheduledTransferPeriod="PT0M">
  <PerformanceCounterConfiguration counterSpecifier=
    "\Memory\Available MBytes" sampleRate="PT3M" />
  <PerformanceCounterConfiguration counterSpecifier=
    "\Web Service(_Total)\ISAPI Extension Requests/sec"
    sampleRate="PT3M"/>
  <PerformanceCounterConfiguration counterSpecifier=
    "\Web Service(_Total)\Bytes Total/Sec" sampleRate="PT3M"/>
  <PerformanceCounterConfiguration counterSpecifier=
    "\ASP.NET Applications(__Total__)\Requests/Sec" sampleRate="PT3M"/>
  <PerformanceCounterConfiguration counterSpecifier=
    "\ASP.NET Applications(__Total__)\Errors Total/Sec"
    sampleRate="PT3M"/>
  <PerformanceCounterConfiguration counterSpecifier=
    "\ASP.NET\Requests Queued" sampleRate="PT3M"/>
  <PerformanceCounterConfiguration counterSpecifier=
    "\ASP.NET\Requests Rejected" sampleRate="PT3M"/>
  </PerformanceCounters>
</DiagnosticMonitorConfiguration>

The Directories element configures two directories (which, if used, must also be configured as Local Resources in the Service Definition file) used for storing IIS Logs and crash dumps. The Logs element indicates that .NET Trace logs should be persisted every minute (PT1M) if they are of severity Error. The WindowsEventLog element indicates that any event with Error severity in the Application event log should be persisted every minute.

The PerformanceCounters element specifies that the following performance counters should be sampled and captured locally every 3 minutes (PT3M), but that they should not be persisted automatically (PT0M):

  • \Memory\Available MBytes
  • \Web Service(_Total)\ISAPI Extension Requests/sec
  • \Web Service(_Total)\Bytes Total/Sec
  • \ASP.NET Applications(__Total__)\Requests/Sec
  • \ASP.NET Applications(__Total__)\Errors Total/Sec
  • \ASP.NET\Requests Queued
  • \ASP.NET\Requests Rejected

These counters are not persisted by default because this could lead to significant amounts of data being persisted to Azure Storage. Persistence is configured by specifying a non-zero time interval for the scheduledTransferPeriod.

Furthermore, any other performance counter configured for the role instances (including custom counters) can also be configured for Azure Diagnostics merely by adding the appropriate entry to the PerformanceCounters element. Ryan Dunn (@dunnry) has a post describing some performance counters that can usefully be added to the list. He also provides some rationale for choosing appropriate sample rates for performance counters so that sufficient, but not too much, information is captured.

Once diagnostics.wadcfg is configured appropriately and deployed to Azure in an application package the configured diagnostic elements, including performance counters, are captured locally and persisted to Azure Storage as scheduled. The data can then be accessed in Azure Storage from where it can be downloaded and analyzed.

Verbose Metrics

The minimal metrics described earlier are also displayed on the Azure Portal for both Azure Virtual Machines and Azure Cloud Services. However, for Azure Cloud Services it is possible to request that the portal also display Verbose metrics. This is configured by choosing the appropriate setting on the Configure tab for the cloud service. This setting is only active when Azure Diagnostics has been configured for the cloud service.

Once the Verbose setting has been specified, any of the performance counters configured in diagnostics.wadcfg can be displayed on the Monitor tab for the cloud service on the Azure Portal.

Furthermore, these performance counters are also exposed to the Monitoring Services API. Their definitions can be accessed and their values downloaded in precisely the same way as for the minimal metrics exposed for Azure Virtual Machines and Azure Cloud Services.

Example

In this example, Azure Diagnostics has been configured with the default performance counters supplemented by a few additional performance counters. These additional performance counters are sampled every 30 seconds while the default counters are sampled every minute. The data is persisted to Azure Storage every two minutes. (The short times are for convenience while running the sample.)


<PerformanceCounters bufferQuotaInMB="512"
    scheduledTransferPeriod="PT2M">
  <PerformanceCounterConfiguration counterSpecifier=
    "\Processor(_Total)\% Processor Time"
    sampleRate="PT30S" />
  <PerformanceCounterConfiguration counterSpecifier=
    "\Memory\Available MBytes" sampleRate="PT30S" />
  <PerformanceCounterConfiguration counterSpecifier=
    "\Memory\Committed MBytes" sampleRate="PT30S" />
  <PerformanceCounterConfiguration counterSpecifier=
    "\Web Service(_Total)\ISAPI Extension Requests/sec"
    sampleRate="PT1M" />
  <PerformanceCounterConfiguration counterSpecifier=
    "\Web Service(_Total)\Bytes Total/Sec"
    sampleRate="PT1M" />
  <PerformanceCounterConfiguration counterSpecifier=
    "\ASP.NET Applications(__Total__)\Requests/Sec"
    sampleRate="PT1M" />
  <PerformanceCounterConfiguration counterSpecifier=
    "\ASP.NET Applications(__Total__)\Errors Total/Sec"
    sampleRate="PT1M" />
  <PerformanceCounterConfiguration counterSpecifier=
    "\ASP.NET\Requests Queued" sampleRate="PT1M" />
  <PerformanceCounterConfiguration counterSpecifier=
    "\ASP.NET\Requests Rejected" sampleRate="PT1M" />
</PerformanceCounters>


The list of metric definitions exposed by the Monitoring Services API for this example is:

Name                                               Units         Reporting
\ASP.NET Applications(__Total__)\Errors Total/Sec  Errors/sec    Average
\ASP.NET Applications(__Total__)\Requests/Sec      Requests/sec  Average
\Web Service(_Total)\Bytes Total/Sec               Bytes/sec     Average
\ASP.NET\Requests Queued                           Requests      Average
\Memory\Committed MBytes                           Other         Average
\Processor(_Total)\% Processor Time                Other         Average
\ASP.NET\Requests Rejected                         Requests      Maximum
Network Out                                        Bytes         Total
Network In                                         Bytes         Total
Percentage CPU                                     Percentage    Average
Disk Read Bytes/sec                                Bytes/sec     Average
\Web Service(_Total)\ISAPI Extension Requests/sec  Requests/sec  Average
\Memory\Available MBytes                           MBytes        Average
Disk Write Bytes/sec                               Bytes/sec     Average

The following shows a \Memory\Available MBytes metric value for a five minute interval:

  • Timestamp: 2014-06-27T06:35:00Z
  • Average: 1119.4
  • Minimum: 1117
  • Maximum: 1121
  • Total: 11194
  • Count: 10

Summary

The Azure Monitoring Service API provides a powerful and easy-to-use way to access performance data for role instances in an Azure Cloud Service. This performance data includes the minimal set supported for Azure Virtual Machines as well as a Verbose set specified in the diagnostics.wadcfg file used to configure Azure Diagnostics.


Using Azure Monitoring Service API with Azure Virtual Machines

Microsoft Azure Virtual Machines is the IaaS feature in the Azure platform that provides the ability to easily deploy Linux and Windows Server VMs into a public cloud infrastructure. Like other public clouds, Azure uses commodity hardware with horizontal scaling being used to provide scalable applications. As with any scalable system it is essential that monitoring, alerting and scaling be automated.

The Azure Portal provides a graphical view of various metrics for Virtual Machines, and a new preview Microsoft Azure Monitoring Services API provides direct access to the data displayed there. This data can be used to:

  • Build monitoring dashboards
  • Provide alerts
  • Implement autoscaling

Azure provides these capabilities out of the box, but the Monitoring Services API can be used to provide more sophisticated or proprietary functionality.

The Monitoring Services API is built on top of the Azure Service Management REST API, which also provides the core API for managing Azure compute services. The Monitoring Services API is documented on MSDN. Stephen Siciliano (@icsus) does a great overview of the API on the Cloud Cover show on Channel 9.

Monitoring

The Monitoring Services API provides access to various service metrics persisted by Azure. For Virtual Machines the following metrics are captured:

Name                  Units       Reporting
Disk Read Bytes/sec   Bytes/sec   Max, Min, Ave
Disk Write Bytes/sec  Bytes/sec   Max, Min, Ave
Network Out           Bytes       Total
Network In            Bytes       Total
Percentage CPU        Percentage  Max, Min, Ave

These metrics are the same as those displayed on the Azure Portal. They are all available at the host hypervisor level and consequently do not require the deployment of an agent into the VM. For Virtual Machines, metrics data is captured every 30 seconds and up to 90 days of data is made available. The data is reported in time grains that are 5, 60 or 720 minutes (12 hours) long. For 5 minute time grains, the data is averaged over 30 second intervals. For 60 and 720 minute buckets, the data is averaged over 5 minute intervals.

Microsoft Azure Monitoring Services API

The Monitoring Services API is available as a NuGet download. It can be installed using the following command in the Package Manager console:

Install-Package Microsoft.WindowsAzure.Management.Monitoring -Pre

It provides classes with the ability to:

  • Download metrics
  • Configure alerting based on these metrics
  • Configure autoscaling based on the metrics

The Monitoring Services API provides monitoring support for the following Azure services:

  • Cloud Services (PaaS)
  • HDInsight (Hadoop on Azure)
  • Mobile Services
  • Service Bus
  • Service Bus Brokered Messaging
  • Storage
  • Virtual Machines (IaaS)
  • Web Sites

These services are referred to as resource types in the Monitoring Services API, and are identified by resource Id. This post is focused on the Monitoring Services API support for Azure Virtual Machines.

Metric Definitions

The MetricsClient class provides the core connection to the Azure Service Management API. Authentication is provided using a self-signed X.509 Service Management certificate that has previously been uploaded as a Management Certificate to Azure (or alternatively downloaded as a publish settings file).

The MetricsClient class exposes a MetricDefinitions property which is used to access the metric definitions for a specified resource Id. A List() method can be invoked on this property to download a collection of MetricDefinition entities, each of which provides a single metric definition for the specified resource Id.

For Virtual Machines the resource Id is a String of the form:

/hostedservices/SERVICE_NAME/deployments/DEPLOYMENT_NAME/roles/VM_NAME

The SERVICE_NAME is the name of the cloud service container hosting the VM. The DEPLOYMENT_NAME identifies the specific deployment of the VM, and VM_NAME is the name of the VM. This information is all available on the Azure Portal.

The ResourceIdBuilder class is a helper class exposing methods to create correctly-formatted resource Ids for the various types of resources (Virtual Machines, Cloud Services, Web Sites, etc.). For example, the BuildVirtualMachineResourceId() method creates a resource Id for Virtual Machines.

The MetricDefinition class is declared:

public class MetricDefinition {
    public MetricDefinition();

    public String DisplayName { get; set; }
    public Boolean IsAlertable { get; set; }
    public IList<MetricAvailability> MetricAvailabilities { get; set; }
    public TimeSpan MinimumAlertableTimeWindow { get; set; }
    public String Name { get; set; }
    public String Namespace { get; set; }
    public String PrimaryAggregation { get; set; }
    public String ResourceIdSuffix { get; set; }
    public String Unit { get; set; }
}

DisplayName is a human-readable name while Name identifies the metric in API calls. IsAlertable specifies whether the metric can be used to trigger alerts, which is always true for Virtual Machines metrics. MetricAvailabilities specifies the time grain for metrics (5 minutes, 60 minutes and 720 minutes for Virtual Machines). MinimumAlertableTimeWindow specifies the minimum time window for alerts, and is always 5 minutes for Virtual Machines metrics. Namespace provides additional parameterization of the resource for some resource types, but is not used for Virtual Machines metrics. PrimaryAggregation specifies whether the metric is specified as a Total or as an Average (with associated minimum and maximum) value. ResourceIdSuffix specifies the suffix identifying the actual resource for a given resource (roles/VM_NAME for a Virtual Machines metric). Unit specifies the units for the metric – for Virtual Machines metrics this is one of Bytes, Bytes/sec or Percentage.

The following example shows how to retrieve the metrics definitions for Virtual Machines:

X509Store store = new X509Store(StoreName.My, StoreLocation.CurrentUser);
store.Open(OpenFlags.ReadOnly);

X509Certificate2 x509Certificate = store.Certificates.Find(
    X509FindType.FindByThumbprint, thumbprint, false)[0];

CertificateCloudCredentials credentials =
    new CertificateCloudCredentials(SUBSCRIPTION_ID, x509Certificate);

String vmResourceId = ResourceIdBuilder.BuildVirtualMachineResourceId(
    CLOUD_SERVICE_NAME, DEPLOYMENT_NAME, VM_NAME);

MetricsClient metricsClient = new MetricsClient(credentials);

MetricDefinitionListResponse metricListResponse =
    metricsClient.MetricDefinitions.List(vmResourceId, null, null);

foreach (MetricDefinition metricDefinition in
    metricListResponse.MetricDefinitionCollection.Value)
{
    String displayName = metricDefinition.DisplayName;
    String metricName = metricDefinition.Name;
    TimeSpan alertableTimeWindow = metricDefinition.MinimumAlertableTimeWindow;
    String units = metricDefinition.Unit;
    String primaryAggregation = metricDefinition.PrimaryAggregation;
}

The following shows the metric definition for Disk Read Bytes/sec:

  • Name: "Disk Read Bytes/sec"
  • Namespace: ""
  • ResourceIdSuffix: "roles/VM_NAME"
  • DisplayName: "Disk Read Bytes/sec"
  • Unit: "Bytes/sec"
  • PrimaryAggregation: "Average"
  • MetricAvailabilities: [ { TimeGrain: "PT5M", Retention: "P90D" }, { TimeGrain: "PT1H", Retention: "P90D" }, { TimeGrain: "PT12H", Retention: "P90D" } ]
  • MinimumAlertableTimeWindow: "PT5M"
  • IsAlertable: true

Metric Values

The MetricsClient class exposes a MetricValues property which is used to access metric values for a specified resource Id. A List() method can be invoked on this property to download a collection of MetricValue entities, each of which provides a single metric value definition for the specified resource Id and metric type.

The List() method has the following parameters:

  • resourceId
  • comma-separated list naming the metrics to be retrieved
  • namespace – String.Empty for Virtual Machines metrics
  • time grain – must be one of 5 minutes, 60 minutes or 720 minutes (12 hours)
  • start time for the data to be retrieved
  • end time for the data to be retrieved

The List() method returns a MetricValueListResponse entity exposing a collection of MetricValueSet, with an entry for each type of metric value retrieved. MetricValueSet is declared:

public class MetricValueSet {
    public MetricValueSet();

    public String DisplayName { get; set; }
    public DateTime EndTime { get; set; }
    public IList<MetricValue> MetricValues { get; set; }
    public String Name { get; set; }
    public String Namespace { get; set; }
    public String PrimaryAggregation { get; set; }
    public DateTime StartTime { get; set; }
    public TimeSpan TimeGrain { get; set; }
    public String Unit { get; set; }
}

DisplayName is a human-readable name while Name identifies the metric in other API calls. StartTime and EndTime indicate the start and end of the time interval containing the metric values. MetricValues is a list of the MetricValue entities containing the actual metric data. Namespace provides additional parameterization of the resource for some resource types, but is not used for Virtual Machines metrics. PrimaryAggregation specifies whether the metric is specified as a Total or as an Average (with associated minimum and maximum) value. TimeGrain indicates the interval between data points in the MetricValues collection, and for Virtual Machines metrics must be one of 5 minutes, 60 minutes or 720 minutes (12 hours). Unit specifies the units for the metric – for Virtual Machines metrics this is one of Bytes, Bytes/sec and Percentage.

MetricValue is declared:

public class MetricValue {
    public MetricValue();

    public String Annotation { get; set; }
    public Nullable<Double> Average { get; set; }
    public Nullable<Int32> Count { get; set; }
    public Nullable<Double> Maximum { get; set; }
    public Nullable<Double> Minimum { get; set; }
    public DateTime Timestamp { get; set; }
    public Nullable<Double> Total { get; set; }
}

Annotation is not used for Virtual Machines metrics. Average, Maximum and Minimum specify the relevant value for a metric of type Average. Count indicates the number of individual data points in the average. These data points are every 30 seconds for metrics retrieved with a time grain of 5 minutes and 60 minutes, and every 30 minutes for metrics retrieved with a time grain of 720 minutes. Timestamp is the start time of the time grain represented by this metric value. Total specifies the value for a metric of type Total.

The following example shows the retrieval of all the Virtual Machines metrics for a VM for the preceding day, with the data reported at hourly intervals (on the hour):

String vmResourceId = ResourceIdBuilder.BuildVirtualMachineResourceId(
    CLOUD_SERVICE_NAME, DEPLOYMENT_NAME, VM_NAME);

MetricsClient metricsClient = new MetricsClient(credentials);

List<String> metricNames = new List<String>() { "Disk Read Bytes/sec",
    "Disk Write Bytes/sec", "Network In", "Network Out",
    "Percentage CPU" };

// timeGrain must be 5, 60 or 720 minutes.
TimeSpan timeGrain = TimeSpan.FromMinutes(60);
DateTime startTime = DateTime.UtcNow.AddDays(-1);
DateTime endTime = DateTime.UtcNow;

MetricValueListResponse response = metricsClient.MetricValues.List(
    vmResourceId, metricNames, String.Empty, timeGrain, startTime, endTime);

foreach (MetricValueSet value in response.MetricValueSetCollection.Value)
{
    String valueName = value.Name;
    foreach (MetricValue metricValue in value.MetricValues)
    {
        Double? average = metricValue.Average;
        Int32? count = metricValue.Count;
        Double? maximum = metricValue.Maximum;
        Double? minimum = metricValue.Minimum;
        DateTime timestamp = metricValue.Timestamp;
        Double? total = metricValue.Total;
    }
}

The following shows a Network In metric value for a five minute interval:

  • Timestamp: "2014-06-21T19:45:00Z"
  • Average: 36713
  • Minimum: 36713
  • Maximum: 36713
  • Total: 36713
  • Count: 1

The following shows a Percentage CPU metric value for a five minute interval:

  • Timestamp: "2014-06-21T19:55:00Z"
  • Average: 4.105362
  • Minimum: 2.862055
  • Maximum: 10.254867
  • Total: 41.05362
  • Count: 10

Azure Service Management REST API

The Azure Service Management REST API is the core API for managing Azure compute resources. The Monitoring Services API invokes the REST API to interact with Azure. The Service Management REST API uses an X.509 certificate for authentication.

The host for the Service Management API is https://management.core.windows.net.

The Monitoring Services operations require the use of a Service Management API version, as specified in the following request header:

  • x-ms-version: 2013-10-01

By default, the data is returned in XML format. However, the following request header indicates it should instead be returned as JSON:

  • Accept: application/json

The Service Management API uses different paths for different operations.

The path for the Get Metric Definitions operation is:

/SUBSCRIPTION_ID/services/monitoring/metricdefinitions/query?resourceId=/hostedservices/CLOUD_SERVICE_NAME/deployments/DEPLOYMENT_NAME/roles/VM_NAME

The path for the Get Metrics operation is:

/SUBSCRIPTION_ID/services/monitoring/metricvalues/query?resourceId=/hostedservices/CLOUD_SERVICE_NAME/deployments/DEPLOYMENT_NAME/roles/VM_NAME&namespace=&names=METRICS_LIST&timeGrain=TIME_GRAIN&startTime=START_TIME&endTime=END_TIME

METRICS_LIST is a comma-separated list of the metrics to be retrieved. START_TIME and END_TIME are the start and end times of the retrieval interval, specified as UTC date times in ISO 8601 format. The times should include both the T and Z indicators. For example: 2014-06-21T19:00:00Z. TIME_GRAIN specifies the time grain as an ISO 8601 time interval:

  • PT5M – 5 minutes
  • PT1H – 60 minutes
  • PT12H – 720 minutes
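Putting these pieces together, a sketch of a raw Get Metrics request follows (line breaks added for readability; metric names containing spaces must be URL-encoded):

GET https://management.core.windows.net/SUBSCRIPTION_ID/services/monitoring/metricvalues/query
    ?resourceId=/hostedservices/CLOUD_SERVICE_NAME/deployments/DEPLOYMENT_NAME/roles/VM_NAME
    &namespace=&names=Percentage%20CPU&timeGrain=PT1H
    &startTime=2014-06-21T19:00:00Z&endTime=2014-06-22T19:00:00Z
x-ms-version: 2013-10-01
Accept: application/json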

Note that although the alerts and autoscaling functionality in the Service Management REST API is documented, the monitoring functionality is currently not documented.

Summary

The Azure Monitoring Services API is a preview API that supports access to monitoring data captured by various Azure services. For Azure Virtual Machines it provides access to core performance data for a VM including: CPU Percentage; Network In and Network Out; Bytes Read / sec and Bytes Written / sec. This data can be displayed in a dashboard. The Monitoring Services API also provides the ability to configure alerting or autoscaling based on the monitored data.


Network Connectivity in Azure

Microsoft Azure is a general purpose, public cloud that provides compute, storage, connectivity and services. The pace of innovation in each of these areas is accelerating, making it harder (in a good way) to keep abreast of the latest developments. The last few months have brought significant enhancements to the connectivity feature set for Azure. Indeed, in its most recent Magic Quadrant reports Gartner made “Microsoft the only public cloud vendor to be named a Leader for both PaaS and IaaS.” This post is a brief overview of the current state of network connectivity for Azure VNETs and cloud services – with current meaning early June 2014.

Cloud Service

A cloud service is the organizational container into which Azure compute instances are deployed. On creation, a cloud service is permanently associated with a DNS name of the form myCloudServiceName.cloudapp.net and a location which is one of: region, affinity group, or VNET.

Geographies and Regions

Microsoft has deployed Azure into datacenters across the globe. These datacenters are not directly exposed to customers. Instead, customers deploy applications to regions each of which may encompass more than one underlying datacenter. Azure currently provides the following regions:

  • East US
  • West US
  • North Central US
  • South Central US
  • Brazil South
  • North Europe
  • West Europe
  • East Asia
  • Southeast Asia
  • Japan East
  • Japan West
  • China North (via 21Vianet)
  • China South (via 21Vianet)

Affinity Groups

An affinity group is a named part of a region into which related compute and storage services can be deployed. Historically, this co-location lowered the latency between compute and storage. The introduction of the high-speed Generation 2 network in 2012 meant that deploying compute and storage into an affinity group no longer provides a latency advantage. Furthermore, the use of an affinity group added complexity since there was no easy way to migrate either compute or storage from one affinity group to another. A further limitation was that deployments in an affinity group could not access new compute features – such as high-CPU compute instances – not provided to that affinity group. Access to new compute features could require the creation of a new affinity group followed by migration of cloud services into it.

Affinity Group VNET

The first version of Azure VNETs was built on top of affinity groups in that a VNET had to be created in an affinity group. This means that Affinity Group VNETs are subject to the same constraint with regard to new compute features that the underlying affinity group exhibits.

An Affinity Group VNET can host both PaaS and IaaS cloud services, and provides the following features to them:

  • Azure Load Balancer
  • Static IP Addresses
  • VPN Gateway

These are described later in the post.

Regional VNET

Microsoft introduced the Regional VNET at Tech Ed NA 2014. As its name indicates, a Regional VNET is associated with a region and provides access to any of the cloud service compute features provided in a region. Many of the new connectivity features of Azure work only in a Regional VNET and are not available in Affinity Group VNETs. It is not possible to convert an Affinity Group VNET into a Regional VNET. At some point all existing VNETs will be upgraded to be Regional VNETs.

A Regional VNET can host both PaaS and IaaS cloud services, and provides the following features to them:

  • Azure Load Balancer
  • Internal Load Balancer
  • Reserved IP Addresses
  • Instance-Level Public IP Addresses
  • Static IP Addresses
  • VPN Gateway

Currently, Regional VNETs cannot be created directly in the Azure Portal. However, a Regional VNET can be created in the portal by uploading an appropriate network configuration file. The schema for this file is identical to that of a traditional Affinity Group VNET with the exception that the AffinityGroup attribute naming the affinity group is replaced with a Location attribute specifying the region to host the VNET. For example, the following network configuration can be imported to create a Regional VNET with three subnets:

<NetworkConfiguration xmlns:xsd="http://www.w3.org/2001/XMLSchema"
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xmlns="http://schemas.microsoft.com/ServiceHosting/2011/07/NetworkConfiguration">
  <VirtualNetworkConfiguration>
    <Dns />
    <VirtualNetworkSites>
      <VirtualNetworkSite name="AtlanticVNET" Location="East US">
        <AddressSpace>
          <AddressPrefix>10.0.0.0/8</AddressPrefix>
        </AddressSpace>
        <Subnets>
          <Subnet name="FrontEnd">
            <AddressPrefix>10.0.0.0/16</AddressPrefix>
          </Subnet>
          <Subnet name="BackEnd">
            <AddressPrefix>10.1.0.0/16</AddressPrefix>
          </Subnet>
          <Subnet name="Static">
            <AddressPrefix>10.2.0.0/16</AddressPrefix>
          </Subnet>
        </Subnets>
      </VirtualNetworkSite>
    </VirtualNetworkSites>
  </VirtualNetworkConfiguration>
</NetworkConfiguration>

The Get-AzureVNetConfig PowerShell cmdlet can be used to download the current network configuration for a subscription. This file can be modified and a new configuration uploaded using the Set-AzureVNetConfig PowerShell cmdlet. Note that there is a single network configuration file in a subscription containing the definition of all VNETs created in that subscription.
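A sketch of the export-edit-import cycle (the file path is arbitrary):

# Export the current network configuration for the subscription
Get-AzureVNetConfig -ExportToFile "C:\config\NetworkConfig.xml"
# ... edit the file, e.g. to add the Regional VNET shown above ...
Set-AzureVNetConfig -ConfigurationPath "C:\config\NetworkConfig.xml"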

VIP for Cloud Services

A cloud service is permanently associated with a DNS name of the form myCloudServiceName.cloudapp.net. However, a single public VIP address is associated with the cloud service only while there is an IaaS VM or PaaS instance deployed into it. This VIP does not change as long as a VM or instance is deployed into the cloud service. The VIP is lost when the last VM or instance in the cloud service is deleted. This means that an A Record can be used to map a vanity URL (e.g., mydomain.com) to a cloud service VIP as long as care is taken never to completely delete all the VMs or instances in the cloud service. A CNAME can always be used to map a vanity URL to the domain URL permanently associated with the cloud service.

Reserved VIPs for Cloud Services

Azure now supports the ability to reserve a public VIP for a subscription. The address is issued by Azure; it is not possible to specify a particular IP address. A reserved IP address is associated with a single region. A reserved IP can be configured to be the VIP for both IaaS and PaaS cloud services. Reserved IP addresses are a billable feature. Note that Reserved IP addresses can be used with cloud services deployed into a Regional VNET or a region but not with cloud services hosted in an Affinity Group VNET. There is a soft limit of 5 reserved IP addresses per subscription, but this limit can be increased on request.

Currently it is not possible to configure a Reserved IP through the Azure Portal. Using PowerShell, an IP address reservation can be requested for and removed from a subscription as follows:

New-AzureReservedIP -ReservedIPName "anIPName" -Label "aLabel" `
    -Location "West US"

Remove-AzureReservedIP -ReservedIPName "anIPName" -Force

The Reserved IP addresses in a subscription can be retrieved using the Get-AzureReservedIP PowerShell cmdlet.

Currently, a Reserved IP address can be associated with an IaaS cloud service only when a new deployment is made (i.e., the first VM is deployed into it). A Reserved IP address is associated with an IaaS cloud service using the New-AzureVM PowerShell cmdlet, as follows:

$vmConfiguration | New-AzureVM -ServiceName "anIaaSService" `
    -ReservedIPName "anIPName" -Location "West US"

A Reserved IP address is associated with a PaaS cloud service by adding a ReservedIPs tag to the AddressAssignments section of the service configuration for the service:

<ServiceConfiguration serviceName="aCloudService">
  <Role> ... </Role>
  <NetworkConfiguration>
    <AddressAssignments>
      <ReservedIPs>
        <ReservedIP name="anIPName"/>
      </ReservedIPs>
    </AddressAssignments>
  </NetworkConfiguration>
</ServiceConfiguration>

Azure Load Balancer

The VIPs associated with a cloud service are hosted by the Azure Load Balancer which analyses traffic arriving at the VIP and then forwards traffic to the appropriate VM depending on the endpoint declarations for the VMs in the cloud service. The Azure Load Balancer only forwards TCP and UDP traffic to the VMs in a cloud service. Note that this means that it is not possible to ping an Azure VM through the Azure Load Balancer. The Azure Load Balancer supports port forwarding and hash-based load-balancing for both PaaS and IaaS cloud services. The official Azure Team Blog has a post with an extensive discussion of the Azure Load Balancer.

In Port Forwarding, the Azure Load Balancer exposes different ports for the same service on different VMs. It then forwards traffic received on a specific port to the appropriate VM. This is used to expose services such as RDP and SSH, which need to target a specific VM.

In hash-based load balancing, the Azure Load Balancer makes a hash of source IP, source port, destination IP, destination port and protocol and uses the hash to select the destination VM. Hash-based load balancing is used to distribute traffic among a set of stateless VMs, any of which can provide the desired service – e.g., web servers. Note that hash-based load balancing is often erroneously described as “round robin.”

For a PaaS cloud service, the Azure Load Balancer is configured through the endpoint declaration for the roles in the Service Definition file. An input endpoint is used to configure hash-based load balancing. An instance input endpoint is used to configure port forwarding. For IaaS cloud services, the hash-based load balancer is configured through the addition of VMs to a load balanced set while port forwarding is configured directly.

The Azure Load Balancer routes traffic only to VMs it identifies as healthy. It periodically pings the VMs it routes traffic to and uses the response, or lack thereof, to identify their health status. Both IaaS and PaaS cloud services support custom health probes. If a custom health probe is not provided for a PaaS cloud service, the Azure Load Balancer pings the Azure Agent on the instance, which responds with a healthy state while the instance is in the Ready state.

Custom health probes are configured by providing either a TCP or HTTP endpoint. The Azure Load Balancer pings the endpoint and identifies a healthy state by either a TCP ACK or an HTTP 200 OK. By default, the ping occurs every 15 seconds and a VM is deemed unhealthy if an appropriate response is not received for 31 seconds. An application is free to provide its own algorithm when deciding how to respond to the ping.
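For an IaaS cloud service, a custom health probe is typically configured when the load-balanced endpoint is created. The following is a minimal sketch using the classic Azure PowerShell cmdlets; the VM, endpoint, load-balanced set, and probe path names are illustrative:

Get-AzureVM -ServiceName "anIaaSService" -Name "webVM1" |
  Add-AzureEndpoint -Name "http" -Protocol tcp -LocalPort 80 -PublicPort 80 `
    -LBSetName "webLBSet" -ProbeProtocol http -ProbePath "/healthcheck" -ProbePort 80 |
  Update-AzureVM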

Instance-Level Public IP Addresses

Each cloud service containing a deployment has a single VIP associated with it. Azure also supports the association of a public IP address with individual VMs and instances of cloud services deployed to a Regional VNET. An Instance-Level Public IP Address (PIP) can be used to support services like passive FTP, which require the opening of large port ranges, something that is not possible with a cloud service VIP. Currently, there is a soft limit of two PIPs per subscription.

Traffic directed to a PIP does not go through the standard Azure Load Balancer and is instead forwarded directly to the VM. This means that the VM is exposed to the internet so care should be taken that a firewall is configured appropriately. The Azure Load Balancer permits only TCP and UDP traffic to reach a VM. However, ICMP traffic (i.e., ping) can be sent successfully to a VM with an assigned PIP.

A PIP is associated with an IaaS VM using the Set-AzurePublicIP PowerShell cmdlet. This modifies a VM configuration which can then be used with New-AzureVM or Update-AzureVM to create or update a VM respectively. The Remove-AzurePublicIP cmdlet is used to remove a PIP from a VM configuration, which must then be applied to the VM with Update-AzureVM.
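For example, the following sketch (service, VM, and PIP names are illustrative) assigns a PIP to an existing VM:

Get-AzureVM -ServiceName "anIaaSService" -Name "aVM" |
  Set-AzurePublicIP -PublicIPName "aPublicIPName" |
  Update-AzureVM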

A PIP is associated with a PaaS cloud service by adding a PublicIPs tag to the AddressAssignments section of the service configuration for the service:

<ServiceConfiguration serviceName="aCloudService">
  <Role> ... </Role>
  <NetworkConfiguration>
    <AddressAssignments>
      <PublicIPs>
        <PublicIP name="aPublicIPName"/>
      </PublicIPs>
    </AddressAssignments>
  </NetworkConfiguration>
</ServiceConfiguration>

The actual PIPs assigned to the VMs in a cloud service are retrieved using the Get-AzureRole PowerShell cmdlet, as follows:

Get-AzureRole -ServiceName "aCloudService" -Slot Production
  -InstanceDetails

DIPs for Cloud Service VMs and Instances

Azure automatically assigns dynamic IP addresses (DIP) to VMs in IaaS and PaaS cloud services. A DIP remains associated with the VM as long as it is allocated. When an IaaS VM is de-allocated the DIP is given up and may be allocated to a new VM.

The DIP allocated to a VM depends on whether or not it resides in a VNET. If the VM is not in a VNET then the DIP is allocated from a DIP range managed by Azure. If the VM is in a VNET then the DIP is allocated (sequentially) from the DIP range configured for the subnet in which it is deployed. The DIP is allocated through a special type of DHCP which provides an essentially infinite lease on it.

A VM in a PaaS cloud service keeps the DIP for as long as it is deployed. A VM in an IaaS cloud service keeps the DIP while it is allocated, but it loses it when it is de-allocated. In this case the DIP may be allocated to a different VM and the original VM may get a different DIP when it is once again allocated. Note that the VM preserves the DIP even if it is migrated to a new physical server as part of a server-healing operation.

It is crucial that no change is made to the NIC configuration on a VM. Any such change is lost if the VM is ever redeployed as part of a server-healing operation.

Static IPs for IaaS Cloud Service VMs

Azure supports the allocation of static IPs to VMs deployed to an IaaS cloud service in a VNET. This is useful for VMs providing services such as Domain Controller and DNS. The general guidance is that static IP addresses should be used only when specifically needed.

When both IaaS and PaaS cloud services are deployed in a VNET, care must be taken that there is no overlap between the IP addresses allocated to PaaS instances and the static IP addresses used in the IaaS cloud service. Otherwise, a PaaS instance could be allocated a DIP that is configured as the static IP address of a currently de-allocated IaaS VM. This is best managed by allocating static IP addresses from a subnet containing only static IP addresses.

A static IP address is associated with an IaaS VM using the Set-AzureStaticVNetIP PowerShell cmdlet. This modifies a VM configuration which can then be used with New-AzureVM or Update-AzureVM to create or update a VM respectively. The Remove-AzureStaticVNetIP cmdlet is used to remove a static IP address from a VM configuration, which must then be applied to the VM with Update-AzureVM. The Test-AzureStaticVNetIP PowerShell cmdlet can be used to check whether a specific IP address is currently in use.
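For example, the following sketch (using the Static subnet from the earlier VNET example; the service and VM names are illustrative) tests an address and then assigns it to an existing VM:

Test-AzureStaticVNetIP -VNetName "AtlanticVNET" -IPAddress "10.2.0.4"

Get-AzureVM -ServiceName "anIaaSService" -Name "aDnsServer" |
  Set-AzureStaticVNetIP -IPAddress "10.2.0.4" |
  Update-AzureVM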

Endpoints

An endpoint in Azure refers to the combination of a port number and protocol that is configured for a role in a PaaS cloud service or a VM in an IaaS cloud service.

Three types of endpoint are configurable for a role in a PaaS cloud service:

  • Input endpoint
  • Instance input endpoint
  • Internal endpoint

Both types of input endpoint are exposed through the Azure Load Balancer. An input endpoint is used for hash-based load balancing while an instance input endpoint is used for port forwarding. An internal endpoint exposes a port addressable by other VMs in the cloud service. Internal endpoints provide discoverability through the Azure Runtime API of both the VM DIP and the port actually used on individual VMs. If permitted by the firewall any VM in a VNET can connect to any other VM in the VNET, regardless of cloud service, as long as the actual DIP and port number to connect are known.

The Azure Load Balancer exposes two types of endpoint for a VM in an IaaS cloud service:

  • Load-balanced availability set
  • Port forwarding

A load-balanced availability set is used to configure hash-based load balancing while a port forwarded endpoint does just that. Both types of endpoint can be configured at any time.

The Azure Load Balancer routes traffic to the VMs or instances in a cloud service if an appropriate endpoint has been declared. Note that traffic sent directly to a PIP bypasses the load balancer completely. The Azure Load Balancer provides an ACL capability which further restricts the traffic routed to VMs and instances. The ACL feature allows a set of accept and deny rules to be configured for an endpoint, and the Azure Load Balancer uses these rules to decide whether or not inbound traffic should be routed to VMs and instances. For example, a rule can be configured to route only traffic coming from a single IP address, such as the public IP address of a company.

Internal Load Balancer

Azure supports the creation of an internal load balancer, which exposes an internal endpoint through which traffic can be routed to one or more VMs in the same VNET or cloud service. Currently, Azure supports internal load balancing only for IaaS VMs deployed into a new cloud service or into a Regional VNET. An internal load balancer is created inside a cloud service.

The following PowerShell cmdlets are used to manage internal load balancers:

  • Add-AzureInternalLoadBalancer
  • Get-AzureInternalLoadBalancer
  • Remove-AzureInternalLoadBalancer

Add-AzureInternalLoadBalancer is used to create an internal load balancer inside a cloud service. The internal load balancer is identified by name. Get-AzureInternalLoadBalancer retrieves the name and DIP of the internal load balancer in a cloud service. An internal load balancer which is not currently being used can be deleted using Remove-AzureInternalLoadBalancer.

Load balanced endpoints are created on the VMs to be load balanced by using Set-AzureEndpoint in combination with New-AzureVM or Update-AzureVM depending on whether the VM already exists. Set-AzureEndpoint has two parameters to manage the internal load balancer: -InternalLoadBalancerName, which specifies the internal load balancer to use; and -LBSetName, which provides a name to identify the set of VMs to be load balanced.
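Putting this together, the following sketch (names illustrative) creates an internal load balancer in the BackEnd subnet and adds a load-balanced endpoint that uses it; Add-AzureEndpoint is used here since the endpoint is being created rather than modified:

Add-AzureInternalLoadBalancer -ServiceName "anIaaSService" `
  -InternalLoadBalancerName "backendILB" -SubnetName "BackEnd"

Get-AzureVM -ServiceName "anIaaSService" -Name "sqlVM1" |
  Add-AzureEndpoint -Name "sql" -Protocol tcp -LocalPort 1433 -PublicPort 1433 `
    -LBSetName "sqlLBSet" -InternalLoadBalancerName "backendILB" -DefaultProbe |
  Update-AzureVM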

VPN

Azure supports two types of VPN: site-to-site (S2S) and point-to-site (P2S).

S2S is enterprise focused and uses a hardware or software VPN router on the client side to share the connection among many users. Azure provides router configuration scripts for many standard Cisco and Juniper VPN Routers.

P2S is developer focused and uses the RAS software that comes with Windows. User authentication is provided through X.509 certificates. A master X.509 certificate is uploaded to the Azure Portal and is then used to create user-specific X.509 certificates which are then distributed individually to each user. P2S is therefore easy to use for small trusted development teams but does not scale well because of the inability to revoke the user certificates.

Both types of VPN connect to a VPN Gateway configured inside an Azure VNET. This gateway is fully managed by Azure and is deployed in an HA manner. Traffic can be routed in either direction across the VPN making it possible to do things like connect a front-end PaaS cloud service to an on-premises SQL Server. Another important use of the VPN is the ability to perform all system administration over the VPN without the need for SSH and RDP endpoints.

It is possible to configure a VPN between two Regional VNETs hosted in different regions. This is done by creating a VPN Gateway in each region and cross-referencing the address ranges for each VNET. Traffic going across such a VPN is routed across the Microsoft Azure backbone rather than across the public internet.

ExpressRoute

Microsoft has worked with various network partners to provide a direct private connection into Azure, in a feature named ExpressRoute. This comes in two flavors:

  • Exchange Provider
  • Network Service Provider

The Exchange Provider flavor is offered by hosting companies, such as Equinix and Level 3, which provide their customers with direct connections into Azure. Alternatively, network service providers, such as AT&T and BT, provide MPLS connectivity into Azure.

ExpressRoute provides direct connectivity to an Azure VPN Gateway and then to the VNET hosting the gateway and any cloud services hosted in the VNET. It also provides access to Azure Storage, but not to other Azure services such as Azure SQL Database. ExpressRoute provides speeds up to 1Gbps during preview with this limit increased to 10Gbps when it goes GA.

DNS

Azure provides name resolution services for VMs in IaaS and PaaS cloud services. It also provides name resolution for up to 100 VMs in a VNET, provided their fully-qualified domain name is used. Otherwise, a DNS server must be provided if name resolution services are needed. This is specifically the case in hybrid deployments where on-premises servers must contact Azure VMs over a VPN. The DNS server is configured in the network configuration file for the subscription. A DNS server should be deployed with a static DIP.
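As a sketch, a DNS server with a static DIP is declared in the Dns section of the network configuration file and then referenced by name from the VirtualNetworkSite that uses it (the name and IP address are illustrative):

<Dns>
  <DnsServers>
    <DnsServer name="myDnsServer" IPAddress="10.2.0.4" />
  </DnsServers>
</Dns>

Within the VirtualNetworkSite element:

<DnsServersRef>
  <DnsServerRef name="myDnsServer" />
</DnsServersRef>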

Azure Traffic Manager

The Azure Traffic Manager provides global traffic routing to destinations both in Azure and elsewhere. It works through the dynamic mapping of short-lived DNS entries to actual IP addresses. Once a Traffic Manager profile is configured it provides a DNS name, such as mydomain.trafficmanager.net, for the application, which is mapped dynamically to the DNS entry of the actual cloud service to be used. The Azure Traffic Manager can also be used to route traffic to websites outside Azure.

Traffic Manager provides the following load-balancing choices:

  • Performance
  • Round Robin
  • Failover

The Performance choice indicates that the application is deployed as distinct cloud services in multiple geographic locations, such as every Azure region, and that the user accessing the application should be automatically redirected to the cloud service with the lowest latency. Internally, Traffic Manager is provided with latency tables for different routes across the internet and uses these tables in choosing where to route the user.

The Round Robin choice indicates that new users should be allocated to the underlying cloud services in a round-robin manner.

The Failover choice indicates that there is a primary cloud service and one or more passive secondary cloud services. All traffic is sent to the primary cloud service and, when it fails, the traffic is sent to one of the configured secondary cloud services instead. The Traffic Manager detects the health of the underlying cloud services by performing a ping every 30 seconds against a configured URL hosted by each cloud service. If the Traffic Manager fails to receive a response more than twice it marks the cloud service as unhealthy and starts routing traffic to a secondary cloud service.

Summary

The new Regional VNET capability of Azure has allowed the provision of a wide variety of network services such as internal load balancers, instance-level public IP addresses, and VNET-to-VNET VPN capability. This post provided a brief summary of these features and described how they fit into the existing network capabilities of the Azure Platform.


Disk Storage on Linux VMs in Azure

Microsoft Azure supports the ability to mount VHDs into Azure Virtual Machines running any of the supported Linux distributions. Additionally, the Azure File Service provides a managed logical file system that can be mounted into Ubuntu distributions using the SMB protocol. This post focuses on mounting VHDs into Azure VMs, but also shows how to mount a file system managed by the Azure File Service.

Page Blobs and VHDs

The Azure Blob Service provides two types of blobs: block blobs and page blobs. A block blob contains a single file intended to be read sequentially from start to finish – for example, an image used in a web page. A page blob provides up to 1TB of random-access storage, with the primary use case being as the backing store for a VHD. The Blob Service supports high availability for blobs by storing three copies of each blob in the local region, with the option of storing an additional three copies in a paired remote region. The local copies are updated synchronously while any remote copies are updated asynchronously with a delay of the order of a few minutes. Reads and writes are fully consistent.

An Azure storage account provides a security boundary for access to blobs. The account is specified by an account name that is unique across the entire storage service. Each storage account can comprise zero or more containers, each of which can contain zero or more blobs. There is an upper limit of 500TB of storage in a single account, and that provides the only limit on the number of containers and blobs. An Azure subscription has a soft limit of 20 storage accounts with a hard limit of 50 storage accounts.

Azure Disks

Azure Virtual Machines provides an IaaS compute feature that supports the creation of VMs with per-minute billing. It supports instance sizes varying from a shared core/512MB A0 instance to a 16 core/112GB A9 instance. It also supports the use of several Windows Server versions as well as various Linux distributions, including Ubuntu, CentOS, Suse, and Oracle Linux.

Azure Virtual Machines supports the following types of disk:

  • OS disk
  • Temporary disk
  • Data disk

The OS disk comprises a VHD that is attached to the VM. The VHD is stored as a page blob in Azure Storage so is accessed remotely. This use of Blob Storage means that the OS disk is durable and any flushed writes are persisted to Azure Storage. Consequently, there is no loss of data in the event of a failure of the physical server hosting the VM.

The temporary disk is in the chassis of the physical server hosting the VM, and is intended for use as swap space or other scratch purposes where the complete loss of the data would not be an issue. The contents of the temporary disk are not persisted to Azure Storage and are lost in the event of a server-healing induced migration of the VM. Consequently, the temporary disk should not be used to store data which cannot be recreated.

The data disk comprises a VHD that is attached to the VM. The VHD is stored as a page blob in Azure Storage so is accessed remotely. The number of data disks that can be attached to a VM depends on the instance type. The general rule is that two data disks can be attached to the VM for each CPU core in the instance, with the exception of a 16-core A9 for which only 16 disks may be attached.

A basic Azure VM has two disks initially – the OS disk and the temporary disk. Additional data disks can be attached and detached at any time, provided the disk number limits are adhered to. The disks used by Azure Virtual Machines are standard VHDs. These can be moved from one VM to another. They can also be migrated to and from other locations, such as on-premises. A new disk added to a VM must be partitioned and formatted in a manner appropriate to the OS of the VM to which the disk is attached.
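For example, the following sketch (service, VM, and label names are illustrative) creates a new empty 100GB data disk and attaches it to an existing VM at LUN 0:

Get-AzureVM -ServiceName "aLinuxService" -Name "aLinuxVM" |
  Add-AzureDataDisk -CreateNew -DiskSizeInGB 100 -DiskLabel "data0" -LUN 0 |
  Update-AzureVM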

Linux Disks

Azure Virtual Machines use the following arrangement of disks:

  • /dev/sda – OS disk
  • /dev/sdb – temporary disk
  • /dev/sdc – 1st data disk
  • …
  • /dev/sdr – 16th data disk

Performance

The OS disk and all data disks are durable with the data persisted as VHDs in page blobs hosted by the Azure Storage Service. This is a shared service which has implemented scalability targets to ensure fair access to shared resources. The scalability targets are documented on MSDN.

The scalability target for a single VHD is:

  • Maximum size: 1TB
  • Maximum requests / second: 500
  • Maximum throughput: 60 MB / second

There are additional scalability targets for a storage account. These vary by region and by the level of redundancy for the storage account. For locally redundant storage in US regions, the scalability targets for a storage account are:

  • 20 Gbps ingress
  • 30 Gbps egress

These scalability targets impact the use of VHDs in Azure Virtual Machines, since they affect the scalability of storage on a single VM. They indicate that there are performance limits on a single VHD as well as on the number of VHDs in a single storage account. If all disks are accessed at their maximum throughput, a single storage account can support something like 30-40 VHDs. It is crucial that appropriate testing be done on any data-intensive application that makes heavy use of attached disks to identify any performance problems so that remedial action can be taken.

The way to increase storage performance with Azure VMs is to increase the number of disks. For example, two 100GB disks have double the throughput of a single 200GB disk. As the number of VMs increases there arises the possibility of hitting the scalability targets for a storage account, in which case the solution is to use additional storage accounts. This can happen with as few as two VMs in a data-intensive application, if both VMs have 16 data disks working at full throughput.

Given these various targets it would appear that the optimal solution would be to store each VHD in its own storage account. However, this is unnecessary since the scalability target for a single storage account comfortably exceeds the performance requirements of a fully-loaded VM. Furthermore, doing so adds significantly to the administration overhead and would lead to an early adventure with another scalability target – the hard limit of 50 storage accounts per subscription.

Disk Caching

Azure supports various caching options for data disks:

  • None
  • Read only (write through)
  • Read/Write (write back)

By default, OS disks have read/write caching configured on creation while data disks have no cache configured on creation. The caching options can be specified either when the disk is initially attached to the VM or later. Note that only four disks on each VM can be configured for caching, which limits the utility of caching in larger VMs. It is important to test applications to identify whether caching data disks provides any performance improvements for the application workload.
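For example, the following sketch (names illustrative) switches the data disk at LUN 0 of an existing VM to read-only caching:

Get-AzureVM -ServiceName "aLinuxService" -Name "aLinuxVM" |
  Set-AzureDataDisk -LUN 0 -HostCaching ReadOnly |
  Update-AzureVM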

Trim Support

The VHDs used by OS disks and data disks are persisted as page blobs in Azure Storage. Page blobs are implemented as sparse storage which means that only pages that have actually been written to are stored and billed for. For example, a 1TB page blob which has never been written to occupies no space in Azure Storage and consequently incurs no charges. Azure Storage supports the ability to clear pages no longer needed which means that they are no longer billed.

When a file is deleted in a normal file system the appropriate entries in the partition table are deleted but the underlying storage is not cleared. With a sparse storage system such as a VHD backed by an Azure page blob this means that when a file is deleted the actual pages allocated to the file remain written to and incurring charges.

SSDs are subject to a different phenomenon whereby the memory occupied by a deleted file must be cleared before it can be written to. SSDs support a TRIM capability which file systems can use to clear the memory occupied by deleted files.

TRIM has been implemented in Azure Virtual Machines so that when a file is deleted the space it occupied can be deleted from the underlying page blob. Since this has some performance implications this is a manual process that can be scheduled at a convenient time. TRIM support is a cost optimization not a performance optimization.

Ubuntu 14.04 images in the Azure Gallery support TRIM, which is referred to as discard when listed as a file system option. For example, the following command performs a TRIM operation on the Azure disk mounted on /mnt/data:

# sudo fstrim /mnt/data

TRIM is not provided on CentOS images in the Azure Gallery.

Configuring a Data Disk

Data disks can be attached to an Azure VM in various ways, including through the Azure Portal and the Add-AzureDataDisk PowerShell cmdlet shown earlier.

Attaching a data disk exposes it to the VM as a raw block device that must be configured prior to use. As with any Linux system, this entails the following tasks:

  • Create partitions
  • Install file systems on the partitions
  • Mount the partitions into the file system

The following commands are used to perform these operations:

  • fdisk – manage and view the disk partitions
  • lsblk – view the partition and file system topology of disks
  • mkfs – put a file system onto a disk
  • mount – mount a file system
  • umount – unmount a file system

Partition a Disk

fdisk can be used to partition a disk as well as view information about all the disks on the VM.

The disk layout can be displayed using the following command:

# fdisk -l

The data disk located at /dev/sdc can be partitioned using the following command:

# fdisk -c -u /dev/sdc

The -c parameter turns off DOS-compatibility mode while the -u parameter causes partition sizes to be given in sectors instead of blocks. fdisk provides a wizard for which the following responses can be given to create a partition occupying an entire device:

  • n, p, 1, enter (default), enter (default), p, w.

This creates a new partition named /dev/sdc1 that occupies the whole of the /dev/sdc device.

Make and Mount a File System

The mkfs command is used to put a file system on a partition. The file systems supported by the current kernel are listed in the /proc/filesystems file. In the Linux VMs provided in the Azure Gallery, Ubuntu supports ext2, ext3 and ext4 while CentOS supports ext4.

For example, the following command installs the ext4 file system on the /dev/sdc1 partition:

# mkfs -t ext4 -m 1 /dev/sdc1

The -m parameter reserves 1% of the disk for the super-user (down from the default of 5%).

The mount command is used to mount this file system into some mount point on the overall file system of the VM. This mount point is a directory. The following commands create a mount directory – /mnt/data – and then mount a partition containing an ext4 file system into it:

# mkdir /mnt/data
# mount -t ext4 /dev/sdc1 /mnt/data

The mounted file system can now be used like any other file system on the VM. However, it is not automatically remounted when the VM is restarted. This can be achieved by putting an entry in the /etc/fstab file, which specifies the file systems that are to be mounted automatically on reboot. The /etc/fstab entry contains essentially the same information as used in the mount command. The partition to be mounted can be identified in various ways, including its location (e.g., /dev/sdc1) and the UUID that uniquely identifies it. Note that the UUID is unique across VMs, so using it helps avoid name collisions when disks are moved from one VM to another.

The lsblk command can be used to list the device, partition and file system topology, as well as relevant metadata – including the uniquely identifying UUID. For example, the following command lists a tree structure of the file system topology:

# lsblk --fs

The data issued by the lsblk command can be configured so that it can be used as the source of data for other commands. For example, the following command outputs only the UUID for the /dev/sdc1 partition:

# lsblk --noheadings --output UUID /dev/sdc1

The --output parameter indicates that only the UUID should be in the output and the --noheadings parameter indicates that there should be no header for the column, so that nothing but the UUID is in the output (i.e., it is convenient for use in scripts).

The /etc/fstab file contains one line for each file system to be mounted, with each line comprising the following (whitespace or tab delimited) entries:

  1. Physical identification of the file system (e.g., UUID, /dev/sdc1)
  2. Mount point for the file system (e.g., /mnt/data)
  3. Type of file system (e.g., ext4)
  4. File system options (e.g., noatime)
  5. Dump designator (set to 0)
  6. File system check indicator (2 – at boot time, check the file system after checking the boot file system)

When the VM is booted the file systems listed in /etc/fstab are mounted automatically with the options, etc. specified in the file. The following is an example of an entry using the UUID to identify the partition:

UUID=602d265e-1918-4c29-b13b-7caecda395d8 /mnt/data ext4 defaults,nofail,noatime 0 0

In this example, an ext4 file system is mounted on /mnt/data, with the default mount options supplemented by the nofail and noatime options. nofail means that no error is reported if the physical device is not present at boot time, while noatime turns off the writing of the last read time of a file. Note that noatime implies nodiratime, an option often provided with noatime. The 0 indicates that the (obsolete) dump program will not dump the file while the final 0 indicates that the file system should not be checked at boot time.
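Putting the pieces together, the following sketch (run as root, using the /dev/sdc1 partition and /mnt/data mount point from the earlier examples) appends such an entry to /etc/fstab and then verifies it without a reboot:

# Append an fstab entry identified by UUID
echo "UUID=$(lsblk --noheadings --output UUID /dev/sdc1) /mnt/data ext4 defaults,nofail,noatime 0 0" >> /etc/fstab

# Mount everything listed in /etc/fstab that is not already mounted
mount -a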

The df command displays, in human-readable form, the available space on mounted file systems:

$ df -h

RAID Arrays of Multiple Disks

A single disk attached to a VM can be up to 1TB and provide 500 IOPS. If either more space or higher performance is needed then multiple disks must be attached to the VM. Depending on the instance size, between 2 and 16 disks can be attached to the VM. The disks can be treated either as just a bunch of disks (JBOD) or as a RAID device.

A JBOD is managed by replicating the single-disk process for all the disks – /dev/sdc, /dev/sdd, etc. However, using a JBOD to improve application performance only works if the application is able to distribute load (evenly) across all the disks. Furthermore, even if the application can spread the data among all the disks the manner in which the data is accessed can prevent the application gaining the performance benefits of the multiple disks.

mdadm can be used to create a software RAID device from a set of raw devices or partitions. It supports the creation of the following types of RAID arrays:

  • RAID0 (striped)
  • RAID1 (mirroring)
  • RAID4 (striping with dedicated parity)
  • RAID5 (striping with distributed parity)
  • RAID6 (striping with multiple distributed parity blocks)

The various RAID levels other than RAID0 provide data security in the event of the failure of a single device. However, this is not important in Azure, where the underlying storage system provides high availability for individual VHDs. Consequently, only RAID0 is needed for a disk array in Azure.

In planning the deployment of a data-intensive application to Azure Virtual Machines it is important to test the application to identify the optimal disk layout. If the application has not been developed specifically to be performant with a JBOD, it is likely that a RAID0 disk array provides better performance. Disk caching is turned off by default for data disks, but depending on the application the use of read or read-write caching may improve performance. However, only four disks attached to an Azure VM can have caching enabled which limits the utility of caching with disk arrays.

Configuring a Disk Array

The creation of a RAID0 disk array entails the following tasks:

  1. Install mdadm
  2. Create disk array
  3. Configure the disk array
  4. Create partitions
  5. Install file systems on the partitions
  6. Mount the partitions into the file system

Other than using the disk array name instead of a device name, steps 4 through 6 are the same as for a single disk.

mdadm is installed from the repository appropriate to the distribution.

CentOS:

# sudo yum install mdadm

Ubuntu:

# sudo apt-get install mdadm

The following command creates a RAID0 disk array named data on device /dev/md/data using two raw disk devices /dev/sdc and /dev/sdd:

# mdadm --create /dev/md/data --name=data --chunk=8 --level=0 --raid-devices=2 /dev/sdc /dev/sdd
mdadm: Defaulting to version 1.2 metadata
mdadm: array /dev/md/data started.

The default chunk size for a disk stripe is 512KB. The --chunk parameter in this example specifies that this disk array be created with 8 KiB chunks per disk.

Many examples specify the RAID0 device as /dev/md0 and do not provide a name. This can cause a name collision problem if the disk array is ever moved to a VM which already has a RAID device. Consequently, it is a good practice to take control of the RAID device name by specifying the --name parameter and using a device with the same name under /dev/md. When this is done mdadm automatically creates a link from /dev/md/name to /dev/mdN, where N is typically 127 but could be a sequentially lower number if the VM has more than one disk array. (See the comments by Doug Ledford on this page.) Similar links are created for partitions created on the disk array. For example:

$ ls -l /dev/md*
brw-rw----. 1 root disk 9, 127 May 25 02:17 /dev/md127

/dev/md:
total 4
lrwxrwxrwx. 1 root root  8 May 25 02:17 data -> ../md127
-rw-------. 1 root root 59 May 25 02:17 md-device-map

The details of all the arrays on the VM can be viewed as follows:

# mdadm --detail --verbose --scan
ARRAY /dev/md/data level=raid0 num-devices=2 metadata=1.2 name=snowpack-u3:data UUID=2bd91d37:4bcc51cb:14a2913f:7d74dc0a
   devices=/dev/sdc,/dev/sdd

In the example, the fully qualified RAID device name is provided as hostname:array-name, i.e., snowpack-u3:data. The UUID uniquely identifies the RAID0 disk array.

When invoked for a specific device, additional detail is provided. For example:

# mdadm --detail /dev/md/data
/dev/md/data:
Version : 1.2
Creation Time : Sun May 25 19:56:35 2014
Raid Level : raid0
Array Size : 104857584 (100.00 GiB 107.37 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent
Update Time : Sun May 25 19:56:35 2014
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
Chunk Size : 8K
Name : snowpack-u3:data (local to host snowpack-u3)
UUID : 2bd91d37:4bcc51cb:14a2913f:7d74dc0a
Events : 0
Number Major Minor RaidDevice State
0 8 32 0 active sync /dev/sdc
1 8 48 1 active sync /dev/sdd

mdadm uses a config file, mdadm.conf, the location of which varies by distribution.

CentOS: /etc/mdadm.conf

Ubuntu: /etc/mdadm/mdadm.conf

Once a disk array has been created, mdadm can be used to create the configuration file as follows (for Ubuntu):

# mdadm --detail --verbose --scan > /etc/mdadm/mdadm.conf

This file is used by mdadm to control the assembly of the disk array when the system is rebooted or restarted.

A specified disk array can be stopped as follows:

# mdadm --stop /dev/md/data

Once the disk array has been started, it appears in the list displayed by fdisk, as follows:

# fdisk -l

Disk /dev/md127: 107.4 GB, 107374166016 bytes
2 heads, 4 sectors/track, 26214396 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 8192 bytes / 16384 bytes
Disk identifier: 0x00000000

Additional mdadm Configuration for Ubuntu

These instructions work fine on CentOS. However, some additional steps are needed with Ubuntu to ensure the disk array is assembled correctly on reboot. The mdadm.conf configuration file must be added to the initramfs configuration used during the boot process.

The update-initramfs command is used to update initramfs, as follows:

# update-initramfs -u
update-initramfs: Generating /boot/initrd.img-3.13.0-24-generic

The following command displays the mdadm.conf configuration contained in initramfs so it can be used to verify that the initramfs update is successful:

$ gunzip -c /boot/initrd.img-3.13.0-24-generic | cpio -i --quiet --to-stdout etc/mdadm/mdadm.conf
ARRAY /dev/md/data level=raid0 num-devices=2 metadata=1.2 name=snowpack-u3:data UUID=8be171a8:817ef70d:bbac034e:fe77b402
   devices=/dev/sdc,/dev/sdd

This should match the entry in /etc/mdadm/mdadm.conf.

Partition the RAID Device

The remaining process – partitioning the RAID0 device, creating a file system, and mounting the file system – proceeds exactly as for a single disk.

fdisk is used to partition a disk array. For example, create a partition on the /dev/md/data disk array as follows:

# fdisk -c -u /dev/md/data
Device contains neither a valid DOS partition table, nor Sun, SGI or OSF disklabel
Building a new DOS disklabel with disk identifier 0xc0586acc.
Changes will remain in memory only, until you decide to write them.
After that, of course, the previous content won't be recoverable.
Warning: invalid flag 0x0000 of partition table 4 will be corrected by w(rite)
Command (m for help): n
Command action
e extended
p primary partition (1-4)
p
Partition number (1-4): 1
First sector (2048-209715167, default 2048):
Using default value 2048
Last sector, +sectors or +size{K,M,G} (2048-209715167, default 209715167):
Using default value 209715167
Command (m for help): p
Disk /dev/md/data: 107.4 GB, 107374166016 bytes
2 heads, 4 sectors/track, 26214396 cylinders, total 209715168 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 8192 bytes / 16384 bytes
Disk identifier: 0xc0586acc
Device Boot Start End Blocks Id System
/dev/md/data1 2048 209715167 104856560 83 Linux
Command (m for help): w
The partition table has been altered!
Calling ioctl() to re-read partition table.
Syncing disks.

Create a File System

mkfs is used to create a file system on a device or partition. For example, the following creates an ext4 file system on the partition /dev/md127p1:

# mkfs -t ext4 /dev/md127p1
mke2fs 1.42.9 (4-Feb-2014)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
Stride=2 blocks, Stripe width=4 blocks
6553600 inodes, 26214140 blocks
1310707 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
800 block groups
32768 blocks per group, 32768 fragments per group
8192 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872
Allocating group tables: done
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done

There used to be guidance that the file systems on RAID0 disk arrays be created with parameters indicating the size of the chunks. This is no longer necessary with ext4, as indicated by the stride and stripe values in the output – since these are precisely the values that would otherwise have been needed.

Mount a File System

The mount command is used to mount a file system in a specified directory. For example, the following creates a mount point directory, /mnt/data, and then mounts an ext4 file system hosted on /dev/md127p1 into the directory:

# mkdir /mnt/data
# mount -t ext4 /dev/md127p1 /mnt/data

Confirm the file system is mounted, as follows:

# sudo df -l -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 29G 1.1G 27G 4% /
none 4.0K 0 4.0K 0% /sys/fs/cgroup
udev 826M 12K 826M 1% /dev
tmpfs 168M 396K 168M 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 840M 0 840M 0% /run/shm
none 100M 0 100M 0% /run/user
/dev/sdb1 69G 52M 66G 1% /mnt
/dev/md127p1 99G 60M 94G 1% /mnt/data

An /etc/fstab entry must be made to ensure that the file system is mounted when the VM reboots. The entry is the same as for a single disk, with the appropriate device information. The UUID for the RAID partition can be found using lsblk, as in:

# lsblk --output UUID /dev/md127p1
UUID
fb273c6e-93be-4d15-bc11-973f6d20148b

The following is an example of a /etc/fstab entry for the partition on the RAID disk array:

UUID=fb273c6e-93be-4d15-bc11-973f6d20148b /mnt/data ext4 defaults,nofail,noatime 0 0

Note that the entry must be on a single line.

This completes the process of:

  • creating an mdadm RAID0 disk array on two raw disk devices
  • creating a file system on the RAID0 disk array
  • mounting the file system

Azure File Service

Microsoft has released a preview of the Azure File Service, which provides a managed file system that is exposed securely through a service endpoint as an SMB 2.1 share. The Azure File Service is built on the same technology as the other features of the Azure Storage Service. The logical file system can be mounted through the SMB share in an Azure VM and accessed just like any other file system. Furthermore, the SMB share can be mounted simultaneously into different VMs, which allows for the sharing of files among different VMs. The Azure File Service therefore provides an alternative way to access durable storage from inside an Azure VM.

The Azure File Service has the following scalability targets:

  • Maximum size of a file share: 5TB
  • Maximum size of a single file: 1TB
  • Throughput (8KB operations): 1000 IOPS
  • Throughput: 60 MB/s per share

The Azure File Service can be used in the Ubuntu images and CentOS 7 images in the Azure Gallery. However, during the preview, the Azure File Service must be managed through PowerShell cmdlets. The cifs-utils package provides an SMB client, which can be installed as follows:

Ubuntu:

# apt-get install cifs-utils

CentOS 7:

# yum install cifs-utils
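The share itself must exist before it can be mounted which, as noted above, is done during the preview with PowerShell. A minimal sketch, assuming the preview storage cmdlets are available, with ACCOUNT_NAME and ACCESS_KEY as placeholders:

$ctx = New-AzureStorageContext -StorageAccountName "ACCOUNT_NAME" `
  -StorageAccountKey "ACCESS_KEY"
New-AzureStorageShare -Name "my-share" -Context $ctx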

The standard mount command can be used to mount the SMB share into the VM file system. For example, with ACCOUNT_NAME and ACCESS_KEY being the Azure Storage account name and access key the following command mounts a share named SHARE into the specified directory:

# mount -t cifs //ACCOUNT_NAME.file.core.windows.net/SHARE /mnt/DIRECTORY -o vers=2.1,username=ACCOUNT_NAME,password=ACCESS_KEY,dir_mode=0777,file_mode=0777

Note that, similarly to the other file systems, an /etc/fstab entry is needed to ensure that the file system is mounted when the system reboots. For example, the following would be the equivalent (single line) /etc/fstab entry for the above example:

//ACCOUNT_NAME.file.core.windows.net/SHARE /mnt/DIRECTORY cifs vers=2.1,dir_mode=0777,file_mode=0777,username=ACCOUNT_NAME,password=ACCESS_KEY

An SMB share named my-share hosted in a storage account named ACCOUNT_NAME is displayed in a file system listing as follows:

# sudo df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 29G 1.2G 27G 5% /
none 4.0K 0 4.0K 0% /sys/fs/cgroup
udev 826M 12K 826M 1% /dev
tmpfs 168M 408K 168M 1% /run
none 5.0M 0 5.0M 0% /run/lock
none 840M 0 840M 0% /run/shm
none 100M 0 100M 0% /run/user
/dev/sdb1 69G 52M 66G 1% /mnt
/dev/md127p1 99G 60M 94G 1% /mnt/data
//ACCOUNT_NAME.file.core.windows.net/my-share 5.0T 3.7M 5.0T 1% /mnt/azure-files

Files

The following files and directories may be of use when working with disks, disk arrays and partitions:

  • /etc/fstab – provides file system configuration
  • /proc/filesystems – lists file systems that can be created on the VM
  • /proc/mdstat – lists RAID arrays
  • /proc/mounts – lists mounted file systems
  • /dev/disk/by-uuid – lists disks by UUID
  • /dev/disk/by-label – lists disks by label
  • /dev/disk/by-id – lists disks by IDs

Summary

Azure Virtual Machines supports the ability to attach up to 16 1TB disks on an Azure VM. These disks are VHDs backed by page blobs in the Azure Storage Service. This makes them durable with an existence that transcends the existence of VMs to which they are attached. When multiple disks are attached to a VM they may be treated as a JBOD or combined into a RAID0 disk array. By aggregating the performance of individual disks the latter can provide significantly improved storage performance for data-intensive applications.

The Azure File Service is a preview of a managed service which provides a logical file system exposed through an SMB 2.1 endpoint. This file system can be mounted into Ubuntu (and Windows Server) VMs providing an alternative way to access persistent files. Furthermore, the Azure File Service supports the ability to mount the same logical file system into multiple VMs simultaneously making it easy to share files among several VMs.

In any data-intensive application running on Azure it is crucial that a representative application workload be tested to identify the most appropriate disk topology for the application.  This may involve using a larger number of smaller disks or using several Azure storage accounts.


A Second Look at Project Orleans

Project Orleans is a preview from Microsoft Research of an Actor-based framework and runtime supporting the development and deployment of massively distributed systems hosted in Microsoft Azure. A specific goal of Orleans is to simplify the creation of distributed systems for developers who are not skilled in the art.

Orleans supports the core features of the Actor model: state encapsulation and safe messaging; fair scheduling; location transparency; and mobility. An Orleans grain (actor) contains fully encapsulated state that may only be changed by the grain itself, in response to a message it receives. Deep copy is used whenever data is inserted into a message. Instead of the default pre-emptive multi-threading .NET scheduler, Orleans uses a cooperative multi-threading scheduler to schedule the processing of messages by a grain, ensuring that a message to a grain is completely processed before the next message is processed. Orleans manages the activation of a grain in a silo on a physical node and provides location transparency by completely hiding grain location from the application. Grains are virtual: a grain always exists logically, but it need not be activated in any silo while it is not being used. This allows the Orleans runtime to support weak mobility, since at different times the same grain may be activated in different silos.

This is a follow-up to an earlier post which gave a high-level overview of Orleans as well as providing a variety of links to the Orleans system downloads and documentation.

Grains

An Orleans application comprises a system of interacting grains of various types. The application is developed by defining a set of grain interfaces which are then implemented in a set of classes. The Orleans build system auto-generates an associated set of factory and reference classes. The application is deployed through deploying the assemblies hosting the grain implementations to the physical nodes hosting the silos and deploying the assemblies hosting the factory and reference class implementations to the clients, which may or may not be hosting silos. The Orleans runtime completely manages access to grains and clients only ever access grain references, regardless of whether or not the client is hosted in an Orleans silo (i.e., is another grain).

Grain Interfaces

The Orleans API exposes an IAddressable interface, the base for a number of marker interfaces used in the definition of grain classes. In essence, the IAddressable interface indicates the addressability through the Orleans runtime of objects implementing the marker interfaces.

The interface hierarchy for IAddressable is:

IAddressable
   IGrain
      IRemindable
   IGrainObserver

These are declared as follows:

public interface IAddressable {}

public interface IGrain : IAddressable {}

public interface IGrainObserver : IAddressable {}

public interface IRemindable : IGrain, IAddressable {
   Task ReceiveReminder(String reminderName, TickStatus status);
}

IAddressable is an empty marker interface indicating that the Orleans runtime is able to address an instance implementing one of the derived interfaces. IGrain is an empty marker interface indicating that any derived interface is a grain interface. IGrainObserver is an empty marker interface indicating that a derived interface is implemented by an observing class. IRemindable is a marker interface indicating that an implementing class can receive reminders.

IGrain is a core interface for Orleans. Every grain class implements an interface derived from IGrain. These grain interfaces specify the functionality of the grains used in an Orleans application. The messages sent to grains are implemented as public methods and properties in the grain classes. Since Orleans is a distributed system it is crucial that message processing is asynchronous, and this is achieved by constraining grain interfaces so that all methods and properties return either a Task or a Task<T>. The async/await feature of .NET 4.5 greatly simplifies this. Methods can be defined as usual in the grain interface. However, properties must be handled in a special manner since the set method for a property returns void rather than a Task or Task<T>. Instead, a set method must be provided explicitly.

IGrainObserver is a marker interface indicating that an implementing class is able to observe a grain and process notifications issued by the grain. An observer implements an interface derived from IGrainObserver, the methods of which are constrained to return only void. This means that instances of this class are not normal grains, whose methods can return only Task or Task<T>. An observer must indicate to a grain that it wants to be notified of particular events, so the observed grain must expose methods supporting that subscription. A grain class can manage these subscriptions using the ObserverSubscriptionManager<T> class, with T being the observer interface. This class is declared as follows:

public class ObserverSubscriptionManager<T> where T : IGrainObserver {
   public ObserverSubscriptionManager<T>();
   public Int32 Count { get; }
   public void Clear();
   public void Notify(Action<T> notification);
   public void Subscribe(T observer);
   public void Unsubscribe(T observer);
}

Subscribe() adds an observer to the list of subscribers to be notified for the specific observable event. Unsubscribe() removes a specific observer from the notification list, while Clear() removes all subscribers. Notification is performed by invoking Notify(), which invokes the supplied notification action on each subscribed observer.
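The following is a minimal sketch of the pattern; IScoreObserver, IGameGrain, and GameGrain are illustrative names, not part of the Orleans API (TaskDone.Done, used to return a completed Task, is described later):

public interface IScoreObserver : IGrainObserver {
   // Observer methods may only return void
   void ScoreUpdated(Int32 newScore);
}

public interface IGameGrain : IGrain {
   Task Subscribe(IScoreObserver observer);
   Task UpdateScore(Int32 score);
}

public class GameGrain : GrainBase, IGameGrain {
   private ObserverSubscriptionManager<IScoreObserver> subscribers =
      new ObserverSubscriptionManager<IScoreObserver>();

   public Task Subscribe(IScoreObserver observer) {
      subscribers.Subscribe(observer);
      return TaskDone.Done;
   }

   public Task UpdateScore(Int32 score) {
      // Invoke ScoreUpdated() on every subscribed observer
      subscribers.Notify(s => s.ScoreUpdated(score));
      return TaskDone.Done;
   }
}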

Grain Classes

The hierarchy for the classes implementing IAddressable is:

GrainBase
  GrainBase<TGrainState>
GrainReference

The implementation of a grain is provided by a class derived from either GrainBase or GrainBase<T> and which implements the appropriate grain interface class. GrainBase<T> extends the core GrainBase functionality by adding support for the persistence of grain state of type T, where T is an implementation of IGrainState.

GrainBase is declared as follows:

public abstract class GrainBase : IAddressable {
   protected GrainBase();
   public String IdentityString { get; }
   public String RuntimeIdentity { get; }
   public virtual Task ActivateAsync();
   public virtual Task DeactivateAsync();
   protected void DeactivateOnIdle();
   protected void DelayDeactivation(TimeSpan timeSpan);
   protected OrleansLogger GetLogger(String loggerName);
   protected OrleansLogger GetLogger();
   protected Task<IOrleansReminder> GetReminder(String reminderName);
   protected Task<List<IOrleansReminder>> GetReminders();
   protected IStreamProvider GetStreamProvider(String name);
   protected IEnumerable<IStreamProvider> GetStreamProviders();
   protected Task<IOrleansReminder> RegisterOrUpdateReminder(String reminderName,
      TimeSpan dueTime, TimeSpan period);
   protected IOrleansTimer RegisterTimer(Func<Object, Task> asyncCallback,
      Object state, TimeSpan dueTime, TimeSpan period);
   protected Task UnregisterReminder(IOrleansReminder reminder);
}

IdentityString opaquely identifies the grain and RuntimeIdentity opaquely identifies the silo hosting it. ActivateAsync() is invoked each time the grain is activated – i.e., rehydrated into memory – and may be overridden to provide any additional initialization required. Similarly, DeactivateAsync() is invoked each time the grain is deactivated and may be overridden (e.g., to persist grain state). DeactivateOnIdle() indicates that the grain should be deactivated as soon as the current request has completed. DelayDeactivation() hints that the grain should remain activated for the specified timespan. GetLogger() gets the Orleans logger which can be used to write entries to the Orleans log. In Azure, this log is persisted into the standard Azure logs provided by (Windows) Azure Diagnostics so may be accessed in the WADLogsTable in Azure Storage.

Orleans provides a timer capability, in which a timer can be created and associated with a grain. When this timer fires a method is invoked on the grain. It may be used, for example, to ensure that grain state is persisted periodically. This timer exists only while the grain is activated, and is cancelled whenever the grain is deactivated. The RegisterTimer() method is used to create a timer and specify the Task to be invoked when it fires. A timer is cancelled by disposing the handle returned by the RegisterTimer() method.
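For example, a persistent grain might use a timer to persist its state every five minutes while it is activated. A sketch, assuming a class derived from GrainBase<T> such as the PersonGrain shown later:

public override Task ActivateAsync() {
   // The timer exists only while the grain is activated
   RegisterTimer(
      _ => State.WriteStateAsync(),  // task invoked on each tick
      null,                          // state object passed to the callback
      TimeSpan.FromMinutes(5),       // due time
      TimeSpan.FromMinutes(5));      // period
   return base.ActivateAsync();
}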

The Orleans reminder feature provides the capability of a timer which transcends grain lifetime. It does this by storing the reminder state either in-memory on the Silo (useful for development) or in an Azure Table. The latter is a distributed, persistent store, and its use allows a reminder to be sent even when a grain has been activated in another silo. The RegisterOrUpdateReminder() method is used to create or update a reminder, which is subsequently identified by name. For a grain to receive a reminder its class must implement the IRemindable interface. This interface exposes a ReceiveReminder() method which is invoked when the reminder is sent. The GetReminders() method returns all the reminders for the grain while the GetReminder() returns a reminder by name. UnregisterReminder() is used to delete a reminder.
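The following sketch shows a grain that registers an hourly reminder on activation; IAuditGrain, AuditGrain, and the reminder name are illustrative:

public interface IAuditGrain : IGrain { }

public class AuditGrain : GrainBase, IAuditGrain, IRemindable {
   public override async Task ActivateAsync() {
      // The reminder survives deactivation of the grain
      await RegisterOrUpdateReminder("hourlyAudit",
         TimeSpan.FromHours(1), TimeSpan.FromHours(1));
      await base.ActivateAsync();
   }

   public Task ReceiveReminder(String reminderName, TickStatus status) {
      // React to the reminder here
      return TaskDone.Done;
   }
}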

The Orleans team has indicated that GetStreamProvider() and GetStreamProviders() should not have been exposed in the preview.

Persistent Grains

Orleans supports the persistence of grain state through the use of the GrainBase<TGrainState> class, where TGrainState is the class containing the data to be persisted. The grain state is persisted using a storage provider configured in the Orleans configuration file. When a grain is activated it is automatically initialized with its persistent state prior to the invocation of ActivateAsync(), which can then be used to complete initialization (e.g., initialize any non-persisted state). However, the grain state is never persisted automatically so some strategy must be devised for grain state persistence.

In many Orleans systems the true grain state is actually resident on a client (for example, on an XBox controller) so can always be refreshed from there. Consequently, it is not necessarily crucial that the grain state be persisted whenever it is changed on the grain. It can be persisted occasionally using a timer or reminder and using DeactivateAsync() when the grain is deactivated. This deferred persistence writing helps improve performance.

Orleans provides in-memory and Azure Table storage providers – the former being suitable for development while the latter supports a production system. Orleans provides an extension point for creating storage providers. The Orleans team has provided a sample on CodePlex and Richard Astbury has created a GitHub repo with an Azure Blob storage provider.

GrainBase<T> is declared as follows:

public class GrainBase<TGrainState> :
      GrainBase, IAddressable where TGrainState : IGrainState {
   public GrainBase<TGrainState>();
   protected TGrainState State { get; }
}

The persistent state is accessed through the State property, which is strongly typed allowing its members to be accessed using property dot notation (e.g., State.LastName). A class derived from GrainBase<T> can have state not contained in State, but this is not persisted using the state persistence capability. This additional state can be managed in various ways including through the use of the ActivateAsync() and DeactivateAsync() methods on GrainBase.

The TGrainState type parameter is constrained to implement the IGrainState interface, which is declared as follows:

public interface IGrainState {
   String Etag { get; set; }
   Dictionary<String,Object> AsDictionary();
   Task ClearStateAsync();
   Task ReadStateAsync();
   void SetAll(Dictionary<String,Object> values);
   Task WriteStateAsync();
}

ClearStateAsync() is used to clear the current state. ReadStateAsync() is used to refresh the State property from the configured storage provider. WriteStateAsync() is used to persist the State property to the configured storage provider. AsDictionary() is used by state providers to expose the state as a Dictionary. SetAll() is used by storage providers to initialize the State property. ETag is an opaque value used by the storage provider.

Client Implementation

The Orleans server implementation comprises the grain interface and the grain class. The Orleans build system auto-generates a factory class and a reference class for each grain interface – and these are used by clients regardless of whether or not they are hosted in an Orleans silo. The factory class exposes static methods for creating grain references. The reference class implements the grain interface and the Orleans runtime proxies method invocations as messages to the actual grain.

The factory class implements methods like the following (where ISampleGrain is the grain interface):

public static ISampleGrain GetGrain(Guid primaryKey)
public static ISampleGrain GetGrain(long primaryKey)

These are used by clients to create grain references for the grain identified by the specified primary key. Note that this is purely a local operation and does not in itself cause the activation of a grain; that requires the invocation of a grain method.
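For example, a client might use the factory like this (DoSomething() is a hypothetical method assumed to be declared on ISampleGrain, and the code must run in an async context):

ISampleGrain grain = SampleGrainFactory.GetGrain(42);
// GetGrain() is purely local; the grain is activated (if necessary)
// only when a method is invoked on the reference.
await grain.DoSomething();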

The grain reference nominally has the type of the grain interface. It is actually an instance of an auto-generated class derived from GrainReference that implements the grain interface.

Example – Grain Interface

The following is a simple example of a grain interface:

public interface IPersonGrain : IGrain {
   Task<String> Name { get; }
   Task SetName(String name);
}

This example shows the use of a property getter; since a property setter cannot return a Task, a standard method is used instead. As required, all methods in the interface return either Task or Task<T>.

Example – State Interface

The following is a simple example of a state interface that persists only a single property:

public interface IPersonState : IGrainState {
   String Name { get; set; }
}

Example – Grain Class

The following is a simple example of a grain class implementing IPersonGrain and using the built-in grain persistence. Orleans loads state automatically on grain activation, but writing the state must be explicitly performed – in this case when the grain is deactivated.

[StorageProvider(ProviderName = "AzureStore")]
class PersonGrain : GrainBase<IPersonState>, IPersonGrain {
   public Task<String> Name {
      get { return Task.FromResult(State.Name); }
   }

   public Task SetName(string name) {
      State.Name = name;
      return TaskDone.Done;
   }

   public override async Task DeactivateAsync() {
      // Persist the grain state before completing deactivation; without
      // the await the write might not complete.
      await State.WriteStateAsync();
      await base.DeactivateAsync();
   }
}

The storage provider, AzureStore, is configured in the OrleansConfiguration.xml file.

Some Useful Techniques for Using Tasks

The Task class provides the following convenient way to create a completed task for a specific value:

Task.FromResult(value);

The Orleans API provides the following utility property to return a completed Task.

TaskDone.Done;

The following example shows the use of Task.WhenAll() to fan-out the sending of messages allowing them to be processed simultaneously:

List<Task> promises = new List<Task>();
for (Int32 i = 0; i < 10; i++) {
   var personGrain = PersonGrainFactory.GetGrain(i);
   promises.Add(personGrain.SetName(
      String.Format("John-{0}", i)));
}
await Task.WhenAll(promises);
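Task.WhenAll() also aggregates results when the fanned-out tasks return values; for example, reading the names back (a sketch reusing the PersonGrainFactory from above):

List<Task<String>> nameTasks = new List<Task<String>>();
for (Int32 i = 0; i < 10; i++) {
   nameTasks.Add(PersonGrainFactory.GetGrain(i).Name);
}
// Completes when every promise is fulfilled; results arrive as an array.
String[] names = await Task.WhenAll(nameTasks);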

Summary

The Orleans framework and runtime provide an easy-to-use implementation of the Actor model for the .NET platform. The definition of an actor (or grain) requires the creation of a grain interface derived from IGrain and its implementation in a class derived from GrainBase or GrainBase<T>, where T identifies grain state whose loading from persistent storage is handled automatically. Given the ease with which grains can be defined and the transparent manner in which the Orleans runtime allocates them to physical nodes, Orleans simplifies the development of certain classes of distributed systems.


A First Look at Project Orleans

Microsoft Azure Cloud Services is a PaaS offering that simplifies the task of deploying scalable applications. An Azure PaaS deployment comprises two files: a package containing the application assemblies; and a configuration file. This simplicity makes Azure Cloud Services a great environment for deploying scalable applications. However, the developer remains responsible for ensuring that the application functions well in the distributed environment provided by Azure Cloud Services.

Microsoft Research developed Project Orleans with the specific goal of simplifying the creation of scalable applications for “developers who are not distributed system experts.” Orleans is an implementation of an Actor model, using the constraints imposed by that model to reduce the complexity of developing certain classes of distributed system.

Orleans combines an application framework with a service runtime. The application framework abstracts away certain elements that complicate the development of distributed systems. The service runtime provides a simple model that supports application deployment into various environments from a single PC up to an Azure Cloud Service. An Orleans application is a composite of a client application (e.g., a website) and an Orleans server application hosted by the runtime. Orleans is in preview but is currently used to provide some high-scale, backend services, hosted in Azure, for games such as Halo 4.

The Orleans preview can be downloaded from Microsoft Connect. This download comprises the application framework, including Visual Studio tooling, and the Orleans runtime. The Orleans documentation along with various samples are hosted on CodePlex.

Microsoft Research hosts the home page for Orleans. The Orleans team has written a very readable research report that describes the Orleans architecture in some depth (be sure to read the 2014 version). Hoop Somuah (@hoopsomuah) and Sergey Bykov (@sbykov_work) did a Build 2014 presentation on Using Orleans to Build Halo 4’s Distributed Cloud Service. Richard Astbury (@richorama) talks about Orleans in a .NET Rocks podcast. Caitie McCaffrey (@CaitieM20) has a post on Creating RESTful services using Orleans.

The rest of this post is a high-level look at some Orleans features. A follow-up post goes deeper into Orleans.

Actor Model

The Actor model was introduced in 1973 by Hewitt, Bishop and Steiger. In this model, an actor is the fundamental primitive of concurrent computation, providing processing, state and communication. A system comprises many actors, which interact by sending messages to each other. An actor encapsulates state, and data is not shared between actors. In response to a message, an actor can:

  • Create new actors
  • Send messages to other actors
  • Designate how to handle the next message it receives

The intent is that an actor is a simple entity with message processing being a manifestly concurrent operation that may change the internal state of the actor (and thereby the way subsequent messages are handled). A complex system is built through the interaction of many actors. By ensuring the concurrency of basic message processing, the actor model simplifies the creation of sophisticated distributed systems where concurrency can often cause significant problems.

Hewitt describes the Actor model in this recent paper. There is an excellent video on Microsoft Channel 9, in which Carl Hewitt discusses the Actor model with Erik Meijer, and Clemens Szyperski.

Orleans

Grain

Orleans is an implementation of the Actor model. In Orleans, an actor is referred to as a grain and the runtime host for grains is referred to as a silo. A grain is an instance of a .NET class implementing a grain interface derived from the marker interface IGrain. In a distributed Orleans system, there is a silo on each server hosting the Orleans runtime. An individual grain can be hosted in any of the Orleans silos, but the runtime provides location transparency so the caller does not know which silo holds a particular grain.

Grain Lifetime

The Orleans runtime implements a model in which grains are deemed to have an eternal, but virtual, life. A grain is deemed active when it is physically resident in a silo and is otherwise deemed inactive. When a request for a grain is made, the runtime either returns a reference to a grain already activated in some silo or silently activates a grain in a silo and returns a reference to it. The caller is completely isolated from grain activation. This allows the runtime to manage resources efficiently by silently deactivating grains that have not been used for a while.

By default, the Orleans runtime does not provide affinity between a grain and a silo. This flexibility allows the runtime to hydrate a grain into any silo, and this allocation may change through the virtual lifetime of the grain. The runtime uses an internal discovery service to identify the silo containing a grain. The discovery service uses a distributed hash table located on each silo to store the identity of the silo currently containing a grain. As an optimization, each silo has a local cache which stores the location of recently-accessed grains.

Since a grain can be in any silo it is always accessed through a reference provided by the Orleans runtime. When a message is sent to a grain the runtime:

  • makes a deep copy of it
  • serializes it using a specialized binary serializer
  • transmits it to the correct silo
  • deserializes it
  • queues it for processing by the receiving grain.

Orleans completes the message-sending process by invoking a method on the receiving grain. The method invocation results in a promise that may or may not complete successfully; that is, the promise may be fulfilled or broken. The promise is implemented using the .NET Task classes, and its use is greatly simplified by the .NET 4.5 async/await feature. Message passing is an asynchronous operation, so the caller is not immediately aware of the success or failure of the method invocation. This asynchrony is crucial to the scalability of Orleans, since the runtime is able to schedule invocations without blocking the caller.
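Awaiting the Task surfaces a broken promise as an exception, so failure handling follows the familiar .NET pattern (a sketch; the grain reference and its DoSomething() method are hypothetical):

try {
   await grain.DoSomething();
}
catch (Exception ex) {
   // The promise was broken (e.g., the hosting silo failed);
   // handle or propagate the failure here.
   Console.WriteLine(ex.Message);
}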

A grain reference can be used either inside another grain hosted in an Orleans server application or in a client application hosted outside the Orleans runtime.

Grain Implementation

A grain is defined through the specification of an interface and the creation of a class implementing it. Orleans build-time tooling automatically generates a factory class for each grain type, allowing references to grains of that type to be retrieved.

The grain interface is derived from IGrain. The interface comprises one or more public methods returning either a Task or a Task<T>. A message to a grain corresponds to the invocation of one of the grain interface methods. The Orleans runtime handles the transfer of the method invocation from the sending grain through the messaging infrastructure to the receiving grain which, in a distributed system, is likely to be in a silo on another server.

The Orleans grain implementation is defined by creating a class derived from GrainBase that implements the interface. There is no need to define a constructor for the class, since auto-generated factory methods are used to create references to grains. One or both of the ActivateAsync() and DeactivateAsync() methods can be overridden to contain any grain-specific activation and deactivation code.
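A minimal grain sketch (ISampleGrain and its DoSomething() method are assumed for illustration):

public class SampleGrain : GrainBase, ISampleGrain {
   private DateTime activatedAt;   // non-persisted, per-activation state

   public override Task ActivateAsync() {
      // Runs each time the runtime activates the grain in a silo.
      activatedAt = DateTime.UtcNow;
      return base.ActivateAsync();
   }

   public Task DoSomething() {
      return TaskDone.Done;
   }
}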

The Orleans framework tooling creates the factory class for each grain type, and this process generates an error when it detects an invalid grain interface. The factory class exposes a set of factory methods used to retrieve a grain reference. Note that retrieving a reference does not lead to grain activation, which is done only when a message is sent to the grain.

Each grain is identified by its type and primary key, which is either a GUID or an Int64. Internally, the latter is zero-padded into the former. (It is also possible to declare a grain type with an extended primary key that includes a String.) By default, each specific grain is a singleton but it is possible to declare a stateless worker grain that the Orleans runtime can scale out automatically.

Orleans Runtime

The Orleans runtime schedules work as a sequence of turns, with a turn being the execution of a grain method up to the point at which it yields (e.g., reaching an await statement; the closure following an await statement; or the return of a completed or uncompleted Task). To avoid concurrency problems, each grain is single-threaded so that only one turn is executed on a grain at any one time. A single request may result in several turns and, by default, the runtime processes all the turns for a request before processing any other requests for the same grain. Orleans achieves high scale by hosting many grains on a single server, so that even though request handling on each grain is single-threaded, requests to many grains are handled in parallel.
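The following sketch illustrates turns within a grain method (IAccountGrain and its methods are hypothetical):

public async Task<Decimal> Settle(IAccountGrain account) {
   Decimal balance = await account.GetBalance();   // turn 1 ends at this await
   await account.Debit(balance);                   // turn 2 ends at this await
   return balance;                                 // final turn
}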

Orleans uses a purpose-built scheduler that provides cooperative multitasking instead of the preemptive multitasking of the standard .NET task scheduler. For the turn-based scheduling used by Orleans, cooperative multitasking makes much more efficient use of system resources than preemptive multitasking would.

Grain Persistence

The Orleans runtime can load persistent grain state on activation. This is independent of the ActivateAsync() method. For performance reasons grain state is not persisted automatically; instead, state persistence must be explicitly managed by the grain implementation.

The Orleans support for grain state persistence is implemented by defining an interface, derived from IGrainState, that identifies the state to be persisted. The grain class must be derived from GrainBase<T> (instead of GrainBase), where T is the state interface, and must implement the grain interface. The grain class can have additional state, stored in non-persisted private members, that can be initialized using ActivateAsync(). The IGrainState interface exposes WriteStateAsync() and ReadStateAsync() methods that are used to persist grain state and to refresh it from the persistent store.

Orleans has the concept of pluggable storage providers to support grain state persistence. The storage provider is specified in the Orleans server and client configuration files. Several storage providers ship in the preview: LocalMemory is a development provider using local memory; AzureStorage persists grain state in Azure Tables (either cloud or development storage). Orleans provides a relatively simple extension point allowing the creation of additional storage providers. One of the samples demonstrates how to do this. Richard Astbury has published a storage provider using Azure Blob Storage.

Visual Studio Tooling

The Orleans framework provides three Visual Studio project templates for Orleans:

  • Orleans Dev/Test Host – creates a console app with an Orleans silo for development purposes
  • Orleans Grain Class Collection – contains the grain class implementations
  • Orleans Grain Interface Collection – contains the grain interfaces

The Orleans build tooling creates the grain factory classes used to access grain references. The generated code is placed in the Properties\orleans.codegen.cs file of the grain interface project.

Deployment

Orleans can be deployed locally for dev/test purposes. It can also be deployed onto a group of local servers. However, a scalable system should be deployed into Azure.

A common Azure deployment hosts the Orleans server application in an Azure worker role and the client application in an Azure web role. The Orleans framework makes it trivial to deploy an Orleans server into an Azure worker role; indeed, there is a one-to-one match between the Orleans runtime methods and the Azure RoleEntryPoint overrides. The Orleans runtime is able to handle the scaling out of the worker role instances. In an Azure deployment, the Orleans runtime uses Azure Tables to store runtime information.
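The shape of the worker role is sketched below. The OrleansAzureSilo class and the exact signatures of its Start()/Run()/Stop() methods are assumptions based on the preview samples; the point is the one-to-one mapping onto the RoleEntryPoint overrides.

public class WorkerRole : RoleEntryPoint {
   private OrleansAzureSilo silo;

   public override bool OnStart() {
      // Assumed preview API: initialize the silo when the role starts.
      silo = new OrleansAzureSilo();
      return silo.Start(RoleEnvironment.DeploymentId,
         RoleEnvironment.CurrentRoleInstance);
   }

   public override void Run() {
      silo.Run();   // blocks for the lifetime of the silo
   }

   public override void OnStop() {
      silo.Stop();
      base.OnStop();
   }
}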

Summary

The development and deployment of scalable distributed systems is difficult. Project Orleans provides an application framework and runtime support that simplifies the creation of those distributed systems that can be implemented using an Actor model. Orleans is specifically designed to simplify the creation of distributed systems by developers who are not experts in distributed systems. It is also designed to play well with Azure, and clearly demonstrates the benefit of developing cloud-native applications for Azure Cloud Services. Orleans comes with Visual Studio tooling, documentation, and samples which make it easy to learn how to use it.


Windows Azure Training Events– San Francisco Bay Area

In March and April 2014, Satory Global is hosting several Developer Camps focused on Windows Azure and Modern apps. These are in San Francisco, CA and Sunnyvale, CA.

The camps are a mixture of presentations and hands-on labs, where you will get the opportunity to learn and try out various aspects of Windows Azure and how Modern apps can use it as a backend.

Windows Azure Developer Camp: Make It Happen In The Cloud (Register)
Date: March 6, 2014
Time: 8:30-5:00
Location: Microsoft, 1010 Enterprise Way, Building B, Sunnyvale, California 94089

Windows Azure Developer Camp: Make It Happen In The Cloud (Register)
Date: April 9, 2014
Time: 8:30-5:00
Location: Microsoft, 835 Market Street, Suite 700, San Francisco, California 94103

Developer Camp: Extending Your Existing Apps On The Microsoft Modern Platform (Register)
Date: April 29, 2014
Time: 8:30-5:00
Location: Microsoft, 1010 Enterprise Way, Building B, Sunnyvale, California 94089

I hope to see you there.
