Cassandra with a Hint of Azure

Apache Cassandra is a highly-scalable, NoSQL, distributed database system. It achieves high scale by distributing data symmetrically among all the compute nodes in a cluster. In performance testing a 288 node Cassandra cluster, Netflix achieved more than one million writes per second.

Each compute node in a Cassandra cluster contains a portion of the data, and that portion may be replicated on multiple nodes in one or more datacenters. Unlike other NoSQL systems, Cassandra supports consistency that is tunable from eventual consistency up to strong consistency.

Cassandra is open source and is currently at release v1.2. It comes with a SQL-like query language, CQL, that is currently at v3. The latest releases of Cassandra and CQL introduce a number of changes that simplify the task of learning their use, making this a good time to look into the technology. Cassandra is worth considering where large amounts of data must be stored with good write performance. An important use case is the storage of time-series data.

Datastax provides both community and enterprise versions of Cassandra. There are downloads for various Linux distributions, Mac OS X, and Windows with some excellent documentation.

The Windows Azure Developer Portal has an article written by Hanu Kommalapati showing how to deploy a Cassandra cluster to Windows Azure Virtual Machines running Linux and access it using a Node.js front end (note the fix suggested by Patriek van Dorp at the end of the article). Charles Lamanna (@clamanna) has a post showing how to deploy a performant Cassandra cluster to Windows Azure Virtual Machines running Linux (Q&A). Robin Schumacher has a post showing how to set up and monitor a multi-node Cassandra cluster on Windows. Kelly Sommers (@kellabyte) has an introductory post on Cassandra running in Windows.

There are many NoSQL choices available today. Kristof Kovacs (@kkovacs) has a post comparing Cassandra, MongoDB, CouchDB and Redis. Jonathan Ellis (@spyced), the CTO of Datastax, has a post comparing Cassandra with MongoDB, Riak and HBase. He has also uploaded an excellent deck showing the features introduced in Cassandra v1.2.

Data Model in Cassandra

Cassandra has an unusual – and, to be honest, somewhat confusing – physical data model. CQL3 provides a simpler logical data model. This post is only going to consider the CQL3 data model. The two data models lead to some nomenclature confusion, to which I will add by describing the model in terms that users of the Windows Azure Table Service would understand.

The data model comprises keyspaces, tables and columns. Tables have traditionally been called column families, but CQL3 uses the keyword TABLE instead.

A keyspace is similar to a schema in a relational model, in that it provides a container for application tables. Cassandra uses data replication to support high availability and data durability, and data replication is configured at the keyspace level.

A Cassandra table comprises a sequence of rows, each of which contains a set of columns, each of which is a name/value pair with a timestamp. A table has a composite primary key, which serves two functions. The first column in the primary key serves as a partition key and is used to allocate the row to the appropriate node(s). The remaining columns, if any, in the primary key uniquely identify a row for a given partition key. The columns not in the composite key store additional data for a row – and do not all have to be present in a specific row. This is similar to a table in the Windows Azure Table Service with a partition key and row key, with the difference that the Table Service supports only a single column for the row key.

The above describes the logical view of the data as presented through CQL3 and is significantly less complex than the underlying physical data model. The physical data model pivots all the rows with the same partition key into columns of a single row. It is this pivot that leads to the statement that Cassandra supports up to 2 billion columns per (physical) row.
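To make the pivot concrete, here is a minimal Python sketch. The table, the column names and the function pivot_to_physical are all hypothetical, and the nested dictionary is only a simplified picture of the physical layout, not Cassandra's actual storage format.

```python
from collections import defaultdict


def pivot_to_physical(rows, partition_key, row_keys):
    """Group logical rows by partition key into one wide 'physical' row each.

    Every non-key column of a logical row becomes a physical column whose
    name is the tuple (row key values..., logical column name).
    """
    physical = defaultdict(dict)
    for row in rows:
        partition_value = row[partition_key]
        prefix = tuple(row[key] for key in row_keys)
        for column, value in row.items():
            if column == partition_key or column in row_keys:
                continue  # key columns are folded into the physical column name
            physical[partition_value][prefix + (column,)] = value
    return dict(physical)


if __name__ == "__main__":
    # Hypothetical time-series table: PRIMARY KEY (sensor_id, reading_time)
    logical_rows = [
        {"sensor_id": "s1", "reading_time": 1, "temperature": 20.5},
        {"sensor_id": "s1", "reading_time": 2, "temperature": 21.0, "humidity": 40},
        {"sensor_id": "s2", "reading_time": 1, "temperature": 18.0},
    ]
    for key, columns in pivot_to_physical(logical_rows, "sensor_id", ["reading_time"]).items():
        print(key, columns)
```

In this sketch the two logical rows for sensor s1 collapse into a single physical row with three columns, which is why a partition that keeps receiving logical rows keeps growing wider rather than longer.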

Rows are allocated to data nodes based on the hash value of the partition key. A Murmur3 hash is used by default in Cassandra v1.2. Partition keys are indexed, allowing rows to be retrieved quickly, but because the hash does not preserve the order of the keys a range query on the partition key is non-performant. The row key is ordered, allowing performant range queries. Cassandra also supports secondary indexes on the other columns. Any query not matching an index results in a table scan, which is not advised in a scalable system.

The syntax of CQL3 is very similar to that of SQL and it provides the obvious DDL and DML statements. Cassandra does not support joins so queries can be against only a single table.

Cassandra Architecture

Cassandra is designed to be a highly scalable database. This scalability is achieved by sharding the data over many compute nodes. Cassandra provides for high availability by using symmetric nodes, with no single node being privileged over the other nodes. Consistency is tunable by requiring that writes and reads be performed on specified numbers of nodes in each request.
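The usual rule of thumb for tunable consistency is that a read is guaranteed to see the latest write when the number of nodes written plus the number of nodes read exceeds the number of replicas. The following Python sketch expresses that arithmetic; the function names are mine and nothing here calls a Cassandra API.

```python
def quorum(replication_factor):
    """Smallest number of replicas that forms a majority."""
    return replication_factor // 2 + 1


def is_strongly_consistent(write_nodes, read_nodes, replication_factor):
    """True when every read set must overlap every write set (W + R > N)."""
    return write_nodes + read_nodes > replication_factor


if __name__ == "__main__":
    n = 3
    print("quorum for", n, "replicas:", quorum(n))
    print("write 1, read 1:", is_strongly_consistent(1, 1, n))
    print("write quorum, read quorum:", is_strongly_consistent(quorum(n), quorum(n), n))
```

Writing to one node and reading from one node gives eventual consistency with the lowest latency, while quorum writes combined with quorum reads trade some latency for strong consistency.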

The sharding model in Cassandra uses distributed hash tables based on the technique used in Amazon Dynamo. A Cassandra cluster has an associated hashing algorithm – by default, Murmur3. Each Cassandra node is associated with a range of hash values, referred to as tokens.

In the traditional distribution model – using physical nodes – the nodes are configured as a ring. Each node is associated with a single token, which specifies the upper value for the partition key hash of the rows allocated to the node. Essentially, going round the ring the nodes host successive ranges of rows (with successive referring to the hashed value of the partition key). The token for each node must be calculated prior to node configuration which complicates the addition of new nodes. High availability is achieved by replicating data to one or more nodes. The simplest model is to replicate the data to successive nodes in the ring, although other choices are configurable including some that take account of the physical layout of the nodes in the datacenter.
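The ring lookup described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Cassandra's implementation: the Ring class and node names are hypothetical, MD5 stands in for the Murmur3 hash, and replicas are simply the successive nodes on the ring.

```python
import bisect
import hashlib


def token_for(partition_key):
    """Hash a partition key to a 64-bit token (MD5 is a stand-in for Murmur3)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


class Ring:
    def __init__(self, node_tokens):
        # node_tokens maps a node name to the single token assigned to that node
        self._entries = sorted((token, node) for node, token in node_tokens.items())
        self._tokens = [token for token, _ in self._entries]

    def replicas_for_token(self, token, replication_factor=1):
        """The node whose token is the upper bound for `token`, then its successors."""
        start = bisect.bisect_left(self._tokens, token) % len(self._entries)
        count = min(replication_factor, len(self._entries))
        return [self._entries[(start + i) % len(self._entries)][1] for i in range(count)]

    def replicas(self, partition_key, replication_factor=1):
        return self.replicas_for_token(token_for(partition_key), replication_factor)


if __name__ == "__main__":
    ring = Ring({"node-a": 2**62, "node-b": 2**63, "node-c": 2**64 - 1})
    print(ring.replicas("sensor-42", replication_factor=2))
```

Because each node's token has to be chosen up front, adding a node to a ring like this means picking a new token by hand and, to keep the ranges balanced, usually moving the tokens of existing nodes as well.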

Cassandra v1.2 introduced virtual nodes, in which each node hosts many token ranges. By default, each node has 256 token ranges. When replication is configured, the data for each token range is distributed to a random set of other nodes. The benefit of this is that if a node needs to be regenerated the data can be retrieved from many other nodes rather than a few, as is the case with physical nodes. Virtual nodes make it much easier to add new nodes since Cassandra handles the allocation of token ranges.
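A rough Python sketch of the idea follows. The function names are hypothetical and the random token assignment is a simplification of whatever Cassandra actually does; the point is only that each node ends up with many small ranges whose neighbours are spread across the rest of the cluster.

```python
import bisect
import random


def assign_vnode_tokens(nodes, tokens_per_node=256, seed=0, ring_size=2**64):
    """Give every node `tokens_per_node` random tokens; return the sorted ring."""
    rng = random.Random(seed)
    ring = []
    for node in nodes:
        for _ in range(tokens_per_node):
            ring.append((rng.randrange(ring_size), node))
    ring.sort()
    return ring


def owner(ring, token):
    """Node owning `token`: the first ring entry whose token is >= `token`."""
    tokens = [entry_token for entry_token, _ in ring]
    return ring[bisect.bisect_left(tokens, token) % len(ring)][1]


def successors_of(ring, node):
    """Distinct nodes that follow any of `node`'s ranges on the ring."""
    if len({name for _, name in ring}) < 2:
        return set()
    found = set()
    for index, (_, current) in enumerate(ring):
        if current != node:
            continue
        nxt = (index + 1) % len(ring)
        while ring[nxt][1] == node:
            nxt = (nxt + 1) % len(ring)
        found.add(ring[nxt][1])
    return found


if __name__ == "__main__":
    ring = assign_vnode_tokens(["n1", "n2", "n3", "n4"])
    print("ranges on the ring:", len(ring))
    print("nodes adjacent to n1:", sorted(successors_of(ring, "n1")))
```

In this sketch, adding a fifth node only adds new entries to the ring; the tokens already held by the existing nodes stay where they are.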

Windows Azure

Cassandra is a Java-based system that can be deployed into Windows Azure Virtual Machines. It runs on Windows Server VMs, but since Virtual Machines supports various Linux distributions that is the deployment environment of choice. Depending on instance size, each Windows Azure instance can support between 1 and 16 disks each of which can be up to 1TB in size.

In doing a Cassandra deployment to Windows Azure it is worth doing performance testing to identify the optimal instance and disk size for the anticipated workload. This may involve, for example, using different storage accounts for disks attached to different VMs. The most authoritative comment I’ve seen indicates that this should not be necessary for different disks attached to a single instance. It is worth bearing in mind that Virtual Machines are still in preview and that the performance characteristics may be different when it goes GA.


In-Person Event: Windows Azure HDInsight Service

I’m presenting at a Windows Azure, Big Data and Windows Azure HDInsight Service event. The invitation is (mostly) as follows:

Microsoft would like to extend an invitation to attend the Microsoft Cloud and Big Data at Microsoft Silicon Valley Moffett Towers in Sunnyvale, CA on Wednesday January 23, 2013. Join us for this half-day session, led by Satory Global, a Microsoft Azure Partner of the Year (West Region), where we’ll provide an overview of Big Data and the Windows Azure HDInsight Service, Microsoft’s implementation of Apache Hadoop. The event includes presentations as well as demonstrations of HDInsight in action.

Note that this event is not at Microsoft SVC but at:

Microsoft Silicon Valley Moffett Towers
1020 Enterprise Way, Building B
Sunnyvale, CA 94089

To register, click here and use the following invitation key: C00371

 

Time Description
  Wednesday, January 23, 2013
8:30AM-9:00AM

Windows Azure Overview

Windows Azure is a platform supporting a comprehensive set of PaaS and IaaS cloud services. These range from core services such as compute and storage to higher-level features such as Windows Azure Active Directory and Windows Azure Mobile Services. This session will provide a quick overview of Windows Azure providing the context for the discussion of Big Data.

9:30AM-10:45AM

Big Data and HDInsight

Microsoft has developed HDInsight – a 100% Apache-compatible version of Hadoop that can be used as a Windows Azure service or deployed into a Windows Server private cloud. With HDInsight, Microsoft has integrated the Hadoop ecosystem – MapReduce, Pig, Hive, Sqoop, etc. – with its existing Business Intelligence services. This session will show how Big Data can revolutionize your business and specifically how the Windows Azure HDInsight Service can enhance your business-intelligence capability.

10:45AM-11:00AM Break
11:00AM-12:00PM

Demonstrations + Q&A

The Windows Azure HDInsight Service supports various techniques for invoking Apache Hadoop services. This session will provide demonstrations of the various features and how they are implemented in the Windows Azure environment. It will also provide time for Q&A.

12:15PM-12:30PM Close and evaluation

I hope to see some of you there.


my tale of npm woe – when all else fails, fix your path!

To end the year I decided to look at Node.js and, in particular, the restify module. Things went south pretty quickly when npm install failed for restify with the following error (fragment):

npm ERR! git clone git://github.com/pvorb/node-clone.git CreateProcessW: The
system cannot find the file specified.
npm ERR! git clone git://github.com/davepacheco/node-verror.git
CreateProcessW:
The system cannot find the file specified.
npm ERR! Error: `git "clone" "git://github.com/pvorb/node-clone.git"
"C:\\Users\\Neil\\AppData\\Local\\Temp\\npm-704\\1357013683090-0.8436955371871591"`
failed with 127
npm ERR! at ChildProcess.<anonymous> (C:\Program Files\nodejs\node_modules\npm\lib\utils\exec.js:56:20)

I reinstalled Node on the PC and tried again with no more success. I used another PC and again met failure. This surprised me because I had successfully installed restify a few weeks ago.

A web search quickly brought me to a Glenn Block (@gblock) post with a similar-seeming error when he installed restify (and apologies to Glenn for stealing his title). Glenn solved his problem by cleaning out the npm cache. I tried this using:

npm cache clean

and watched the cache vanish from %appdata%\npm-cache. Alas, unlike Glenn’s, my tale of npm woe continued since this didn’t solve the problem.

Another search led me to this post on the Windows Azure forums with the title npm fails on git dependencies in package.json. The latest version of restify has two git dependencies in its package.json – and they seem to be the source of the problem, since tweaking the restify package.json to avoid the git download for them allows the install to complete successfully for that package.

This is not really a satisfactory solution, but a careful reading of the forum post leads to the real one. Amit Apple (@amitapl) writes: the part that’s failing is to find git.exe since you’re using a git url to install the npm package.

It’s hard from the original error message to detect that git.exe is the problem – as, indeed, it is. Although git.cmd was in the path, git.exe was not – causing the download to fail since npm looked for git.exe. Fixing the path fixed my npm problem – woohoo.
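A quick way to check for this situation is to ask which of the git launchers can actually be resolved from the path. The sketch below uses Python's standard shutil.which; the helper names are my own, and the list of file names is just the ones mentioned in this post.

```python
import shutil


def find_executable(name, search_path=None):
    """Return the full path of `name` if it can be found on the path, else None."""
    return shutil.which(name, path=search_path)


def report(names=("git.exe", "git.cmd", "git")):
    """Map each candidate name to where it was found (or None)."""
    return {name: find_executable(name) for name in names}


if __name__ == "__main__":
    for name, location in report().items():
        print(f"{name}: {location or 'NOT FOUND on PATH'}")
```

If git.cmd is reported but git.exe is not, adding the directory that contains git.exe to the path is the fix described above.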

And so my tale of npm woe ended. Now onto the new year and new problems.


Windows Azure Cloud Services and Virtual Networks

Windows Azure has historically been a pure PaaS solution with the deployment unit for compute being a hosted service comprising an optional web role and zero or more worker roles. Each role is deployed as one or more virtual machine instances. A hosted service formed a security boundary, with the only way to access role instances being through the load-balanced public input endpoint. Role instances inside a hosted service can communicate directly using internal endpoints, which provide lower latency because they do not go through the Windows Azure load balancer.

In June 2012, Microsoft announced previews of Windows Azure Virtual Machines, an IaaS offering, and Windows Azure Virtual Networks. It also brought a name change from hosted service to cloud service for the compute deployment unit. There is a little bit of confusion about the use of cloud services because, to make the deployment of a single virtual machine as simple as possible, a cloud service is implicitly created when a single IaaS Virtual Machine is deployed. This cloud service is only made apparent in certain circumstances such as the deletion of the Virtual Machine or the association of a second Virtual Machine with the first.

Microsoft simultaneously announced a preview of Windows Azure Web Sites (WAWS) which provides a scalable, high-density, web hosting solution. The emphasis in WAWS is on ease of deployment which is far better with WAWS than it was with PaaS web roles. However, this ease of deployment comes with more restrictions on deployments than there had been with traditional PaaS web roles.

At any given time, a cloud service hosts either a PaaS deployment or an IaaS deployment – but not both. Either a PaaS service or an IaaS service can be deployed into an empty cloud service. Some of this functionality is not exposed on the Windows Azure Portal, and can only be achieved using either PowerShell or script cmdlets.

The (awesome) Windows Azure Platform Training Kit contains a hands-on lab (Windows Azure Web Sites and Virtual Machines using ASP.NET and SQL Server) which uses a Windows Azure Web Site as the front end for a Virtual Machine hosting SQL Server. This HOL uses a public endpoint for the SQL Server Virtual Machine – which consequently means there is a raw SQL Server endpoint sitting on the public internet.

Hanu Kommalapati has posted an interesting example which hosts a Cassandra cluster in 3 Virtual Machines with a front end provided by another Virtual Machine running a web server developed in Node.js. This example also uses a public endpoint for the Cassandra cluster – which consequently means there is a raw Cassandra endpoint sitting on the public internet.

Virtual Networks

Virtual Networks improves the composition of cloud services by allowing one or more of them to be added to a Virtual Network. Note that when a Virtual Network is used to host cloud services, the security boundary is extended to comprise all the cloud services in the Virtual Network. A cloud service in a Virtual Network can directly access individual instances in a second cloud service contained in the Virtual Network without going through the load balancer hosting a public input endpoint for the second cloud service. This means that once a traditional PaaS cloud service is added to a Virtual Network the cloud service no longer forms a security boundary and any open port on its role instances can be accessed by any instance of any cloud service in the Virtual Network. This is regardless of whether the cloud service is IaaS or PaaS.

A back-end data service, hosted on virtual machines in an IaaS cloud service, can be kept off the public internet but remain accessible to role instances hosted in a front-end PaaS cloud service. Similarly, an IaaS cloud service can access the role instances of a PaaS cloud service without any need for the latter to have (input) endpoints exposed to the internet.

The first example described earlier – a WAWS front end to a SQL Server back end – cannot use this technique because a WAWS website cannot be added to a Virtual Network. However, the example would work were the front-end website to be hosted by a PaaS web role located in the same Virtual Network as the back-end SQL Server Virtual Machine. The latter would not need a public endpoint; it would only need an appropriately configured firewall. The second example can use this technique, with the Node.js cloud service having a public endpoint and the Cassandra cluster having no public endpoint.

A very important point is that the Virtual Network MUST be created before any cloud services are created in it since once a cloud service has been created it is not possible to migrate it into a Virtual Network.

Michael Washam describes this technique in one of the excellent posts on his blog.

Creating the Virtual Network

A Virtual Network is created using the Networks item in the preview Windows Azure Portal. This brings up a wizard with several pages.

The Virtual Networks page requests the following information:

  • Name
  • Affinity Group

The Address Space and Subnets page requests the following information:

  • Address space (for the network) in the format a.b.c.d/x. For example:
    • 10.10.0.0/16
  • Address space for the subnets in the format a.b.c.d/x. For example:
    • FrontEnd: 10.10.10.0/24
    • BackEnd: 10.10.11.0/24
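Before typing the address ranges into the wizard it can be worth checking that each subnet really sits inside the network address space and that the subnets do not overlap. Here is a small sketch using Python's standard ipaddress module; check_subnets is a hypothetical helper, and the values are the example ranges from the list above.

```python
import ipaddress


def check_subnets(address_space, subnets):
    """Return a list of problems found in a proposed subnet layout.

    address_space is a CIDR string; subnets maps a subnet name to a CIDR string.
    """
    network = ipaddress.ip_network(address_space)
    parsed = {name: ipaddress.ip_network(cidr) for name, cidr in subnets.items()}
    problems = []
    for name, subnet in parsed.items():
        if not subnet.subnet_of(network):
            problems.append(f"{name} ({subnet}) is outside {network}")
    names = sorted(parsed)
    for index, first in enumerate(names):
        for second in names[index + 1:]:
            if parsed[first].overlaps(parsed[second]):
                problems.append(f"{first} overlaps {second}")
    return problems


if __name__ == "__main__":
    layout = {"FrontEnd": "10.10.10.0/24", "BackEnd": "10.10.11.0/24"}
    print(check_subnets("10.10.0.0/16", layout) or "layout looks consistent")
```

An empty list means the layout is internally consistent; it says nothing about whether the portal will accept it for other reasons.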

The DNS Servers and Local Networks page can be passed through without providing any information.

Once the virtual network has been created, its configuration can be viewed as follows:

[Screenshot: samplenetwork]

Adding a Virtual Machine (IaaS cloud service) to the Virtual Network

An IaaS cloud service is added to the virtual network by creating it from the gallery and specifying the appropriate virtual network when asked for the Region/Affinity Group/Virtual Network. The appropriate subnet is selected on the VM Options page in the wizard, as follows:

[Screenshot: VMOptions]

Once the virtual machine has been created, remote desktop can be used to access it. The Windows Firewall with Advanced Security application can then be used to modify the firewall as needed.

Adding a PaaS Cloud Service to the Virtual Network

A PaaS cloud service is added to a virtual network by adding a NetworkConfiguration section to its Service Configuration file. This section is located after the end of the Role section. For example:

<NetworkConfiguration>
  <VirtualNetworkSite name="SampleNetwork" />
  <AddressAssignments>
    <InstanceAddress roleName="ContactManager.Web">
      <Subnets>
        <Subnet name="FrontEnd" />
      </Subnets>
    </InstanceAddress>
  </AddressAssignments>
</NetworkConfiguration>
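When several roles need to be placed in subnets, the fragment can be generated rather than hand-edited. The following Python sketch builds the same element structure as the example above with the standard xml.etree.ElementTree module. The function name is hypothetical, and the sketch ignores any XML namespace that the enclosing Service Configuration file declares, so treat the output as a starting point to paste and adjust.

```python
import xml.etree.ElementTree as ET


def build_network_configuration(network_name, role_subnets):
    """Build a NetworkConfiguration element.

    role_subnets maps a role name to the list of subnet names for that role.
    """
    root = ET.Element("NetworkConfiguration")
    ET.SubElement(root, "VirtualNetworkSite", {"name": network_name})
    assignments = ET.SubElement(root, "AddressAssignments")
    for role_name, subnet_names in role_subnets.items():
        instance = ET.SubElement(assignments, "InstanceAddress", {"roleName": role_name})
        subnets = ET.SubElement(instance, "Subnets")
        for subnet_name in subnet_names:
            ET.SubElement(subnets, "Subnet", {"name": subnet_name})
    return root


if __name__ == "__main__":
    element = build_network_configuration(
        "SampleNetwork", {"ContactManager.Web": ["FrontEnd"]}
    )
    print(ET.tostring(element, encoding="unicode"))
```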

Services on role instances can be exposed to other virtual machines in the Virtual Network by modifying the firewall on each role instance.

Summary

The Windows Azure Virtual Network feature, currently in preview, provides the ability for cloud services to interact with each other without exposing services to the public internet. This is a significant enhancement since previously cloud services could not be grouped into composite services without exposing required endpoints to the public internet.


Overview of the Windows Azure Platform

In my day job at Satory Global I spend a lot of time educating people about the Windows Azure Platform and helping them develop applications on the platform. In my book, the Microsoft Windows Azure Development Cookbook, I provided instructions for how to perform about 80 or so specific tasks when developing for the Windows Azure Platform. In this blog, I tend to write posts focused on a specific Windows Azure feature or API. In a post about a year or so ago I described the Windows Azure Platform as it was then. In this post I am going to describe the Windows Azure Platform as it is now, showing how the recently announced infrastructure-as-a-service (IaaS) features complement the platform-as-a-service (PaaS) features that have traditionally defined Windows Azure.

The post is organized around the four core features of any cloud service offering:

  • Compute
  • Storage
  • Connectivity
  • Management

Compute

The Windows Azure Platform provides three distinct compute models:

  • Windows Azure Web Sites
  • Cloud Services
  • Virtual Machines

Windows Azure Web Sites

Windows Azure Web Sites provides a multi-tenanted, high-density hosting environment into which web sites can be deployed and upgraded in a matter of seconds. Web Sites provides a hosting environment for ASP.NET websites as well as web sites developed in Node.js, PHP and Python. Applications can be deployed using git, FTP or TFS. Windows Azure hosts a git repository to facilitate the use of git. Web Sites supports the horizontal scalability of web sites from a single instance of a shared web site up to 3 instances of a dedicated 4-core large instance. This scalability can be managed directly on the portal.

The ease-of-use of Windows Azure Web Sites is enhanced by the provision of a gallery on the Windows Azure Portal allowing the easy creation of web sites implemented in content-management systems such as WordPress. The portal also exposes monitoring information such as CPU time, number of requests, and data out for web sites.

Cloud Services

Cloud Services is the new name for the traditional hosted service PaaS offering on Windows Azure.  A Cloud Services application is described through a service model which specifies the server types (or roles) comprising the application. There are three types of role: web role, worker role and VM role. Although there used to be a significant difference between a web role and a worker role, they are now pretty much the same with the primary difference being that IIS is preconfigured on a web role. A VM role allows a Windows Server 2008 virtual machine to be uploaded as a cloud service. However, the capabilities of a VM role have been superseded by the much more functional Virtual Machines, so VM role should now be regarded as deprecated (even if not officially).

A web role or worker role is a scalability unit for a Cloud Service. A web role or worker role is deployed as multiple instances, with each instance being hosted in a virtual machine (VM) with a specified number of cores, and associated RAM and local storage. Windows Azure supports various instance sizes from small with 1 core through extra-large with 8 cores. The other compute resources – RAM, local storage and network I/O – scale linearly with the number of cores. Windows Azure also supports a low-cost, extra-small instance providing a fraction of a core. The number of instances of a web role or worker role is specified in the service model configuration, making it easy to elastically scale the number of running instances. It is this ability to scale elastically to satisfy current demand that provides one of the primary cost drivers of cloud computing.

The role instances deployed in Cloud Services are stateless which means that there is no durability guarantee on any information written to the drives attached to the virtual machine hosting the instance. The role instance does have some local storage that survives reboots, and re-images where the application bits are reset to their original state on the instance, but it does not survive migrations when the instance is moved to another physical server as a consequence of the Windows Azure self-healing process. Cloud Services supports the Azure Drive feature in which a role instance can mount, as an NTFS drive, a VHD stored as a page blob in Windows Azure Blob Storage to provide durable storage for the instance.

Cloud Services is a PaaS offering and both web roles and worker roles are best thought of as application hosting environments. Microsoft is responsible for ensuring the physical health of the hosting environment and performs automatic self-healing when problems are found. This can include the automatic migration of a role instance to another server when problems are found with the physical hardware. Windows Azure provides various automated upgrade techniques, including in-place upgrades and VIP swaps, that come into play during an OS or application upgrade to ensure the continued availability of the application.

Windows Azure performs much of the post-deployment management of the application  but the developer must remain aware that the application is deployed to a managed environment and must hook into the environment as necessary. Cloud Services supports startup tasks which can be used to ensure the correct configuration of the runtime environment for an instance. This includes installing any software required by the instance. Windows Azure provides Windows Azure Diagnostics which can be used to capture diagnostic information – including performance counters, trace logs, and custom logs – and then persist them to Windows Azure Storage where they can then be accessed. This minimizes the need to access individual role instances, although Remote Desktop can be used if necessary.

Virtual Machines

The PaaS model, of Windows Azure Web Sites and Cloud Services, is ideal for green-field development. It can provide significant operational benefits for migrated applications, but with an increased development cost to handle integration with the application hosting environment. However, there are many workloads which cannot be migrated into a PaaS environment, or which can be migrated only at a development cost large enough to offset the operational benefits.

An IaaS model provides an attractive setting for these difficult-to-migrate workloads since the hosting environment is the OS level not the application level. However, this ease of migration comes with a significantly increased operational cost. Of course, much of this cost would have been present regardless of whether the application was hosted in IaaS in the cloud or in an on-premises datacenter.

Microsoft has now released into preview an IaaS model named Windows Azure Virtual Machines. This significantly increases the types of workload that can be hosted in a Windows Azure datacenter. For example, Virtual Machines supports the deployments of applications hosted in Linux which allows many open-source applications developed for Linux to be deployed to Windows Azure. Michael Washam has an excellent post describing the features of Virtual Machines.

The Windows Azure Portal exposes a gallery of Virtual Machine types that can be deployed directly using the portal or management scripts. This gallery includes various Windows Server offerings as well as several different Linux distributions. One of the offerings in the gallery is a pre-configured SQL Server 2012 virtual machine. It is anticipated that Microsoft will offer additional server offerings directly on the portal, such as SharePoint, as well as other Linux distributions. Furthermore, other companies such as RightScale and Suse are providing Virtual Machine configuration and management services simplifying the configuration and deployment of specific Virtual Machine instances.

In Virtual Machines, the OS drive and any associated data drives are stored as VHDs persisted in page blobs in Windows Azure Blob Storage. These drives are durable so that any information written to them survives instance failures and self-healing migrations. They are similar to the Azure Drives used in Cloud Services. However, in Cloud Services the drives must be attached by code running on the instance while in Virtual Machines the drives are attached either automatically, for the OS drive, or externally by using the Windows Azure Portal or service management scripts.

Storage

Windows Azure provides various forms of persistent storage with different characteristics. The  Windows Azure Storage Service provides cost-effective, high-scale storage that can scale up to the petabyte level (when multiple storage accounts are used). The Windows Azure SQL Database (formerly called SQL Azure) provides a managed version of SQL Server 2012. Between them these services support a wide variety of storage capabilities including large file storage, NoSQL and relational. Both Windows Azure Storage and SQL Database are shared multi-tenant systems with performance characteristics impacted by this multi-tenancy.

The Windows Azure Storage Service provides three features: blobs, tables and queues. Note that the Storage Service stores 3 local copies of all data and, unless configured otherwise, makes 3 copies asynchronously to a remote datacenter. The Storage Service always uses strongly consistent writes and does not use eventual consistency.

The Blob Service supports the storage of files in two types of blob: block blobs and page blobs. A block blob is targeted at streaming media with the primary access mechanism being a read from the start to the end of the file. A page blob is targeted at random access with the primary use case being as the backing store for the VHD used in Azure Drive with Cloud Services or as the attached OS or data drive with Virtual Machines. The Table Service is a key-value, NoSQL store for the durable storage of schema-less entities. The Queue Service provides a simple message queuing service intended for disconnected communication among role instances and virtual machines.

The Windows Azure SQL Database is a managed relational database based on and providing most of the functionality of SQL Server 2012. SQL Database is a multi-tenant system in which application databases are deployed into a managed SQL Server instance. For high availability, SQL Database stores a primary and 2 secondary copies of each database. SQL Database uses a quorum commit for writes.

SQL Database is a multi-tenanted system with performance characteristics that differ from a standalone SQL Server database. SQL Database Federations provides a managed sharding capability that supports high scale both for performance and data for SQL Database. Each individual federation member database provides the performance and data scale characteristics of an individual database so that in aggregate a complete federation can provide very high scale and throughput. Essentially, SQL Database Federations provides horizontal scaleout of data to match the horizontal scaleout that role instances provide for compute.
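The routing side of sharding can be sketched as follows. This is an illustration only: it assumes range-based partitioning on an integer key, which is my assumption rather than something stated above, and the Federation class and its split points are hypothetical and do not use any SQL Database API.

```python
import bisect


class Federation:
    """Route a key to a member index using sorted split points.

    With split points [100, 200], member 0 holds keys below 100,
    member 1 holds keys from 100 up to 199, and member 2 holds keys
    from 200 upwards.
    """

    def __init__(self, split_points):
        self._split_points = sorted(split_points)

    def member_count(self):
        return len(self._split_points) + 1

    def member_for(self, key):
        # Number of split points <= key is the index of the owning member
        return bisect.bisect_right(self._split_points, key)


if __name__ == "__main__":
    federation = Federation([100, 200])
    for customer_id in (5, 100, 199, 200, 5000):
        print(customer_id, "-> member", federation.member_for(customer_id))
```

Adding a split point turns one member into two, which is the data-tier analogue of adding role instances to a compute tier.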

Windows  Azure provides several non-durable storage options directly on virtual machines. In Cloud Services, each role instance has some amount of local storage available on the C: drive that applications can use. This local storage persists as long as Windows Azure does not migrate the role instance to another physical server as part of the server healing process. In Virtual Machines, each virtual machine has a local D: drive primarily used for the page file. This drive does not survive the virtual machine being migrated to another physical server.

Windows Azure also provides two distinct caching mechanisms: the Windows Azure Caching Service and Windows Azure Caching. The Caching Service is essentially a managed version of the Windows Server AppFabric Caching Service hosted in a Windows Azure datacenter. It provides centralized caching with a local on-machine caching option. The Caching Service also supports the caching of ASP.NET session state. Windows Azure Caching, currently in preview, allows a ring cache to be created using spare memory of Cloud Service role instances either in an existing role or a worker role specifically created to host the cache. All instances of the cache-configured role participate in this cache.

Connectivity

Windows Azure supports various forms of connectivity:

  • Endpoints
  • Virtual Network
  • Windows Azure Connect
  • Windows Azure Traffic Manager
  • Windows Azure Service Bus

Cloud Services and Virtual Machines both use endpoints to manage the internal IP addresses and ports they expose. There are various types of endpoint depending on whether or not they are exposed to the public internet and on the way in which traffic is directed to them. The endpoints are defined at the role level for Cloud Services and at the service level for Virtual Machines. A Load Balancer Probe can be associated with each endpoint and used to verify that role instances or virtual machines hosting the endpoint are available and, if not, remove them from load balancer rotation. Unless contained in a Virtual Network, a Cloud Service or a Virtual Machine service is a security boundary that can be accessed only through one of the public endpoints. Windows Azure supports both TCP and UDP for internal and external network traffic.

Cloud Services have two types of (public) input endpoint – input endpoint and instance input endpoint – and one type of (private) internal endpoint. These endpoints are associated with individual roles. The input endpoints differ with respect to how the load balancer handles traffic. For an input endpoint, the load balancer uses a round-robin algorithm to direct inbound network traffic to each role instance in turn. For an instance input endpoint, the load balancer uses port forwarding to direct inbound network traffic to a specific role instance. An internal endpoint is used to allow intra-service traffic within a Cloud Service. The Windows Azure Service Runtime API provides methods allowing the actual IP addresses and ports used by these endpoints to be discovered at runtime.
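
The difference between the two kinds of input endpoint can be sketched as follows. This is a toy Python model, not the Windows Azure load balancer: the `LoadBalancer` class, the instance names and the port numbers are all hypothetical, and probe failures are simulated by calling `mark_unhealthy` directly.

```python
class LoadBalancer:
    """Toy model of round-robin routing with probe-based removal."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._next = 0

    def mark_unhealthy(self, instance):
        # Stands in for a failed load balancer probe.
        self.healthy.discard(instance)

    def route_round_robin(self):
        # Visit instances in turn, skipping any that are out of rotation.
        for _ in range(len(self.instances)):
            candidate = self.instances[self._next]
            self._next = (self._next + 1) % len(self.instances)
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances")


def forward_port(port_map, public_port):
    # Port forwarding: each public port maps to exactly one instance.
    return port_map[public_port]


balancer = LoadBalancer(["web_0", "web_1", "web_2"])
first_four = [balancer.route_round_robin() for _ in range(4)]

balancer.mark_unhealthy("web_1")
after_probe_failure = [balancer.route_round_robin() for _ in range(3)]

port_map = {10100: "web_0", 10101: "web_1", 10102: "web_2"}
direct = forward_port(port_map, 10101)
```

Round-robin routing suits stateless requests that any instance can serve, while port forwarding is useful when a caller must reach one particular instance.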

Virtual Machines support two types of public endpoint – Load Balanced and Port Forwarded. The load balancer uses round-robin load balancing to handle traffic directed to a Load Balanced endpoint and port forwarding to direct traffic to a Port Forwarded endpoint. Windows Azure does not restrict traffic internal to a Virtual Machine service so there is no need for an endpoint like the internal endpoint of Cloud Services.

The advent of Virtual Machines drives a need for more sophisticated network architectures than were needed when there was only Cloud Services. By default, a Virtual Machines service and a Cloud Service form distinct security boundaries, so traffic can flow between them only through a public endpoint. The new Virtual Network capability allows the creation of a virtual network that supports the composition of disparate services into a larger service in which the security surface can be minimized. For example, a virtual network can be created to contain a Virtual Machine hosting SQL Server with no public endpoint and a Cloud Service with an HTTP public endpoint exposed by a web role.

Windows Azure supports two types of VPN connectivity. Virtual Networks provides a robust hardware-based site-to-site VPN capability allowing for hybrid solutions that span both on-premises services and services hosted in Windows Azure. This is an IT-focused offering requiring configuration of on-premises VPN hardware. Windows Azure Connect is a software-agent-based VPN that developers can use to provide a simple connection between on-premises machines and Cloud Services. The software agent is available only for Windows, which limits the utility of Windows Azure Connect.

The Windows Azure Traffic Manager provides load-balancing capability for public HTTP endpoints in a Cloud Service or a Virtual Machine service. It supports 3 types of traffic distribution: geographical, in which traffic is directed to the service with the least latency from the current location; active-passive failover, in which traffic is failed over to a passive backup service when the active service fails to respond to probes; and round-robin load balancing between multiple services.
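
The three distribution types can be modeled as three small selection functions. This is a sketch in Python rather than anything Traffic Manager exposes; the function names, service names and latency figures are hypothetical.

```python
from itertools import cycle


def pick_by_latency(latency_ms):
    """Geographical policy: choose the service with the lowest latency."""
    return min(latency_ms, key=latency_ms.get)


def pick_failover(priority_order, responding):
    """Active-passive policy: first service in priority order that answers probes."""
    for service in priority_order:
        if service in responding:
            return service
    raise RuntimeError("no service is responding to probes")


def round_robin(services):
    """Round-robin policy: an endless iterator over the services in turn."""
    return cycle(services)


# Hypothetical latencies measured from one client location.
latencies = {"svc-west-us": 35.0, "svc-north-europe": 140.0, "svc-east-asia": 180.0}
nearest = pick_by_latency(latencies)

order = ["svc-west-us", "svc-north-europe"]
normal = pick_failover(order, responding={"svc-west-us", "svc-north-europe"})
failed_over = pick_failover(order, responding={"svc-north-europe"})

rotation = round_robin(["svc-a", "svc-b"])
three_requests = [next(rotation) for _ in range(3)]
```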

The Windows Azure Service Bus provides 2 ways for services to communicate with each other: Relayed Messaging and Brokered Messaging. These facilitate the composition of services in a service oriented architecture.

In Relayed Messaging, a service and a client both connect using outbound connections to a Service Bus endpoint hosted in a Windows Azure datacenter. The Service Bus then welds these connections together allowing two-way communication between the client and the server, both of which could be hidden behind firewalls restricting inbound traffic.

In Brokered Messaging, a Service Bus endpoint in a Windows Azure datacenter hosts a durable message store supporting various publish/subscribe scenarios. The most basic is when the endpoint hosts a Queue with multiple senders sending messages and multiple consumers competing to receive messages.  Brokered Messaging also supports a Topic/Subscription model comprising a topic to which messages are sent and one or more subscriptions which receive filtered copies of the messages. Each subscription has its own set of subscribers, which compete for the messages in the subscription.
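
The two brokered models can be contrasted with a small in-memory sketch. This is not the Service Bus API and it has no durability; the `Queue` and `Topic` classes, the subscription names and the message shapes are hypothetical, chosen only to show competing consumers and filtered subscriptions.

```python
from collections import deque


class Queue:
    """In-memory stand-in for a brokered queue."""

    def __init__(self):
        self._messages = deque()

    def send(self, message):
        self._messages.append(message)

    def receive(self):
        # Each message goes to exactly one receiver, so consumers compete.
        return self._messages.popleft() if self._messages else None


class Topic:
    """Copies each message into every subscription whose filter accepts it."""

    def __init__(self):
        self._subscriptions = {}

    def add_subscription(self, name, message_filter):
        queue = Queue()
        self._subscriptions[name] = (message_filter, queue)
        return queue

    def send(self, message):
        for message_filter, queue in self._subscriptions.values():
            if message_filter(message):
                queue.send(message)


# Queue: two consumers receive different messages.
orders = Queue()
orders.send("order-1")
orders.send("order-2")
consumer_a_got = orders.receive()
consumer_b_got = orders.receive()

# Topic: one subscription sees everything, the other only urgent messages.
events = Topic()
audit = events.add_subscription("audit", lambda m: True)
urgent = events.add_subscription("urgent", lambda m: m["priority"] == "high")
events.send({"id": 1, "priority": "high"})
events.send({"id": 2, "priority": "low"})
```

Because each subscription behaves like its own queue, several subscribers can compete for the messages in one subscription without affecting the others.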

Management

The Windows Azure Portal can be used to manage the various features provided by Windows Azure. This exists in two versions providing somewhat different functionality: the old Silverlight portal providing almost complete management of traditional Cloud Services and Storage Services; and a new HTML5 portal supporting Virtual Machines and Virtual Networks. The intent is that all existing functionality will be migrated to the HTML5 portal.

The portals are GUI front-ends for the Windows Azure Service Management REST API. This can be used to manage Cloud Services, Virtual Machines, and Storage Services accounts. There are two different sets of PowerShell cmdlets that can be used to manage Windows Azure services: CodePlex cmdlets developed by Microsoft DPE; and the new cmdlets supported by the Windows Azure Product Group. The CodePlex cmdlets are no longer being developed but contain some functionality not yet exposed in the official cmdlets, although that will presumably change. There are also command-line interface (CLI) scripts for Mac and Linux that, being written in Node.js, also run on Windows.

A number of companies have developed tools for managing and monitoring Windows Azure Services and Storage. Cerebrata, now part of Red Gate, has the very useful Cloud Storage Studio which can be used to provide Explorer-like functionality for the Windows Azure Storage Service. AppDynamics provides monitoring and scaling services for applications hosted in Windows Azure.

Next Steps for Developers

You can sign up for a 90-day free trial of Windows Azure by going to https://www.windowsazure.com/en-us/pricing/free-trial/. This provides free access to all the features described in this post – obviously with limits on the amount of resources provided. The next step for .NET developers is to download the Windows Azure SDK for .NET from the .NET Developer Center. There are similar landing pages for Node.js, Java, PHP, and Python developers.


Affinity Groups in Windows Azure

Windows Azure Datacenters

Microsoft has built out 8 Windows Azure datacenters in 3 geographical regions around the world. They are located in:

  1. North America
    • North Central US
    • South Central US
    • East US
    • West US
  2. Europe
    • North Europe
    • West Europe
  3. Asia
    • East Asia
    • South East Asia

These datacenters are widely separated so that inter-datacenter latency is significantly higher than intra-datacenter latency. There is little that can be done about inter-datacenter latency since much of it is caused by the speed of light being a finite physical constant. Unless there is a specific reason for doing otherwise, cloud services and any data they access should always be located in the same datacenter.
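
The speed-of-light floor on latency is easy to estimate. The Python sketch below assumes that signals travel through optical fiber at roughly two thirds of the speed of light in a vacuum, and it uses two hypothetical distances (6,000 km between datacenters and 1 km inside one) that are illustrative rather than measured.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBER_FRACTION = 2 / 3  # assumption: light in fiber travels at about 2/3 of c


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time over fiber, ignoring routing and queuing."""
    one_way_s = distance_km / (SPEED_OF_LIGHT_KM_PER_S * FIBER_FRACTION)
    return 2 * one_way_s * 1000


inter_datacenter_ms = min_round_trip_ms(6000)  # hypothetical distance between regions
intra_datacenter_ms = min_round_trip_ms(1)     # hypothetical cable run in one datacenter
```

Under these assumptions the inter-datacenter round trip cannot drop below about 60 ms, while the intra-datacenter figure is about a hundredth of a millisecond, so an application that makes many sequential storage calls pays that difference on every call.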

It used to be possible to specify a US | Europe | Asia Anywhere designation when allocating a cloud or storage service. The problem with this designation was that it was not clear where everything ended up and it was possible that associated cloud and storage services were actually located in different datacenters in the same geographical region. Fortunately, this choice has now been removed. However, if you created cloud and storage services using the Anywhere designation it is worth verifying their actual physical location.

Affinity Groups

Windows Azure datacenters are physically very large (think 7 football fields) and contain hundreds of thousands of servers. There is a significant difference in network latency between two servers in a single rack and two servers at opposite ends of a datacenter.

Windows Azure therefore provides an affinity group feature to provide a higher degree of co-location within a datacenter than would otherwise be possible using random placement. Associated cloud and storage services should be placed within an affinity group to minimize network latency. This minimization is particularly important when a cloud service makes extensive use of storage services, such as when an Azure Drive is used.

Note that affinity groups are supported for cloud services and storage services but are NOT used with Windows Azure SQL Databases. As of today they are also not supported for Windows Azure Web Sites and Virtual Machines.

Since an affinity group is located within a single datacenter, locating cloud and storage services in an affinity group also ensures that they are located within the same datacenter. New cloud and storage services should always be created within an affinity group rather than just the datacenter. Note that it is not possible to migrate either a cloud or storage service into an affinity group after creation. Migration into an affinity group requires recreating the cloud or storage service.

Affinity groups are created and managed on the Windows Azure Portal for cloud and storage services. On the new portal, affinity groups are managed in the Networks section when creating a new Virtual Network. Given the above description of affinity groups this might seem to be a strange location. However, the new Virtual Network feature mandates the use of affinity groups.

Virtual Network

The Virtual Network feature (tutorial) currently in preview on Windows Azure and accessible through the new portal also uses affinity groups. However, this use is mandatory for virtual networks while it is only advisable for cloud services and Windows Azure storage.

Every virtual network resides in its own affinity group and only one virtual network can be associated with any affinity group. The cloud and storage services contained in the affinity group can then be associated with the virtual network in a manner that maximizes co-location of the services and minimizes network latency inside the virtual network.

When using the Wizard to create a virtual network either an existing affinity group must be provided or a new one created using the Wizard. If the virtual network is created by importing a configuration file then the affinity group must be created before the file is imported.

Windows Azure

On June 7, Microsoft released into preview many new features for Windows Azure. The two most significant features announced and now available for trial were:

  • Virtual Machines
  • Web Sites

Until now, the compute services provided by Windows Azure were exclusively platform-as-a-service (PaaS), which essentially provides a highly-scalable application hosting environment. The Virtual Machines feature is a fully-featured infrastructure-as-a-service (IaaS) offering that provides the capability to deploy Windows Server and Linux servers on which enterprise-class software – such as SQL Server 2012 and SharePoint 2010 – can be deployed. Virtual Machines allows a significant expansion of the class of workloads that can be deployed to Windows Azure.

Web Sites provides a scalable, multi-tenanted web site hosting capability that is integrated with a wide variety of developer tools. This significantly broadens the developer support for creating and deploying web sites into Windows Azure, since it avoids the overhead necessary to use web roles for simple web sites. The new Web Sites feature is aimed at applications which do not require the high scalability and sophisticated feature set provided by cloud services.

This is a great time to be working with Windows Azure and either developing green-field applications or migrating existing applications to the platform. You can access a 90-day free trial here. If you are an MSDN subscriber you can access $3,600 of Windows Azure benefits here. You can access the SDKs from here – and these exist for .NET, node.js, PHP, Java and Python.

Microsoft has allied with TechStars to provide an incubator in Seattle, WA for Windows Azure. Applications are open now for the Fall session. The incubator provides $20,000 in seed funding and $60,000 in Windows Azure resources as well as office space, technical training and support.


Microsoft Silverlight 5 and Windows Azure Enterprise Integration

Late last year I was looking for information on Silverlight and Windows Azure and came across a RAW (preview) eBook named Microsoft Silverlight 4 and Windows Azure Enterprise Integration written by David Burela (@DavidBurela). The eBook looked to be just what I needed so I bought it and found it to be helpful. David has now finished the book, which Packt has released as Microsoft Silverlight 5 and Windows Azure Enterprise Integration.

There are a lot of books that go deep into Windows Azure technology, such as my Microsoft Windows Azure Development Cookbook. Going forward I think we will see more books like David’s that show how to integrate Windows Azure with other technologies such as, in this case, Silverlight. The book is targeted at Silverlight developers who want to use Windows Azure to provide back-end services such as scalable storage with the Windows Azure Storage Service and relational storage in SQL Azure.

The primary issue with using client-side technologies to access data stored in Windows Azure is authentication. Both the Windows Azure Storage Service and SQL Azure require that the caller be authenticated using either an authentication token or a password. It is not safe to expose these credentials in client-side code where they could be accessed by a malicious user. This book primarily focuses on how to use Windows Azure compute services to proxy access to secured Windows Azure features from Silverlight.

The book begins with a chapter describing how to get started with Silverlight and Windows Azure and provides information on additional tools that can facilitate development.  The next chapter provides an overview of Windows Azure. Chapter 3 brings the introductory material to a conclusion by showing how to host a Silverlight application in Windows Azure.

The book continues with a sequence of chapters describing the queue, blob and table features of the Windows Azure Storage Service. Each chapter describes the feature and then provides a fully worked out sample showing how to access it from Silverlight. Several chapters provide various ways to access SQL Azure data from Silverlight and, again, come with fully worked out samples. Techniques covered include Entity Framework, RIA Services and OData. Finally the book closes with a few chapters discussing: how to scale-out the Windows Azure service (CQRS); authentication; and the Windows Azure Caching Service.

The coverage of Windows Azure is not as detailed as would be found in a book focused exclusively on Windows Azure. However, I like that the book provides a convenient reference for various ways of accessing secured Windows Azure resources from Silverlight. Indeed, the techniques are general enough to be applicable to any client accessing secured Windows Azure resources.

(Full disclosure) Packt asked if I would be willing to review the book and provided an eBook to allow me to do so. However, as I pointed out earlier, I actually bought my own copy of the book last year. You can never have too many copies of an eBook.

Book: Microsoft Silverlight 5 and Windows Azure Enterprise Integration
Author: David Burela
Publisher: Packt
