A Second Look at Project Orleans

Project Orleans is a preview from Microsoft Research of an Actor-based framework and runtime supporting the development and deployment of massively distributed systems hosted in Microsoft Azure. A specific goal of Orleans is to simplify the creation of distributed systems for developers who are not skilled in the art.

Orleans supports the core features of the Actor model: state encapsulation and safe messaging; fair scheduling; location transparency; and mobility. An Orleans grain (actor) contains fully encapsulated state that may only be changed by the grain itself, in response to a message it receives. Deep copy is used whenever data is inserted into a message. Instead of the default pre-emptive multi-threading .NET scheduler, Orleans uses a cooperative multi-threading scheduler to schedule the processing of messages by a grain and ensures that a message to a grain is completely processed before the next message is processed. Orleans manages the activation of a grain in a silo on a physical node and provides location transparency by completely hiding grain location from the application. Grains are virtual and may or may not be activated in a silo when they are not being used. This allows the Orleans runtime to support weak mobility since at different times the same grain may be activated in different silos.

This is a follow-up to an earlier post which gave a high-level overview of Orleans as well as providing a variety of links to the Orleans system downloads and documentation.

Grains

An Orleans application comprises a system of interacting grains of various types. The application is developed by defining a set of grain interfaces which are then implemented in a set of classes. The Orleans build system auto-generates an associated set of factory and reference classes. Deployment consists of copying the assemblies hosting the grain implementations to the physical nodes hosting the silos, and the assemblies hosting the factory and reference class implementations to the clients, which may or may not be hosting silos. The Orleans runtime completely manages access to grains, and clients only ever access grain references, regardless of whether or not the client is hosted in an Orleans silo (i.e., is another grain).

Grain Interfaces

The Orleans API exposes an IAddressable interface, the base for a number of marker interfaces used in the definition of grain classes. In essence, the IAddressable interface indicates the addressability through the Orleans runtime of objects implementing the marker interfaces.

The interface hierarchy for IAddressable is:

IAddressable
   IGrain
   IRemindable
   IGrainObserver

These are declared as follows:

public interface IAddressable {}

public interface IGrain : IAddressable {}

public interface IGrainObserver : IAddressable {}

public interface IRemindable : IGrain, IAddressable {
   Task ReceiveReminder(String reminderName, TickStatus status);
}

IAddressable is an empty marker interface indicating that the Orleans runtime is able to address an instance implementing one of the derived interfaces. IGrain is an empty marker interface indicating that any derived interface is a grain interface. IGrainObserver is an empty marker interface indicating that a derived interface is implemented by an observing class. IRemindable is not an empty marker: it indicates that an implementing class can receive reminders, and declares the ReceiveReminder() method through which they are delivered.

IGrain is a core interface for Orleans. Every grain class implements an interface derived from IGrain. These grain interfaces specify the functionality of the grains used in an Orleans application. The messages sent to grains are implemented as public methods and properties in the grain classes. Since Orleans is a distributed system it is crucial that message processing is asynchronous, and this is achieved by constraining grain interfaces so that all methods and properties return either a Task or a Task<T>. The async/await feature of .NET 4.5 greatly simplifies this. Methods can be defined as usual in the grain interface. However, properties must be handled in a special manner since the set accessor for a property essentially returns void instead of a Task or Task<T>. Consequently, a property cannot have a setter; a separate set method returning a Task must be provided explicitly.

IGrainObserver is a marker interface indicating that an implementing class is able to observe a grain and process notifications issued by the grain. An observer implements an interface derived from IGrainObserver, the methods of which are constrained to return only void. This means that instances of this class are not normal grains, for which the methods can return only Task or Task<T>. An observer must indicate to a grain that it wants to be notified of a particular event, so the observed grain must expose methods supporting that subscription. A grain class can manage these subscriptions using the ObserverSubscriptionManager<T> class, with T being the observer interface. This class is declared as follows:

public class ObserverSubscriptionManager<T> where T : IGrainObserver {
   public ObserverSubscriptionManager<T>();
   public Int32 Count { get; }
   public void Clear();
   public void Notify(Action<T> notification);
   public void Subscribe(T observer);
   public void Unsubscribe(T observer);
}

Subscribe() adds an observer to the list of subscribers to be notified for the specific observable event. Unsubscribe() removes a specific observer from the notification list, while Clear() removes all subscribers. The notification is performed by invoking Notify(), which invokes the supplied notification action on each observing subscriber.
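
The subscription behavior described above can be illustrated without the Orleans assemblies. The following is a minimal sketch, not the Orleans implementation: SimpleSubscriptionManager<T>, IChatObserver and RecordingObserver are hypothetical stand-ins that mimic the Subscribe(), Unsubscribe(), Clear() and Notify() members listed above.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical observer interface. In Orleans it would derive from
// IGrainObserver, and its methods would have to return void.
public interface IChatObserver {
   void MessagePosted(String text);
}

// Hypothetical stand-in for ObserverSubscriptionManager<T>.
public class SimpleSubscriptionManager<T> {
   private readonly HashSet<T> observers = new HashSet<T>();

   public Int32 Count { get { return observers.Count; } }

   public void Subscribe(T observer) { observers.Add(observer); }
   public void Unsubscribe(T observer) { observers.Remove(observer); }
   public void Clear() { observers.Clear(); }

   // Invokes the supplied action once for each subscribed observer.
   // A copy is iterated so an observer may unsubscribe while notified.
   public void Notify(Action<T> notification) {
      foreach (T observer in new List<T>(observers)) {
         notification(observer);
      }
   }
}

// Simple observer that records the notifications it receives.
public class RecordingObserver : IChatObserver {
   public readonly List<String> Received = new List<String>();
   public void MessagePosted(String text) { Received.Add(text); }
}
```

With this sketch, an observed object would raise an event with a call such as manager.Notify(o => o.MessagePosted("hello")).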

Grain Classes

The hierarchy for the classes implementing IAddressable is:

GrainBase
  GrainBase<TGrainState>
GrainReference

The implementation of a grain is provided by a class that is derived from either GrainBase or GrainBase<T> and that implements the appropriate grain interface. GrainBase<T> extends the core GrainBase functionality by adding support for the persistence of grain state of type T, where T is derived from IGrainState.

GrainBase is declared as follows:

public abstract class GrainBase : IAddressable {
   protected GrainBase();
   public String IdentityString { get; }
   public String RuntimeIdentity { get; }
   public virtual Task ActivateAsync();
   public virtual Task DeactivateAsync();
   protected void DeactivateOnIdle();
   protected void DelayDeactivation(TimeSpan timeSpan);
   protected OrleansLogger GetLogger(String loggerName);
   protected OrleansLogger GetLogger();
   protected Task<IOrleansReminder> GetReminder(String reminderName);
   protected Task<List<IOrleansReminder>> GetReminders();
   protected IStreamProvider GetStreamProvider(String name);
   protected IEnumerable<IStreamProvider> GetStreamProviders();
   protected Task<IOrleansReminder> RegisterOrUpdateReminder(String reminderName,
      TimeSpan dueTime, TimeSpan period);
   protected IOrleansTimer RegisterTimer(Func<Object, Task> asyncCallback,
      Object state, TimeSpan dueTime, TimeSpan period);
   protected Task UnregisterReminder(IOrleansReminder reminder);
}

IdentityString opaquely identifies the grain and RuntimeIdentity opaquely identifies the silo hosting it. ActivateAsync() is invoked each time the grain is activated – i.e., rehydrated into memory – and may be overridden to provide any additional initialization required. Similarly, DeactivateAsync() is invoked each time the grain is deactivated and may be overridden (e.g., to persist grain state). DeactivateOnIdle() indicates that the grain should be deactivated as soon as the current request has completed. DelayDeactivation() hints that the grain should remain activated for the specified timespan. GetLogger() gets the Orleans logger which can be used to write entries to the Orleans log. In Azure, this log is persisted into the standard Azure logs provided by (Windows) Azure Diagnostics, so it may be accessed in the WADLogsTable in Azure Storage.

Orleans provides a timer capability, in which a timer can be created and associated with a grain. When this timer fires a method is invoked on the grain. It may be used, for example, to ensure that grain state is persisted periodically. This timer exists only while the grain is activated, and is cancelled whenever the grain is deactivated. The RegisterTimer() method is used to create a timer and specify the Task to be invoked when it fires. A timer is cancelled by disposing the handle returned by the RegisterTimer() method.

The Orleans reminder feature provides the capability of a timer which transcends grain lifetime. It does this by storing the reminder state either in memory on the silo (useful for development) or in an Azure Table. The latter is a distributed, persistent store, and its use allows a reminder to be sent even when a grain has been activated in another silo. The RegisterOrUpdateReminder() method is used to create or update a reminder, which is subsequently identified by name. For a grain to receive a reminder its class must implement the IRemindable interface. This interface exposes a ReceiveReminder() method which is invoked when the reminder is sent. The GetReminders() method returns all the reminders for the grain, while GetReminder() returns a reminder by name. UnregisterReminder() is used to delete a reminder.

The Orleans team has indicated that GetStreamProvider() and GetStreamProviders() should not have been exposed in the preview.

Persistent Grains

Orleans supports the persistence of grain state through the use of the GrainBase<TGrainState> class, where TGrainState is the type containing the data to be persisted. The grain state is persisted using a storage provider configured in the Orleans configuration file. When a grain is activated it is automatically initialized with its persistent state prior to the invocation of ActivateAsync(), which can then be used to complete initialization (e.g., initialize any non-persisted state). However, the grain state is never persisted automatically, so some strategy must be devised for grain state persistence.

In many Orleans systems the true grain state is actually resident on a client (for example, on an Xbox controller) so can always be refreshed from there. Consequently, it is not necessarily crucial that the grain state be persisted whenever it is changed on the grain. It can be persisted occasionally using a timer or reminder, and by overriding DeactivateAsync() so that it is written when the grain is deactivated. Deferring persistence writes in this way helps improve performance.
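
One way to picture the deferred persistence strategy described above is a small helper that remembers whether the state has changed since the last write. This is a sketch under stated assumptions rather than anything from the Orleans API: DeferredStateWriter is a hypothetical class, and the writeState delegate stands in for a call such as State.WriteStateAsync().

```csharp
using System;
using System.Threading.Tasks;

// Hypothetical helper: tracks whether state has changed since the last
// write so that a timer callback or DeactivateAsync() can flush it.
public class DeferredStateWriter {
   private readonly Func<Task> writeState;
   private Boolean dirty;

   // Number of writes actually issued (useful for checking the sketch).
   public Int32 WriteCount { get; private set; }

   // writeState stands in for a call such as State.WriteStateAsync().
   public DeferredStateWriter(Func<Task> writeState) {
      this.writeState = writeState;
   }

   // Call whenever the grain changes its persistent state.
   public void MarkDirty() { dirty = true; }

   // Call from a timer callback and from DeactivateAsync().
   // Writes only when something has changed since the last flush.
   public async Task FlushAsync() {
      if (!dirty) return;
      dirty = false;
      try {
         await writeState();
         WriteCount++;
      } catch {
         dirty = true;   // the write failed, so try again on the next flush
         throw;
      }
   }
}
```

The design choice here is that several state changes between flushes collapse into a single write, at the cost of losing the most recent changes if the silo fails before the next flush.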

Orleans provides in-memory and Azure Table storage providers – the former being suitable for development while the latter supports a production system. Orleans provides an extension point for creating storage providers. The Orleans team has provided a sample on CodePlex and Richard Astbury has created a GitHub repo with an Azure Blob storage provider.

GrainBase<T> is declared as follows:

public class GrainBase<TGrainState> :
      GrainBase, IAddressable where TGrainState : IGrainState {
   public GrainBase<TGrainState>();
   protected TGrainState State { get; }
}

The persistent state is accessed through the State property, which is strongly typed, allowing its members to be accessed using property dot notation (e.g., State.LastName). A class derived from GrainBase<T> can have state not contained in State, but this is not persisted using the state persistence capability. This additional state can be managed in various ways, including through the use of the ActivateAsync() and DeactivateAsync() methods on GrainBase.

The TGrainState interface is derived from the IGrainState interface, which is declared as follows:

public interface IGrainState {
   String Etag { get; set; }
   Dictionary<String,Object> AsDictionary();
   Task ClearStateAsync();
   Task ReadStateAsync();
   void SetAll(Dictionary<String,Object> values);
   Task WriteStateAsync();
}

ClearStateAsync() is used to clear the current state. ReadStateAsync() is used to refresh the State property from the configured storage provider. WriteStateAsync() is used to persist the State property to the configured storage provider. AsDictionary() is used by storage providers to expose the state as a Dictionary. SetAll() is used by storage providers to initialize the State property. Etag is an opaque value used by the storage provider.

Client Implementation

The Orleans server implementation comprises the grain interface and the grain class. The Orleans build system auto-generates a factory class and a reference class for each grain interface – and these are used by clients regardless of whether or not they are hosted in an Orleans silo. The factory class exposes static methods for creating grain references. The reference class implements the grain interface and the Orleans runtime proxies method invocations as messages to the actual grain.

The factory class implements methods like the following (where ISampleGrain is the grain interface):

public static ISampleGrain GetGrain(Guid primaryKey)
public static ISampleGrain GetGrain(long primaryKey)

These are used by clients to create grain references for the grain identified by the specified primary key. Note that this is purely a local operation and does not in itself cause the activation of a grain; that requires the invocation of a grain method.

The grain reference nominally has the type of the grain interface. It is actually an instance of an auto-generated class that is derived from GrainReference and that implements the grain interface.

Example – Grain Interface

The following is a simple example of a grain interface:

public interface IPersonGrain : IGrain {
   Task<String> Name { get; }
   Task SetName(String name);
}

This example shows the use of a property getter with a standard method used instead of a property setter. As required all methods in the interface return either Task or Task<T>.

Example – State Interface

The following is a simple example of a state interface that persists only a single property:

public interface IPersonState : IGrainState {
   String Name { get; set; }
}

Example – Grain Class

The following is a simple example of a grain class implementing IPersonGrain and using the built-in grain persistence. Orleans loads state automatically on grain activation but the writing of grain state must be performed explicitly – in this case when the grain is deactivated.

[StorageProvider(ProviderName = "AzureStore")]
class PersonGrain : GrainBase<IPersonState>, IPersonGrain {
   public Task<String> Name {
      get { return Task.FromResult(State.Name); }
   }

   public Task SetName(string name) {
      State.Name = name;
      return TaskDone.Done;
   }

   // Wait for the state write to complete before finishing deactivation.
   public override async Task DeactivateAsync() {
      await State.WriteStateAsync();
      await base.DeactivateAsync();
   }
}

The storage provider, AzureStore, is configured in the OrleansConfiguration.xml file.

Some Useful Techniques for Using Tasks

The Task class provides the following convenient way to create a completed task for a specific value:

Task.FromResult(value);

The Orleans API provides the following utility property to return a completed Task.

TaskDone.Done;

The following example shows the use of Task.WhenAll() to fan-out the sending of messages allowing them to be processed simultaneously:

List<Task> promises = new List<Task>();
for (Int32 i = 0; i < 10; i++) {
   var personGrain = PersonGrainFactory.GetGrain(i);
   promises.Add(personGrain.SetName(
      String.Format("John-{0}", i)));
}
await Task.WhenAll(promises);

Summary

The Orleans framework and runtime provide an easy-to-use implementation of the Actor model for the .NET platform. The definition of an actor (or grain) requires the creation of a grain interface derived from IGrain and its implementation in a class derived from GrainBase or GrainBase<T>, where T is an interface identifying the data to be persisted through the configured storage provider. Given the ease with which grains can be defined and the transparent manner in which the Orleans runtime allocates them to physical nodes, Orleans simplifies the development of certain classes of distributed systems.


A First Look at Project Orleans

Microsoft Azure Cloud Services is a PaaS offering that simplifies the task of deploying scalable applications. An Azure PaaS deployment comprises two files: a package containing the application assemblies; and a configuration file. This simplicity makes Azure Cloud Services a great environment for deploying scalable applications. However, the developer remains responsible for ensuring that the application functions well in the distributed environment provided by Azure Cloud Services.

Microsoft Research developed Project Orleans with the specific goal of simplifying the creation of scalable applications for “developers who are not distributed system experts.” Orleans is an implementation of an Actor model, using the constraints imposed by that model to reduce the complexity of developing a distributed system. This simplifies the development of certain classes of distributed system.

Orleans combines an application framework with a service runtime. The application framework abstracts away certain elements that complicate the development of distributed systems. The service runtime provides a simple model that supports application deployment into various environments from a single PC up to an Azure Cloud Service. An Orleans application is a composite of a client application (e.g., a website) and an Orleans server application hosted by the runtime. Orleans is in preview but is currently used to provide some high-scale, backend services, hosted in Azure, for games such as Halo 4.

The Orleans preview can be downloaded from Microsoft Connect. This download comprises the application framework, including Visual Studio tooling, and the Orleans runtime. The Orleans documentation along with various samples are hosted on CodePlex.

Microsoft Research hosts the home page for Orleans. The Orleans team has written a very readable research report that describes the Orleans architecture in some depth (be sure to read the 2014 version). Hoop Somuah (@hoopsomuah) and Sergey Bykov (@sbykov_work) did a Build 2014 presentation on Using Orleans to Build Halo 4’s Distributed Cloud Service. Richard Astbury (@richorama) talks about Orleans in a .NET Rocks podcast. Caitie McCaffrey (@CaitieM20) has a post on Creating RESTful services using Orleans.

The rest of this post is a high-level look at some Orleans features. A follow-up post goes deeper into Orleans.

Actor Model

The Actor model was introduced in 1973 by Hewitt, Bishop and Steiger. In this model, an actor is the fundamental primitive for concurrent computation, providing processing, state and communication. A system comprises many actors, which interact by sending messages to each other. An actor encapsulates state, and data is not shared between actors. In response to a message it receives, an actor can:

  • Create new actors
  • Send messages to other actors
  • Designate how to handle the next message it receives

The intent is that an actor is a simple entity with message processing being a manifestly concurrent operation that may change the internal state of the actor (and thereby the way subsequent messages are handled). A complex system is built through the interaction of many actors. By ensuring the concurrency of basic message processing, the actor model simplifies the creation of sophisticated distributed systems where concurrency can often cause significant problems.
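
To make the idea concrete, here is a minimal, single-threaded sketch of an actor. It is not taken from Orleans or from Hewitt's papers: CounterActor and its members are hypothetical names, and the sketch only shows encapsulated state, a mailbox, and an actor designating how the next message is handled. It does not show actors creating or messaging other actors.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical minimal actor: private state, a mailbox, and a behavior
// that can be replaced while handling a message.
public class CounterActor {
   private readonly Queue<String> mailbox = new Queue<String>();
   private Action<String> behavior;
   private Int32 count;               // encapsulated state

   // Record of what the actor did, exposed only to check the sketch.
   public readonly List<String> Log = new List<String>();

   public CounterActor() { behavior = Counting; }

   // Sending a message only enqueues it; it never touches actor state.
   public void Send(String message) { mailbox.Enqueue(message); }

   // Processes queued messages one at a time, in arrival order.
   public void Drain() {
      while (mailbox.Count > 0) {
         behavior(mailbox.Dequeue());
      }
   }

   private void Counting(String message) {
      if (message == "stop") {
         behavior = Stopped;          // designate how to handle the next message
      } else {
         count++;
         Log.Add("count=" + count);
      }
   }

   private void Stopped(String message) {
      Log.Add("ignored " + message);
   }
}
```

Sending "a", "b", "stop" and "c" and then calling Drain() counts the first two messages and ignores the last, because the "stop" message replaced the behavior used for subsequent messages.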

Hewitt describes the Actor model in this recent paper. There is an excellent video on Microsoft Channel 9, in which Carl Hewitt discusses the Actor model with Erik Meijer and Clemens Szyperski.

Orleans

Grain

Orleans is an implementation of the Actor model. In Orleans, an actor is referred to as a grain and the runtime host for grains is referred to as a silo. A grain is an instance of a .NET class implementing a marker grain interface. In a distributed Orleans system, there is a silo on each server hosting the Orleans runtime. An individual grain can be hosted in any of the Orleans silos, but the runtime provides location transparency so the user does not know which silo holds a particular grain.

Grain Lifetime

The Orleans runtime implements a model in which grains are deemed to have an eternal, but virtual, life. A grain is deemed active when it is physically resident in a silo and is otherwise deemed inactive. When a request for a grain is made, the runtime either returns a reference to a grain already activated in some silo or silently activates a grain in a silo and returns a reference to it. The caller is completely isolated from grain activation. This allows the runtime to manage resources efficiently by silently deactivating grains that have not been used for a while.

By default, the Orleans runtime does not provide affinity between a grain and a silo. This flexibility allows the runtime to hydrate a grain into any silo, and this allocation may change through the virtual lifetime of the grain. The runtime uses an internal discovery service to identify the silo containing a grain. The discovery service uses a distributed hash table located on each silo to store the identity of the silo currently containing a grain. As an optimization, each silo has a local cache which stores the location of recently-accessed grains.

Since a grain can be in any silo it is always accessed through a reference provided by the Orleans runtime. When a message is sent to a grain the runtime:

  • makes a deep copy of it
  • serializes it using a specialized binary serializer
  • transmits it to the correct silo
  • deserializes it
  • queues it for processing by the receiving grain.

Orleans completes the message-sending process by invoking a method on the receiving grain. The message invocation results in a promise that may or may not complete successfully, that is the promise may be fulfilled or it may be broken. The promise is implemented using the .NET Task classes, and is greatly simplified through use of the .NET 4.5 async/await feature. Message passing is an asynchronous operation so the caller is not immediately aware of the success or failure of the method invocation. This asynchrony is crucial to the scalability of Orleans, since the runtime is able to schedule invocation without blocking the caller.
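
The fulfilled-or-broken nature of a promise maps directly onto a .NET Task that either completes with a result or faults with an exception. The following sketch uses only the .NET Task API; PromiseDemo and GetNameAsync are hypothetical names standing in for a call on a grain reference.

```csharp
using System;
using System.Threading.Tasks;

public static class PromiseDemo {
   // Stands in for a grain method; fails when asked to.
   public static async Task<String> GetNameAsync(Boolean fail) {
      await Task.Yield();   // complete asynchronously, as a remote call would
      if (fail) throw new InvalidOperationException("grain call failed");
      return "John";
   }

   // Awaits the promise and reports whether it was fulfilled or broken.
   public static async Task<String> DescribeAsync(Boolean fail) {
      try {
         String name = await GetNameAsync(fail);
         return "fulfilled: " + name;
      } catch (InvalidOperationException e) {
         return "broken: " + e.Message;
      }
   }
}
```

The caller learns the outcome only when it awaits the Task, which is what allows the call itself to return without blocking.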

A grain reference can be used either inside another grain hosted in an Orleans server application or in a client application hosted outside the Orleans runtime.

Grain Implementation

A grain is defined through the specification of an interface and the creation of a class implementing it. Orleans build-time tooling automatically generates a factory class for each grain class, allowing references to grains of that class to be retrieved.

The grain interface is derived from IGrain. The interface comprises one or more public methods returning either a Task or a Task<T>. A message to a grain corresponds to the invocation of one of the grain interface methods. The Orleans runtime handles the transfer of the method invocation from the sending grain through the messaging infrastructure to the receiving grain which, in a distributed system, is likely to be in a silo on another server.

The Orleans grain implementation is defined by creating a class derived from GrainBase that implements the interface. There is no need to define a constructor for the class, since auto-generated factory methods are used to create references to grains. One or both of the ActivateAsync() and DeactivateAsync() methods can be overridden to contain any specific grain activation and deactivation code.

The Orleans framework tooling creates the factory class for each grain type, and this process generates an error when it detects an invalid grain interface. The factory class exposes a set of factory methods used to retrieve a grain reference. Note that retrieving a reference does not lead to grain activation, which is done only when a message is sent to the grain.

Each grain is identified by its type and primary key, which is either a GUID or an Int64. Internally, the latter is zero-padded into the former. (It is also possible to declare a grain type with an extended primary key that includes a String.) By default, each specific grain is a singleton but it is possible to declare a stateless worker grain that the Orleans runtime can scale out automatically.
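
The zero-padding of an Int64 key into a GUID can be sketched with the standard Guid and BitConverter types. The byte layout below is an assumption made for illustration (the Int64 occupies the last eight bytes and everything else is zero); the layout Orleans uses internally may differ, and PrimaryKeys is a hypothetical class.

```csharp
using System;

public static class PrimaryKeys {
   // Places the Int64 in the last 8 bytes of a Guid and zero-fills the rest.
   // This layout is assumed for illustration only.
   public static Guid ToGuid(Int64 key) {
      return new Guid(0, 0, 0, BitConverter.GetBytes(key));
   }

   // Recovers the Int64 from the last 8 bytes of the Guid.
   public static Int64 ToInt64(Guid guid) {
      return BitConverter.ToInt64(guid.ToByteArray(), 8);
   }
}
```

Because both directions use BitConverter on the same eight bytes, the round trip returns the original key regardless of the machine's byte order.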

Orleans Runtime

The Orleans runtime schedules work as a sequence of turns, with a turn being the execution of a grain method up to the time a promise has been received (e.g., reaching an await statement; the closure following an await statement; or the return of a completed or uncompleted Task). To avoid concurrency problems, each grain is single-threaded so that only one turn is executed on a grain at any one time. A single request may result in several turns and, by default, the runtime processes all the turns for a request before processing any other requests for the same grain. Orleans provides high-scale by hosting many grains on a single server, so that even though request handling on each grain is single threaded the handling of individual requests on many grains is performant.
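
The guarantee that one request finishes all its turns before the next request starts on the same grain can be imitated with ordinary .NET primitives. The sketch below is not how the Orleans scheduler works; it uses a SemaphoreSlim as a stand-in so that requests to one object never interleave, even across an await. SerialRequestQueue and CounterGrainSketch are hypothetical names, and the semaphore does not promise that requests run in arrival order.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the per-grain guarantee: requests run one at a
// time, and a request's later turns run before the next request starts.
public class SerialRequestQueue {
   private readonly SemaphoreSlim gate = new SemaphoreSlim(1, 1);

   public async Task RunAsync(Func<Task> request) {
      await gate.WaitAsync();
      try {
         await request();
      } finally {
         gate.Release();
      }
   }
}

public class CounterGrainSketch {
   private readonly SerialRequestQueue queue = new SerialRequestQueue();
   private Int32 active;

   public Int32 Value { get; private set; }
   public Int32 MaxActive { get; private set; }   // highest observed concurrency

   // A request made of two turns separated by an await.
   public Task IncrementAsync() {
      return queue.RunAsync(async () => {
         Int32 now = Interlocked.Increment(ref active);
         if (now > MaxActive) MaxActive = now;
         Int32 read = Value;        // first turn: read state
         await Task.Delay(1);       // turn boundary
         Value = read + 1;          // second turn: write state
         Interlocked.Decrement(ref active);
      });
   }
}
```

Without the queue, concurrent calls could all read the same Value before any of them wrote it back, and increments would be lost; with it, the read and the write of each request stay together.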

Orleans uses a purpose-built scheduler that provides cooperative multitasking instead of the preemptive multitasking of the standard .NET Task scheduler. For the turn-based scheduling used by Orleans, this makes much more efficient use of system resources than preemptive multitasking would.

Grain Persistence

The Orleans runtime can load persistent grain state on activation. This is independent of the ActivateAsync() method. For performance reasons grain state is not persisted automatically; instead, state persistence for the grain must be explicitly managed by the grain implementation.

The Orleans support for grain state persistence is implemented by defining a type derived from IGrainState to hold that state. The grain class implementation must be derived from GrainBase<T> (instead of GrainBase), where T is the type holding the grain state, and must implement the grain interface. The grain class can have additional state, stored in non-persisted private members, that can be initialized using ActivateAsync(). The IGrainState interface exposes WriteStateAsync() and ReadStateAsync() methods that are used to persist grain state and to refresh grain state from the persistent store.

Orleans has the concept of pluggable storage providers to support grain state persistence. The storage provider is specified in the Orleans server and client configuration files. Several storage providers ship in the preview: LocalMemory is a development provider using local memory; AzureStorage persists grain state in Azure Tables (either cloud or development storage). Orleans provides a relatively simple extension point allowing the creation of additional storage providers. One of the samples demonstrates how to do this. Richard Astbury has published a storage provider using Azure Blob Storage.

Visual Studio Tooling

The Orleans framework provides three Visual Studio project templates for Orleans:

  • Orleans Dev/Test Host – creates a console app with an Orleans silo for development purposes
  • Orleans Grain Class Collection – contains the grain class implementations
  • Orleans Grain Interface Collection – contains the grain interfaces

The Orleans build tooling creates the grain factory classes used to access grain references. The generated code for these classes is located in the Properties\orleans.codegen.cs file under the interfaces directory for the project.

Deployment

Orleans can be deployed locally for dev/test purposes. It can also be deployed into a group of local servers. However, a scalable system should be deployed into Azure.

A common Azure deployment is to host the Orleans server application in an Azure worker role and the client application in an Azure web role. The Orleans framework makes it trivial to deploy an Orleans server into an Azure worker role. Indeed, there is a one-to-one match between the Orleans runtime methods and the Azure RoleEntryPoint overrides. The Orleans runtime is able to handle the scaling out of the worker role instances. In an Azure deployment, the Orleans runtime uses Azure Tables to store runtime information.

Summary

The development and deployment of scalable distributed systems is difficult. Project Orleans provides an application framework and runtime support that simplifies the creation of those distributed systems that can be implemented using an Actor model. Orleans is specifically designed to simplify the creation of distributed systems by developers who are not experts in distributed systems. It is also designed to play well with Azure, and clearly demonstrates the benefit of developing cloud-native applications for Azure Cloud Services. Orleans comes with Visual Studio tooling, documentation, and samples which make it easy to learn how to use it.


Windows Azure Training Events – San Francisco Bay Area

In March and April 2014, Satory Global is hosting several Developer Camps focused on Windows Azure and Modern apps. These are in San Francisco, CA and Sunnyvale, CA.

The camps are a mixture of presentations and hands-on labs, where you will get the opportunity to learn and try out various aspects of Windows Azure and how Modern apps can use it as a backend.

Windows Azure Developer Camp:
Make It Happen In The Cloud
(Register)
Date:  March 6, 2014
Time:  8:30-5:00
Location:
Microsoft
1010 Enterprise Way
Building B
Sunnyvale, California 94089

Windows Azure Developer Camp:
Make It Happen In The Cloud
(Register)
Date:  April 9, 2014
Time:  8:30-5:00
Location:
Microsoft
835 Market Street
Suite 700
San Francisco, California 94103

Developer Camp:
Extending Your Existing Apps On The Microsoft Modern Platform
(Register)
Date:  April 29, 2014
Time:  8:30-5:00
Location:
Microsoft
1010 Enterprise Way
Building B
Sunnyvale, California 94089

I hope to see you there.


Queries in the Windows Azure Storage Client Library v2.1

Windows Azure Storage has been a core part of the Windows Azure Platform since the public preview in 2008. It supports three storage features: Blobs, Queues and Tables. The Blob Service provides high-scale file storage – with prominent uses being: the storage of media files for web sites; and the backing store for the VHDs used as the disks attached to Windows Azure VMs. The Queue Service provides a basic and easy-to-use messaging system that simplifies the disconnected communication between VMs in a Windows Azure cloud service. The Table Service is a fully-managed, cost-effective, high-scale, key-value NoSQL datastore.

The definitive way to access Windows Azure Storage is through the Windows Azure Storage REST API. This documentation is the definitive source of what can be done with Windows Azure Tables. All client libraries, regardless of language, use the REST API under the hood. The Storage team has provided a succession of .NET libraries that sit on top of the REST API. The original Storage Client library had a strong dependence on WCF Data Services, which affected some of the design decisions. In 2012, the Storage team released a completely rewritten v2.0 library which removed the dependence and was more performant.

The Storage Client v2.0 library provided a fluent library for query invocation against Windows Azure Tables. v2.1 of the library added a LINQ interface for query invocation. The LINQ interface is significantly easier to use than the fluent library while supporting equivalent functionality. Consequently, only the LINQ library is considered in this post.

The MSDN documentation for the Windows Azure Storage Library is here. The Windows Azure Storage team has provided several posts documenting the Table Service API in the .NET Storage v2.x library (2.0, 2.1). Gaurav Mantri has also posted on the Table Service API as part of an excellent series of posts on the Storage v2.0 library. I did a post in 2010 that described the query experience for Windows Azure Tables in the Storage Client v1.x library.

Overview of Windows Azure Tables

Windows Azure Tables is a key-value, NoSQL datastore in which entities are stored in tables.  It is a schema-less datastore so that each entity in a table can have a different schema for the properties contained in it. The primary key, and only index, for a table is a combination of the PartitionKey and RowKey that must exist in each row. The PartitionKey specifies the partition (or shard) for an entity while the RowKey provides uniqueness within a partition. Different partitions may be stored on different physical nodes, with the Table Service managing this allocation.

The REST API provides limited query capability. It supports filtering as well as the specification of the properties to be returned. A query can be filtered on combinations of any of the properties in the entity. The right side of each filter must be a constant, so it is not possible to compare the values of different properties. It is also possible to specify a limit on the number of entities to be returned by a query. The general rules for queries are documented here with specific rules for filters provided here.

The Table Service uses server-side paging to limit the amount of data that may be returned in a single query. Server-side paging is indicated by the presence of a continuation token in the response to the query. This continuation token can be provided in a subsequent query to indicate where the next “page” of data should start. A single query has a hard limit of 1,000 entities in a single page and can execute for no more than 5 seconds. Furthermore, a page return is also caused by all the entities hosted on a single physical node having been returned. An odd consequence of this is that it is possible to get back zero entities along with a continuation token. This is caused by the Table Service querying a physical node where no entities are currently stored. The only query that is guaranteed never to return a continuation token is one that filters on both PartitionKey and RowKey. Note that any query which does not filter on either PartitionKey or RowKey will result in a table scan.

The Table Service returns queried data as an Atom feed. This is a heavyweight XML protocol that inflates the size of a query response. The Storage team announced at Build 2013 that it would support the use of JSON in the query response which should reduce the size of a query response. To reduce the amount of data returned by a query, the Table Service supports the ability to shape the returned data through the specification of which properties should be returned for an entity.

The various client libraries provide a native interface to the Table Service that hides much of the complexity of filtering and continuation tokens. For example, the .NET library provides both fluent and LINQ APIs allowing a familiar interaction with Windows Azure Tables.

ITableEntity

ITableEntity is the interface implemented by all classes used to represent entities in the Storage Client library v2.x. ITableEntity defines the following properties:

  • ETag – entity tag used for optimistic concurrency
  • PartitionKey – partition key for the entity
  • RowKey – row key for the entity
  • Timestamp – timestamp for last update

The library contains two classes implementing the ITableEntity interface:

  • TableEntity – the base class for user-defined classes used to represent entities in the Storage Client library. These derived classes expose properties representing the properties of the entity.
  • DynamicTableEntity – a sealed class which stores the entity properties inside an IDictionary<String,EntityProperty> property named Properties.

The use of strongly-typed classes derived from TableEntity is useful when entities are all of the same type. However, DynamicTableEntity is helpful when handling tables which take full advantage of the schema-less nature of Windows Azure Tables and have entities with different schemas in a single table.

Basic LINQ Query

LINQ is a popular method for specifying queries since it provides a natural syntax that makes explicit the nature of the query and the shape of returned entities. The Storage Client library supports LINQ and uses it to expose various query features of the underlying REST interface to .NET.

A LINQ query is created using the CreateQuery<TEntity>() method of the CloudTable class. TEntity is a class implementing ITableEntity. In the query the where keyword specifies the filters and the select keyword specifies the properties to be returned.

With BookEntity being an entity class derived from TableEntity, the following is a simple example using the Storage Client library of a LINQ query against a table named book:
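A minimal sketch of such a query follows, assuming the v2.1 Storage Client library. The Author and Title properties of BookEntity, the method names and the connection string are assumptions made for illustration:

```csharp
using System;
using System.Linq;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Table;

// Hypothetical entity class: Author and Title are assumed properties.
public class BookEntity : TableEntity
{
    public string Author { get; set; }
    public string Title { get; set; }
}

public static class BookQueries
{
    // Defines the query; nothing is sent to the Table Service here.
    public static IQueryable<BookEntity> Hardbacks(CloudTable bookTable)
    {
        return from book in bookTable.CreateQuery<BookEntity>()
               where book.PartitionKey == "hardback"
               select book;
    }

    public static void PrintHardbacks(string connectionString)
    {
        CloudTable bookTable = CloudStorageAccount.Parse(connectionString)
            .CreateCloudTableClient()
            .GetTableReference("book");

        // Enumerating the query is what invokes the request.
        foreach (BookEntity book in Hardbacks(bookTable))
        {
            Console.WriteLine("{0}: {1} by {2}", book.RowKey, book.Title, book.Author);
        }
    }
}
```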

The definition of the query creates an IQueryable<BookEntity> but does not itself invoke an operation against the Table service. The operation actually occurs when the foreach statement is invoked. This example queries the book table and returns all entities where the PartitionKey property takes the value hardback.  A table is indexed on PartitionKey and RowKey and data is returned from a query ordered by PartitionKey/RowKey. In this example the data is ordered by RowKey since the PartitionKey is fixed in the query.

The Storage Client library handles server-side paging automatically when the query is invoked. Consequently, if there are many entities satisfying the query a significant amount of data is returned.

Basic queries can be performed using IQueryable, as above. More sophisticated queries – with client-side paging and asynchronous invocation, for example – are handled by converting the IQueryable into a TableQuery.

Server Side Paging

The Storage library supports server-side paging using the Take() method. This allows the specification of the maximum number of entities that should be returned by query invocation. The limit is applied server-side, so it can significantly reduce the amount of data returned and consequently the time taken to return the data.

For example, the above query can be modified to return a maximum of 10 entities:
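A sketch of the modified query, under the same assumptions as before (a hypothetical BookEntity with Author and Title properties, and a CloudTable reference to the book table):

```csharp
using System.Linq;
using Microsoft.WindowsAzure.Storage.Table;

// Hypothetical entity class: Author and Title are assumed properties.
public class BookEntity : TableEntity
{
    public string Author { get; set; }
    public string Title { get; set; }
}

public static class LimitedBookQueries
{
    // Take(10) is applied to the filtered query, so at most
    // 10 entities are requested from the Table Service.
    public static IQueryable<BookEntity> FirstTenHardbacks(CloudTable bookTable)
    {
        return (from book in bookTable.CreateQuery<BookEntity>()
                where book.PartitionKey == "hardback"
                select book).Take(10);
    }
}
```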

Note that this simple query cannot by itself be used to page through the data from the client side. Multiple invocations always return the same entities. Paging through the data requires the handling of the continuation tokens returned by the server to indicate that there is additional data satisfying the query.

To handle continuation tokens, the IQueryable must be converted to a TableQuery<TElement>. This can be done either through a direct cast or using the AsTableQuery() extension method. TableQuery<TElement> exposes an ExecuteSegmented() method which handles continuation tokens.

This method invokes a query in which the result set starts with the entity indicated by an (opaque) continuation token, which should be null for the initial invocation. It also takes optional TableRequestOptions (timeouts and retry policies) and OperationContext (log level) parameters. TableQuerySegment<TElement> is an IEnumerable that exposes two properties: ContinuationToken and Results. The former is the continuation token, if any, returned by query invocation while the latter is a List of the returned entities. A null value for the returned ContinuationToken indicates that all the data has been provided and that no more query invocations are needed.

The following example demonstrates the use of continuation tokens:
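The loop can be sketched as a small generic helper, assuming the v2.1 library. TablePaging and ReadAllPages are hypothetical names. The helper is written as an iterator, so no request is made until the caller starts enumerating:

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure.Storage.Table;
using Microsoft.WindowsAzure.Storage.Table.Queryable;

public static class TablePaging
{
    // Yields every entity matching the query, one page at a time.
    public static IEnumerable<T> ReadAllPages<T>(IQueryable<T> query)
        where T : ITableEntity, new()
    {
        TableQuery<T> tableQuery = query.AsTableQuery();
        TableContinuationToken token = null; // null requests the first page

        do
        {
            TableQuerySegment<T> segment = tableQuery.ExecuteSegmented(token);
            foreach (T entity in segment.Results)
            {
                yield return entity;
            }
            // A null token means the server has no more data for this query.
            token = segment.ContinuationToken;
        } while (token != null);
    }
}
```

A caller that wants real client-side paging would keep the token between requests instead of looping to completion.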

The TableQuery class also exposes various asynchronous methods. These include traditional APM methods of the form BeginExecuteSegmented()/EndExecuteSegmented() and modern Task-based (async/await) methods of the form ExecuteSegmentedAsync().
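The Task-based form might be used as in the following sketch, which assumes .NET 4.5 and the v2.1 library. AsyncTablePaging and ReadAllPagesAsync are hypothetical names:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.WindowsAzure.Storage.Table;
using Microsoft.WindowsAzure.Storage.Table.Queryable;

public static class AsyncTablePaging
{
    // Collects every page of results without blocking the calling thread.
    public static async Task<List<T>> ReadAllPagesAsync<T>(IQueryable<T> query)
        where T : ITableEntity, new()
    {
        TableQuery<T> tableQuery = query.AsTableQuery();
        var results = new List<T>();
        TableContinuationToken token = null; // null requests the first page

        do
        {
            TableQuerySegment<T> segment = await tableQuery.ExecuteSegmentedAsync(token);
            results.AddRange(segment.Results);
            token = segment.ContinuationToken;
        } while (token != null);

        return results;
    }
}
```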

Extension Methods

The TableQueryableExtensions class in the Microsoft.WindowsAzure.Storage.Table.Queryable namespace provides various IQueryable<TElement> extension methods:

AsTableQuery() casts a query to a TableQuery<TElement>. Resolve() supports client-side shaping of the entities returned by a query. WithContext() and WithOptions() allow an operation context and request options respectively to be associated with a query.
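As an illustration, WithOptions() and WithContext() can be chained onto a query before it is executed. This is a sketch only: the timeout, retry settings and helper name are arbitrary assumptions, not recommendations:

```csharp
using System;
using System.Linq;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.RetryPolicies;
using Microsoft.WindowsAzure.Storage.Table;
using Microsoft.WindowsAzure.Storage.Table.Queryable;

public static class QuerySettings
{
    // Attaches request options and an operation context to a query.
    // Nothing is sent to the Table Service until the query is enumerated.
    public static TableQuery<T> WithSettings<T>(IQueryable<T> query)
        where T : ITableEntity, new()
    {
        var options = new TableRequestOptions
        {
            ServerTimeout = TimeSpan.FromSeconds(10),
            RetryPolicy = new ExponentialRetry(TimeSpan.FromSeconds(1), 3)
        };
        var context = new OperationContext
        {
            // Makes the request easier to find in the storage logs.
            ClientRequestID = Guid.NewGuid().ToString()
        };

        return query.WithOptions(options).WithContext(context);
    }
}
```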

Entity Shaping

The Resolve() extension method associates an EntityResolver delegate that is invoked when the results are deserialized. The resolver can shape the output of the deserialization into some desired form. A simple example of this is to perform client-side modification such as creating a fullName property out of firstName and lastName properties.

The EntityResolver delegate is defined as follows:
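To the best of my knowledge the declaration in the v2.x library has the following shape. It is worth checking against the library documentation for the version in use:

```csharp
// Declared in the Microsoft.WindowsAzure.Storage.Table namespace.
public delegate T EntityResolver<T>(
    string partitionKey,
    string rowKey,
    DateTimeOffset timestamp,
    IDictionary<string, EntityProperty> properties,
    string etag);
```

The resolver receives the system properties as separate arguments and the remaining entity properties in the properties dictionary, and returns whatever type the caller wants the query to produce.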

The following example shows a query in which a resolver is used to format the returned data into a String composed of various properties in the returned entity:
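A sketch of such a query follows. The Title and Author property names, the output format and the class name are assumptions; the resolver is written as a named method so that it can be exercised without a storage account:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure.Storage.Table;
using Microsoft.WindowsAzure.Storage.Table.Queryable;

public static class BookFormatting
{
    // Resolver: builds a display string from the raw entity properties.
    // "Title" and "Author" are assumed property names.
    public static string Describe(string partitionKey, string rowKey,
        DateTimeOffset timestamp, IDictionary<string, EntityProperty> properties,
        string etag)
    {
        return string.Format("{0} by {1} ({2})",
            properties["Title"].StringValue,
            properties["Author"].StringValue,
            partitionKey);
    }

    // The query yields strings rather than entities once Resolve() is applied.
    public static IEnumerable<string> DescribeHardbacks(CloudTable bookTable)
    {
        return (from book in bookTable.CreateQuery<DynamicTableEntity>()
                where book.PartitionKey == "hardback"
                select book).Resolve(new EntityResolver<string>(Describe));
    }
}
```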

More sophisticated resolvers can be defined separately. For example, the following shows the returned entities being shaped into instances of a Writer class:
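One possible shape for this is sketched below. The Writer class, its FullName and Genre properties, the FirstName and LastName entity properties, and the use of the PartitionKey as a genre are all assumptions made for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure.Storage.Table;
using Microsoft.WindowsAzure.Storage.Table.Queryable;

// Hypothetical target type; it does not need to implement ITableEntity.
public class Writer
{
    public string FullName { get; set; }
    public string Genre { get; set; }
}

public static class WriterResolvers
{
    // Separately defined resolver combining two stored properties into one.
    public static Writer ToWriter(string partitionKey, string rowKey,
        DateTimeOffset timestamp, IDictionary<string, EntityProperty> properties,
        string etag)
    {
        return new Writer
        {
            FullName = properties["FirstName"].StringValue + " " +
                       properties["LastName"].StringValue,
            Genre = partitionKey
        };
    }

    public static IEnumerable<Writer> LoadWriters(CloudTable writerTable, string genre)
    {
        return (from entity in writerTable.CreateQuery<DynamicTableEntity>()
                where entity.PartitionKey == genre
                select entity).Resolve(new EntityResolver<Writer>(ToWriter));
    }
}
```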

Schema-Less Queries

The DynamicTableEntity class is used to invoke queries in a schema-free manner, since the retrieved properties are stored in a dictionary. For example, the following performs a filter using the DynamicTableEntity Properties collection and then puts the returned entities into a List of DynamicTableEntity objects:
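A sketch of such a query, assuming the entities carry a string property named Author (an assumption, as are the class and method names):

```csharp
using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure.Storage.Table;

public static class SchemaLessQueries
{
    // Builds the query; the filter refers to a property by name
    // through the Properties collection.
    public static IQueryable<DynamicTableEntity> HardbacksBy(CloudTable bookTable, string author)
    {
        return from entity in bookTable.CreateQuery<DynamicTableEntity>()
               where entity.PartitionKey == "hardback"
                  && entity.Properties["Author"].StringValue == author
               select entity;
    }

    // Executes the query and materializes the results.
    public static List<DynamicTableEntity> LoadHardbacksBy(CloudTable bookTable, string author)
    {
        return HardbacksBy(bookTable, author).ToList();
    }
}
```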

The individual properties of each entity are accessed through the Properties collection of the DynamicTableEntity.

Summary

The Windows Azure Storage Client v2.1 library supports the use of LINQ to query a table stored in the Windows Azure Table Service. This library exposes, in a performant .NET library, all the functionality provided by the underlying Windows Azure Storage REST API.


Silicon Valley Code Camp–Windows Azure IaaS Presentation

The Silicon Valley Code Camp, organized by Peter Kellner (@pkellner), takes place on the first weekend in October each year in the nice environment of the Foothill College campus. This year, 4,492 people registered to attend and there were 229 sessions.

As part of the Windows Azure track I did a talk providing an overview of Windows Azure Virtual Machines – the IaaS offering in the Windows Azure platform. The deck is available on Slideshare.


Windows Azure Developer Camps in Northern California

Satory Global is conducting two free one-day, instructor-led training events in the Windows Azure Developer Camps series put on by Microsoft. These events provide a great opportunity to get started with Windows Azure through a mixture of presentations and hands-on labs.

We will start with the basics and build up to more advanced topics, featuring instructor-led hands-on labs for:
–  Windows Azure Web Sites and Virtual Machines using ASP.NET & SQL Server
–  Deploying Cloud Services in Windows Azure
–  Exploring Windows Azure Storage for Visual Studio 2012

November 6, Sunnyvale, CA

Silicon Valley Moffett Towers (Map)
1020 Enterprise Way
Building B
Sunnyvale
California 94089
United States

More details and registration for the Sunnyvale event are here.

November 7, San Francisco, CA

Microsoft Office (Map)
835 Market Street
Suite 700
San Francisco
California 94103

More details and registration for the San Francisco event are here.


Introduction to Windows Azure Media Services

Windows Azure Media Services (WAMS) is a PaaS offering that makes it easy to ingest media assets, encode them and then perform on-demand streaming or downloads of the resulting videos.

The WAMS team has been actively proselytizing features as they become available. Mingfei Yan (@mingfeiy) has a number of good posts and she also provided the WAMS overview at Build 2013. Nick Drouin has a nice short post with a minimal demonstration of using the WAMS SDK to ingest, process and smooth stream a media asset. John Deutscher (@johndeu) has several WAMS posts on his blog including an introduction to the MPEG DASH preview on WAMS. Daniel Schneider and Anthony Park did a Build 2013 presentation on the MPEG DASH preview.

Windows Azure Media Services is a multi-tenant service with shared encoding and shared on-demand streaming egress capacity. The basic service queues encoding tasks to ensure fair distribution of compute capacity and imposes a monthly egress limit for streaming. Encoding is billed depending on the quantity of data processed, while streaming is billed at the standard Windows Azure egress rates. It is possible to purchase reserved units for encoding to avoid the queue – with each reserved unit being able to perform a single encoding task at a time (additional simultaneous encoding tasks would be queued). It is also possible to purchase reserved units for on-demand streaming – with each reserved unit providing an additional 200 Mbps of egress capacity. Furthermore, the Dynamic Packaging for MPEG-DASH preview is available only to customers who have purchased reserved units for on-demand streaming.

The entry point to the WAMS documentation is here. The Windows Azure Media Services REST API is the definitive way to access WAMS from an application. The Windows Azure Media Services SDK is a .NET library providing a more convenient way to access WAMS. As with most Windows Azure libraries, Microsoft has deployed the source to GitHub. The SDK can be added to a Visual Studio solution using NuGet.

The Windows Azure SDK for Java also provides support for WAMS development. The Developer tools for WAMS page provides links to these libraries as well as to developer support for creating on-demand streaming clients for various environments including Windows 8, Windows Phone, iOS and OSMF.

The Windows Azure Portal hosts a getting started with WAMS sample. The Windows Azure Management Portal provides several samples on the Quick Start page for a WAMS account.

Windows Azure Media Services Account

The Windows Azure Management Portal provides a UI for managing WAMS accounts, content (assets), jobs, on-demand streaming and media processors. A WAMS account is created in a specific Windows Azure datacenter. Each account has an account name and account key that the WAMS REST API (and .NET API) uses to authenticate requests. The account name also parameterizes the namespace for on-demand streaming (e.g., http://MyMediaServicesAccount.origin.mediaservices.windows.net).

Each WAMS account is associated with one or more Windows Azure Storage accounts, which are used to store the media assets controlled by the WAMS account. The association of a storage account allows the WAMS endpoint to be used as a proxy to generate Windows Azure Storage shared-access signatures that can be used to authenticate asset uploads and downloads to and from a client without the need to expose storage-account credentials to the client.

Workflow for Handling Media

The workflow for using WAMS is:

  1. Setup – create the context used to access WAMS endpoints.
  2. Ingestion – upload one or more media files to Windows Azure Blob storage where they are referred to as assets.
  3. Processing – perform any required process, such as encoding, to create output assets from the input assets.
  4. Delivery – generate the locators (URLs) for delivery of the output assets as either downloadable files or on-demand streaming assets.

Setup

WAMS exposes a REST endpoint that must be used by all operations accessing the WAMS account. These operations use a WAMS context that manages authenticated access to WAMS capabilities. The context is exposed as an instance of the CloudMediaContext class.

The simplest CloudMediaContext constructor takes an account name and account key. Newing up a CloudMediaContext causes the appropriate OAuth 2 handshake to be performed and the resulting authentication token to be stored in the CloudMediaContext instance. Behind the scenes, the initial connection is against a well-known endpoint (https://media.windows.net/), with the response containing the actual endpoint to use for this WAMS account. The CloudMediaContext constructor handles this, with initial authentication provided by the WAMS account name and account key and subsequent authentication provided by an OAuth 2 token.
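Creating the context can be sketched as follows. The environment variable names and the class name are hypothetical; in practice the credentials might equally come from a configuration file:

```csharp
using System;
using Microsoft.WindowsAzure.MediaServices.Client;

public static class MediaSetup
{
    // Creates a context for the WAMS account. The constructor contacts the
    // service to authenticate, so it requires network access.
    public static CloudMediaContext CreateContext()
    {
        // Hypothetical environment variable names.
        string accountName = Environment.GetEnvironmentVariable("WAMS_ACCOUNT_NAME");
        string accountKey = Environment.GetEnvironmentVariable("WAMS_ACCOUNT_KEY");

        return new CloudMediaContext(accountName, accountKey);
    }
}
```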

CloudMediaContext has a number of properties, many of which are IQueryable collections of information about the media services account and its current status including:

  • Assets – an asset is a content file managed by WAMS.
  • IngestManifests – an ingest manifest associates a list of files to be uploaded with a list of assets.
  • Jobs – a job comprises one or more tasks to be performed on an asset.
  • Locators – a locator associates an asset with an access policy and so provides the URL with which the asset can be accessed securely.
  • MediaProcessors – a media processor specifies the type of configurable task that can be performed on an asset.

These are “expensive” to populate since they require a request against the WAMS REST API so are populated only on request. For example, the following retrieves a list of jobs created in the last 10 days:
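A sketch of such a query, assuming an already created context. The class and method names are hypothetical. The cutoff is computed up front so that the filter compares the Created property against a constant:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.WindowsAzure.MediaServices.Client;

public static class JobQueries
{
    // Returns the jobs created in the last 10 days.
    public static List<IJob> RecentJobs(CloudMediaContext context)
    {
        DateTime cutoff = DateTime.UtcNow.AddDays(-10);

        // ToList() is what triggers the request against the REST API.
        return context.Jobs
            .Where(job => job.Created > cutoff)
            .ToList();
    }
}
```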

The filter is performed on the server, with the filter being passed in the query string to the appropriate REST operation. Documentation on the allowed query strings seems light.

Note that repopulating the collections requires a potentially expensive call against the WAMS REST endpoint. Consequently, the collections are not automatically refreshed. Accessing the current state of a collection – for example, to retrieve the result of a job – may require newing up a new context to access the collection.

Ingestion

WAMS tasks perform some operation that converts an input asset to an output asset. An asset comprises one or more files located in Windows Azure Blob storage along with information about the status of the asset. An instance of an asset is contained in a class implementing the IAsset interface which exposes properties like:

  • AssetFiles – the files managed by the asset.
  • Id – unique Id for the asset.
  • Locators – a locator associates an asset with an access policy and so provides the URL with which the asset can be accessed securely.
  • Name – friendly name of the asset.
  • State – current state of the asset (initialized, published, deleted).
  • StorageAccountName – name of the storage account in which the asset is located.

The ingestion step of the WAMS workflow does the following:

  • creates an asset on the WAMS server
  • associates files with the asset
  • uploads the files to the Windows Azure Blob storage

The asset maintains the association between the asset Id and the location of the asset files in Windows Azure Blob storage.

WAMS provides two file uploading techniques:

  • individual file upload
  • bulk file ingestion

Individual file upload requires the creation of an asset and then a file upload into the asset. The following is a basic example of uploading a file to WAMS:
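A minimal sketch, assuming an existing context and a version of the WAMS SDK for .NET that exposes the AssetFiles collection. Using the file name as the asset name is an arbitrary choice:

```csharp
using System.IO;
using Microsoft.WindowsAzure.MediaServices.Client;

public static class MediaIngestion
{
    // Creates an asset, adds a file entry to it and uploads the local file.
    public static IAsset UploadFile(CloudMediaContext context, string filePath)
    {
        string fileName = Path.GetFileName(filePath);

        // The asset is the logical container for the uploaded file.
        IAsset asset = context.Assets.Create(fileName, AssetCreationOptions.None);

        // The asset file entry associates the file with the asset.
        IAssetFile assetFile = asset.AssetFiles.Create(fileName);

        // Upload blocks until the file is in Blob storage.
        assetFile.Upload(filePath);

        return asset;
    }
}
```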

WAMS uses the asset as a logical container for uploaded files. In this example, WAMS creates a blob container with the same name as the asset.Id and then uploads the media file into it as a block blob. The asset provides the association between WAMS and the Windows Azure Storage Service.

This upload uses one of the WAMS methods provided to access the Storage Service. These methods provide additional functionality over that provided in the standard Windows Azure Storage library. For example, they provide the ability to track progress and completion of the upload.

When many files must be ingested an alternative technique is to create an ingestion manifest, using a class implementing the IIngestManifest interface, providing information about the files to be uploaded. The ingest manifest instance then exposes the upload URLs, with a shared access signature, which can be used to upload the files using the Windows Azure Storage API.

Note that the asset Id is in the form: nb:cid:UUID:ceb012ff-7c38-46d5-b58b-434543cd9032. The UUID identifies the blob container, named asset-<UUID>, which contains all the media files associated with the asset.

Processing

WAMS supports the following ways of processing a media asset:

  • Windows Azure Media Encoder
  • Windows Azure Media Packager
  • Windows Azure Media Encryptor
  • Storage Decryption

The Windows Azure Media Encoder takes an input media asset and performs the specified encoding on it to create an output media asset. The input media asset must have been uploaded previously. WAMS supports various file formats for audio and video, and supports many encoding techniques which are specified using one of the Windows Azure Media Encoder presets. For example, the VC1 Broadband 720P preset creates a single Windows Media file with 720P variable bit rate encoding while the VC1 Smooth Streaming preset produces a Smooth Streaming asset comprising a 1080P video with variable bit rate encoding at 8 bitrates from 6000 kbps to 400kbps. The format for the names of output media assets created by the Windows Azure Media Encoder is documented here.

The Windows Azure Media Packager provides an alternate method to create Smooth Streaming or Apple HTTP Live Streaming (HLS) assets. The latter cannot be created using the Windows Azure Media Encoder. Rather than use presets, the Windows Azure Media Packager is configured using an XML file.

The Windows Azure Media Encryptor is used to manage the encryption of media assets, which is used in the digital rights management (DRM) of output media assets. The Windows Azure Media Encryptor is configured using an XML file.

Storage Decryption is used to decrypt media assets.

Media assets are processed by the creation of a job comprising one or more tasks. Each task uses one of the WAMS processing techniques described above. For example, a simple job may comprise a single task that performs a VC1 Smooth Streaming encoding task to create the various output media files required for the smooth streaming of an asset.

For example, the following sample demonstrates the creation and submission of a job comprising a single encoding task.
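A sketch of such a job is shown below. The job, task and output asset names are arbitrary, and the method signatures are those I believe a 2013-era WAMS SDK for .NET exposes, so they should be checked against the installed version:

```csharp
using System;
using System.Linq;
using Microsoft.WindowsAzure.MediaServices.Client;

public static class MediaProcessing
{
    // Submits a job with a single VC1 Broadband 720p encoding task.
    public static IJob SubmitEncodingJob(CloudMediaContext context, string inputAssetId)
    {
        IJob job = context.Jobs.Create("Sample encoding job");

        // Pick the latest version of the Windows Azure Media Encoder.
        IMediaProcessor encoder = context.MediaProcessors
            .Where(p => p.Name == "Windows Azure Media Encoder")
            .ToList()
            .OrderBy(p => new Version(p.Version))
            .Last();

        // The third argument is the name of the encoder preset.
        ITask task = job.Tasks.AddNew(
            "Encoding task", encoder, "VC1 Broadband 720p", TaskOptions.None);

        // Use a previously ingested asset as the input.
        IAsset inputAsset = context.Assets
            .Where(a => a.Id == inputAssetId)
            .First();
        task.InputAssets.Add(inputAsset);

        // WAMS creates the output asset when the task runs.
        task.OutputAssets.AddNew("Encoded output", AssetCreationOptions.None);

        job.Submit();
        return job;
    }
}
```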

This sample creates a job with some name on the WAMS context. It then identifies an appropriate WAMS encoder and uses that to create a VC1 Broadband 720p encoding task which is added to the job. Then, it identifies an asset already attached to the context, perhaps the result of a prior ingestion, and adds it as an input to the task. Finally, it adds a new output asset to the task and submits the job.

When completed, the output asset files will be stored in the container identified by the asset Id for the output asset of the task. There are two files created in this sample:

  • SomeFileName_manifest.xml
  • SomeFileName_VC1_4500kbps_WMA_und_ch2_128kbps.wmv

The manifest XML file provides metadata – such as bit rates – for the audio and video tracks in the output file.

Delivery

WAMS supports both on-demand streaming and downloads of output media assets. The files associated with an asset are stored in Windows Azure Blob Storage and require appropriate authentication before they can be accessed. Since the processed files are typically intended for wide distribution some means must be provided whereby they can be accessed without the need to share the highly-privileged account key with the users.

WAMS provides different techniques for accessing the files depending on whether they are intended for download or on-demand streaming. For downloads, it uses the standard Windows Azure Storage shared-access signatures and provides API support for generating them. For streaming media it hosts an endpoint that proxies secured access to the files in the asset.

For both file downloads and on-demand streaming, WAMS uses an IAccessPolicy to specify the access permissions for a media resource. The IAccessPolicy is then associated with an ILocator for an asset to provide the path with which to access the media files.

The following sample shows how to generate the URL that can be used to download a media file:
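A sketch follows. The policy name and one-day lifetime are arbitrary, and CreateLocator(LocatorType.Sas, ...) is the form I believe the SDK exposes, so it should be checked against the installed version. Composing the URL is kept in a separate helper because it is plain string handling:

```csharp
using System;
using Microsoft.WindowsAzure.MediaServices.Client;

public static class MediaDownloads
{
    // Inserts the file name into the locator path, ahead of the
    // shared-access-signature query string.
    public static string AppendFileName(string locatorPath, string fileName)
    {
        var builder = new UriBuilder(locatorPath);
        builder.Path += "/" + fileName;
        return builder.Uri.AbsoluteUri;
    }

    public static string GetDownloadUrl(
        CloudMediaContext context, IAsset asset, string fileName)
    {
        // Read-only access for one day.
        IAccessPolicy policy = context.AccessPolicies.Create(
            "Download policy", TimeSpan.FromDays(1), AccessPermissions.Read);

        // A SAS locator points at the asset's blob container.
        ILocator locator = context.Locators.CreateLocator(
            LocatorType.Sas, asset, policy);

        return AppendFileName(locator.Path, fileName);
    }
}
```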

The resulting URL can be used to download the media file in code or using a browser. No further authentication is needed since the query string of the URL contains a shared-access signature. A download URL looks like the following:

https://MyMediaAccount.blob.core.windows.net/asset-2c8b788a-bd0e-4fa5-b9cf-ae0b759cf416/SomeVideo_H264_4500kbps_AAC_und_ch2_128kbps.mp4?sv=2012-02-12&se=2013-10-08T04%3A09%3A48Z&sr=c&si=b57f5fb2-7f8c-447f-9b48-756adca6ae6f&sig=av7gJl4K%2FjAMesbONdbj62adRDRPpqWTfVekJJlhw0%3D

The following sample shows how to generate the URL for on-demand streaming:
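A sketch, assuming the asset is a Smooth Streaming asset containing an .ism manifest file. The policy name and lifetime are arbitrary, and the locator call should be checked against the installed SDK version:

```csharp
using System;
using System.Linq;
using Microsoft.WindowsAzure.MediaServices.Client;

public static class MediaStreaming
{
    // Composes the manifest URL from the locator path and manifest file name.
    public static string BuildManifestUrl(string locatorPath, string manifestFileName)
    {
        return locatorPath.TrimEnd('/') + "/" + manifestFileName + "/manifest";
    }

    public static string GetStreamingUrl(CloudMediaContext context, IAsset asset)
    {
        // Read-only access for 30 days.
        IAccessPolicy policy = context.AccessPolicies.Create(
            "Streaming policy", TimeSpan.FromDays(30), AccessPermissions.Read);

        // An origin locator points at the on-demand streaming endpoint.
        ILocator locator = context.Locators.CreateLocator(
            LocatorType.OnDemandOrigin, asset, policy);

        // The .ism file is the server manifest of a Smooth Streaming asset.
        IAssetFile manifest = asset.AssetFiles
            .ToList()
            .First(f => f.Name.EndsWith(".ism", StringComparison.OrdinalIgnoreCase));

        return BuildManifestUrl(locator.Path, manifest.Name);
    }
}
```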

This generates a URL for an on-demand streaming manifest. The following is an example manifest URL:

http://MyMediaAccount.origin.mediaservices.windows.net/69544296-9633-007b-a008-3b78b5d5ef8f/SomeVideo.ism/manifest

This manifest file can be used in a media player capable of supporting smooth streaming. A demonstration on-demand streaming player can be accessed here.

Summary

The Windows Azure Media Services team has done a great job in creating a PaaS media ingestion, processing and content-provision service. It is easy to set up and use, and provides both Portal and API support.
