Queries in the Windows Azure Storage Client Library v2.1

Windows Azure Storage has been a core part of the Windows Azure Platform since the public preview in 2008. It supports three storage features: Blobs, Queues and Tables. The Blob Service provides high-scale file storage; prominent uses include storing media files for web sites and serving as the backing store for the VHDs used as disks attached to Windows Azure VMs. The Queue Service provides a basic, easy-to-use messaging system that simplifies disconnected communication between VMs in a Windows Azure cloud service. The Table Service is a fully-managed, cost-effective, high-scale, key-value NoSQL datastore.

The definitive way to access Windows Azure Storage is through the Windows Azure Storage REST API, and its documentation is the authoritative source for what can be done with Windows Azure Tables. All client libraries, regardless of language, use the REST API under the hood. The Storage team has provided a succession of .NET libraries that sit on top of the REST API. The original Storage Client library had a strong dependence on WCF Data Services, which affected some of the design decisions. In 2012, the Storage team released a completely rewritten v2.0 library which removed that dependence and was more performant.

The Storage Client v2.0 library provided a fluent library for query invocation against Windows Azure Tables. v2.1 of the library added a LINQ interface for query invocation. The LINQ interface is significantly easier to use than the fluent library while supporting equivalent functionality. Consequently, only the LINQ interface is considered in this post.

The MSDN documentation for the Windows Azure Storage Library is here. The Windows Azure Storage team has provided several posts documenting the Table Service API in the .NET Storage v2.x library (2.0 and 2.1). Gaurav Mantri has also posted on the Table Service API as part of an excellent series of posts on the Storage v2.0 library. I did a post in 2010 that described the query experience for Windows Azure Tables in the Storage Client v1.x library.

Overview of Windows Azure Tables

Windows Azure Tables is a key-value, NoSQL datastore in which entities are stored in tables. It is a schema-less datastore, so each entity in a table can have a different set of properties. The primary key, and only index, for a table is the combination of the PartitionKey and RowKey that must exist in each entity. The PartitionKey specifies the partition (or shard) for an entity while the RowKey provides uniqueness within a partition. Different partitions may be stored on different physical nodes, with the Table Service managing this allocation.
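As a rough mental model of this key structure, a table can be pictured as a dictionary keyed on the (PartitionKey, RowKey) pair. The sketch below is plain Python, not the Storage Client library; the helper names and the sample entities are made up for illustration.

```python
# Toy model of a table: the only index is the (PartitionKey, RowKey) pair.
table = {}

def insert(entity):
    """Store an entity under its composite key; the pair must be unique."""
    key = (entity["PartitionKey"], entity["RowKey"])
    if key in table:
        raise KeyError(f"entity {key} already exists")
    table[key] = entity

# Entities in the same table need not share a schema (sample data is hypothetical).
insert({"PartitionKey": "hardback", "RowKey": "0002", "Title": "Dune", "Pages": 412})
insert({"PartitionKey": "hardback", "RowKey": "0001", "Title": "Emma"})
insert({"PartitionKey": "ebook", "RowKey": "0001", "Title": "Ulysses", "Format": "epub"})

def point_lookup(partition_key, row_key):
    """Exact-key lookup, which uses the index directly."""
    return table.get((partition_key, row_key))

def scan_in_key_order():
    """Return every entity ordered by PartitionKey, then RowKey."""
    return [table[key] for key in sorted(table)]
```

The RowKey value 0001 appears twice here without conflict because uniqueness is only required within a partition.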

The REST API provides limited query capability. It supports filtering as well as the specification of the properties to be returned. A query can be filtered on combinations of any of the properties in the entity. The right-hand side of each filter comparison must be a constant; it is not possible to compare the values of two different properties. It is also possible to specify a limit on the number of entities to be returned by a query. The general rules for queries are documented here with specific rules for filters provided here.
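At the REST level these three capabilities correspond to the OData query options $filter, $select and $top. The following Python sketch only assembles such a query string; the helper name and the Title and Pages properties are hypothetical, and no request is sent.

```python
from urllib.parse import quote

def build_query_options(filter_expr=None, select=None, top=None):
    """Assemble the query-string part of a Table Service query."""
    parts = []
    if filter_expr:
        # Each comparison has a property on the left and a constant on the right.
        parts.append("$filter=" + quote(filter_expr, safe="'()"))
    if select:
        parts.append("$select=" + ",".join(select))
    if top is not None:
        parts.append("$top=" + str(top))
    return "&".join(parts)

options = build_query_options(
    filter_expr="PartitionKey eq 'hardback' and Pages gt 300",
    select=["Title", "Pages"],
    top=10,
)
print(options)
```

A filter such as Pages gt Price has no valid form here, because both sides would be properties.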

The Table Service uses server-side paging to limit the amount of data that may be returned in a single query. Server-side paging is indicated by the presence of a continuation token in the response to the query. This continuation token can be provided in a subsequent query to indicate where the next “page” of data should start. A single query has a hard limit of 1,000 entities in a single page and can execute for no more than 5 seconds. A page is also returned as soon as all the matching entities hosted on a single physical node have been returned. An odd consequence of this is that it is possible to get back zero entities along with a continuation token. This happens when the Table Service queries a physical node where no matching entities are currently stored. The only query that is guaranteed never to return a continuation token is one that filters on both PartitionKey and RowKey. Note that any query which does not filter on either PartitionKey or RowKey will result in a table scan.
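The paging rules above can be simulated in a few lines of Python. This is a toy model under stated assumptions: the node layout, the tuple-shaped token and the page limit of 3 (standing in for the real limit of 1,000) are all invented for illustration, and a real client must treat the token as opaque.

```python
PAGE_LIMIT = 3  # stands in for the real 1,000-entity limit

# Hypothetical layout: each inner list is the data held by one physical node.
NODES = [
    [{"PartitionKey": "a", "RowKey": "1"}, {"PartitionKey": "a", "RowKey": "2"}],
    [],  # a node that currently holds no entities for this table
    [{"PartitionKey": "b", "RowKey": str(i)} for i in range(1, 5)],
]

def query_page(token=None):
    """Return (entities, continuation_token) for one page of results."""
    node, offset = token or (0, 0)
    rows = NODES[node][offset:offset + PAGE_LIMIT]
    end = offset + len(rows)
    if end < len(NODES[node]):
        next_token = (node, end)       # page limit hit inside the node
    elif node + 1 < len(NODES):
        next_token = (node + 1, 0)     # node boundary reached
    else:
        next_token = None              # no more data anywhere
    return rows, next_token

def query_all_pages():
    """Follow continuation tokens until the service stops returning one."""
    pages, token = [], None
    while True:
        entities, token = query_page(token)
        pages.append(entities)
        if token is None:
            return pages

page_sizes = [len(page) for page in query_all_pages()]
```

In this model the second page is empty yet still carries a token, which is why a client loop should stop on a missing token and not on an empty page.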

The Table Service returns queried data as an Atom feed. This is a heavyweight XML protocol that inflates the size of a query response. The Storage team announced at Build 2013 that it would support the use of JSON in the query response which should reduce the size of a query response. To reduce the amount of data returned by a query, the Table Service supports the ability to shape the returned data through the specification of which properties should be returned for an entity.

The various client libraries provide a native interface to the Table Service that hides much of the complexity of filtering and continuation tokens. For example, the .NET library provides both fluent and LINQ APIs allowing a familiar interaction with Windows Azure Tables.

ITableEntity

The ITableEntity interface is implemented by all classes used to represent entities in the Storage Client library v2.x. ITableEntity defines the following properties:

  • ETag – entity tag used for optimistic concurrency
  • PartitionKey – partition key for the entity
  • RowKey – row key for the entity
  • Timestamp – timestamp for last update

The library contains two classes implementing the ITableEntity interface:

  • TableEntity – the base class for user-defined classes used to represent entities in the Storage Client library. These derived classes expose properties representing the properties of the entity.
  • DynamicTableEntity – a sealed class which stores the entity properties inside an IDictionary<String,EntityProperty> property named Properties.

The use of strongly-typed classes derived from TableEntity is useful when entities are all of the same type. However, DynamicTableEntity is helpful when handling tables which take full advantage of the schema-less nature of Windows Azure Tables and have entities with different schemas in a single table.

Basic LINQ Query

LINQ is a popular method for specifying queries since it provides a natural syntax that makes explicit the nature of the query and the shape of returned entities. The Storage Client library supports LINQ and uses it to expose various query features of the underlying REST interface to .NET.

A LINQ query is created using the CreateQuery<TEntity>() method of the CloudTable class. TEntity is a class implementing ITableEntity. In the query, the where keyword specifies the filters and the select keyword specifies the properties to be returned.

With BookEntity being an entity class derived from TableEntity, the following is a simple example using the Storage Client library of a LINQ query against a table named book:

The definition of the query creates an IQueryable<BookEntity> but does not itself invoke an operation against the Table service. The operation actually occurs when the foreach statement is invoked. This example queries the book table and returns all entities where the PartitionKey property takes the value hardback. A table is indexed on PartitionKey and RowKey and data is returned from a query ordered by PartitionKey/RowKey. In this example the data is ordered by RowKey since the PartitionKey is fixed in the query.
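Deferred execution of this kind can be modelled with a Python generator, whose body does not run until iteration begins. This is an analogy in plain Python, not the Storage Client API; the requests_sent list and the sample rows are hypothetical.

```python
requests_sent = []  # records each simulated call to the service

# Hypothetical rows, deliberately stored out of key order.
ROWS = [
    {"PartitionKey": "hardback", "RowKey": "0002"},
    {"PartitionKey": "paperback", "RowKey": "0001"},
    {"PartitionKey": "hardback", "RowKey": "0001"},
]

def create_query(partition_key):
    """Generator function: the body does not run until iteration starts."""
    requests_sent.append(partition_key)  # the simulated request happens here
    ordered = sorted(ROWS, key=lambda e: (e["PartitionKey"], e["RowKey"]))
    for entity in ordered:
        if entity["PartitionKey"] == partition_key:
            yield entity

query = create_query("hardback")         # defines the query, sends nothing
sent_before = list(requests_sent)        # still empty at this point
row_keys = [e["RowKey"] for e in query]  # iterating triggers the request
```

Because the partition is fixed, the results come back ordered by RowKey alone, matching the ordering described above.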

The Storage Client library handles server-side paging automatically when the query is invoked. Consequently, if there are many entities satisfying the query a significant amount of data is returned.

Basic queries can be performed using IQueryable, as above. More sophisticated queries – with client-side paging and asynchronous invocation, for example – are handled by converting the IQueryable into a TableQuery.

Server Side Paging

The Storage library supports server-side paging using the Take() method. This allows the specification of the maximum number of entities that should be returned by query invocation. The limit is applied server-side, so it can significantly reduce the amount of data returned and consequently the time taken to return the data.

For example, the above query can be modified to return a maximum of 10 entities:

Note that this simple query cannot by itself be used to page through the data from the client side. Multiple invocations always return the same entities. Paging through the data requires the handling of the continuation tokens returned by the server to indicate that there is additional data satisfying the query.

To handle continuation tokens, the IQueryable must be cast into a TableQuery<TElement>. This can be done either through direct cast or using the AsTableQuery() extension method. TableQuery<TElement> exposes an ExecuteSegmented() method which handles continuation tokens:

This method invokes a query in which the result set starts with the entity indicated by an (opaque) continuation token, which should be null for the initial invocation. It also takes optional TableRequestOptions (timeouts and retry policies) and OperationContext (log level) parameters. TableQuerySegment<TElement> is an IEnumerable that exposes two properties: ContinuationToken and Results. The former is the continuation token, if any, returned by query invocation while the latter is a List of the returned entities. A null value for the returned ContinuationToken indicates that all the data has been provided and that no more query invocations are needed.
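The calling pattern this implies is a loop that feeds each returned token into the next invocation until the token comes back null. The Python sketch below models that loop; Segment, execute_segmented and the 25 sample row keys are hypothetical stand-ins, not the library's types, and the integer token is only opaque by convention.

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Segment:
    """Stand-in for a query segment: one page plus the token for the next."""
    results: List[str]
    continuation_token: Optional[int]

ROWS = [f"{i:04d}" for i in range(25)]  # 25 hypothetical row keys

def execute_segmented(take, token=None):
    """Return at most `take` rows starting where the token points."""
    start = token or 0
    results = ROWS[start:start + take]
    end = start + len(results)
    return Segment(results, end if end < len(ROWS) else None)

def pages(take):
    """Client-side paging: pass each token back until none is returned."""
    token = None
    while True:
        segment = execute_segmented(take, token)
        yield segment.results
        token = segment.continuation_token
        if token is None:
            break

sizes = [len(page) for page in pages(10)]
```

Calling execute_segmented(10) repeatedly without a token returns the same first ten rows each time, which is the behaviour noted earlier for a query that only uses Take().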

The following example demonstrates the use of continuation tokens:

The TableQuery class also exposes various asynchronous methods. These include traditional APM methods of the form BeginExecuteSegmented()/EndExecuteSegmented() and modern Task-based (async/await) methods of the form ExecuteSegmentedAsync().

Extension Methods

The TableQueryableExtensions class in the Microsoft.WindowsAzure.Storage.Table.Queryable namespace provides various IQueryable<TElement> extension methods:

  • AsTableQuery() – casts a query to a TableQuery<TElement>
  • Resolve() – supports client-side shaping of the entities returned by a query
  • WithContext() – associates an operation context with a query
  • WithOptions() – associates request options with a query

Entity Shaping

The Resolve() extension method associates an EntityResolver delegate with the query; the delegate is invoked as each returned entity is deserialized. The resolver can shape the output of the deserialization into some desired form. A simple example of this is to perform client-side modification such as creating a fullName property out of firstName and lastName properties.
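The idea can be sketched in Python as a callback applied to each raw entity. The five-argument shape used here (partition key, row key, timestamp, property bag, ETag) reflects my understanding of the resolver delegate and should be treated as an assumption; the function names and sample data are hypothetical.

```python
from datetime import datetime, timezone

def full_name_resolver(partition_key, row_key, timestamp, properties, etag):
    """Shape one raw entity into a single display string."""
    return f"{properties['firstName']} {properties['lastName']}"

def execute_with_resolver(raw_entities, resolver):
    """Apply the resolver to every entity as it is read from the response."""
    return [
        resolver(e["PartitionKey"], e["RowKey"], e["Timestamp"],
                 e["Properties"], e["ETag"])
        for e in raw_entities
    ]

# Hypothetical response data.
RAW = [
    {"PartitionKey": "writer", "RowKey": "0001", "ETag": "etag-1",
     "Timestamp": datetime(2013, 11, 1, tzinfo=timezone.utc),
     "Properties": {"firstName": "Jane", "lastName": "Austen"}},
    {"PartitionKey": "writer", "RowKey": "0002", "ETag": "etag-2",
     "Timestamp": datetime(2013, 11, 2, tzinfo=timezone.utc),
     "Properties": {"firstName": "Frank", "lastName": "Herbert"}},
]

names = execute_with_resolver(RAW, full_name_resolver)
```

The caller receives the shaped values directly and never sees the intermediate entity objects.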

The EntityResolver delegate is defined as follows:

The following example shows a query in which a resolver is used to format the returned data into a String composed of various properties in the returned entity:

More sophisticated resolvers can be defined separately. The following example shows the returned entities being shaped into instances of a Writer class:

Schema-Less Queries

The DynamicTableEntity class is used to invoke queries in a schema-free manner, since the retrieved properties are stored in a dictionary. The following example performs a filter using the DynamicTableEntity Properties collection and then puts the returned entities into a List of DynamicTableEntity objects:

The individual properties of each entity are accessed through the Properties collection of the DynamicTableEntity.
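A plain-Python analogy of this pattern keeps each entity's properties in a dictionary and filters on whichever property happens to be present. The helper name and the sample entities below are hypothetical and do not use the Storage Client types.

```python
# Hypothetical entities with differing schemas in a single table.
ENTITIES = [
    {"PartitionKey": "hardback", "RowKey": "0001",
     "Properties": {"Title": "Emma", "Author": "Austen"}},
    {"PartitionKey": "hardback", "RowKey": "0002",
     "Properties": {"Title": "Dune", "Author": "Herbert", "Pages": 412}},
    {"PartitionKey": "hardback", "RowKey": "0003",
     "Properties": {"Name": "Shelf label"}},  # no Author property at all
]

def where_property_equals(entities, name, value):
    """Keep entities whose property bag has `name` equal to `value`."""
    return [e for e in entities if e["Properties"].get(name) == value]

matches = where_property_equals(ENTITIES, "Author", "Herbert")
for entity in matches:
    # Optional properties are read defensively, since not every entity has them.
    pages = entity["Properties"].get("Pages", "unknown")
    print(entity["RowKey"], entity["Properties"]["Title"], pages)
```

Entities that lack the filtered property are simply excluded from the result.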

Summary

The Windows Azure Storage Client v2.1 library supports the use of LINQ to query a table stored in the Windows Azure Table Service. This library exposes, in a performant .NET library, all the functionality provided by the underlying Windows Azure Storage REST API.

About Neil Mackenzie

Azure Architect at Satory Global.

8 Responses to Queries in the Windows Azure Storage Client Library v2.1

  3. th0maswe1ss says:

    Typo in section “Basic LINQ Query”: CreateTable() should be CreateQuery()

  4. @Thomas

    Thanks. Fixed it.

  5. abergs says:

    If I use LINQ querying, I shouldn’t have to worry about continuation tokens, right?

  6. The LINQ query handles continuation tokens unless you use one of the Segmented variants where you have control, as shown in the text. This control can be useful if the query returns a lot of data.

  7. If you have a query that returns thousands of rows you probably want to handle the continuation tokens to avoid waiting for the automated return of all the rows.
