Queries in Azure Tables

UPDATE: This post is specific to the Windows Azure Storage Client v1.x library. A follow-up post on the Windows Azure Storage Client v2.1 library is here.

This is a follow-up to a post on Azure Tables, providing additional information on queries against the Azure Table Service.

CreateQuery<T>()

There are several classes involved in querying Azure Tables using the Azure Storage Client library. However, there is a single method central to the querying process and that is CreateQuery<T>() in the DataServiceContext class. CreateQuery<T>() is declared:

public DataServiceQuery<T> CreateQuery<T>(String entitySetName);

This method is used implicitly or explicitly in every query against Azure Tables using the Storage Client library. The CreateQuery<T>() return type is DataServiceQuery which implements both the IQueryable<T> and IEnumerable<T> interfaces. IQueryable<T> provides functionality to query data sources while IEnumerable<T> provides functionality to enumerate the results of these queries.

LINQ supports the decoration of a query by operators filtering the results of the query. Although a full LINQ implementation has many decoration operators only the following are implemented for the Storage Client library:

These are implemented as extension methods on the DataServiceQuery<T> class. When a query is executed these decoration operators are translated into the $filter and $top operators used in the Azure Storage Services REST API query string submitted to the Azure Storage Service. The remaining LINQ query decoration operators are not implemented because the Azure Storage Services REST API does not provide an implementation for them.

The following example demonstrates a trivial use of CreateQuery<T>() and the Take() operator to retrieve ten records from the Songs table in Development Storage:

protected void SimpleQuery(CloudTableClient cloudTableClient)
{
    TableServiceContext tableServiceContext = cloudTableClient.GetDataServiceContext();
    tableServiceContext.ResolveType = ResolveEntityType;
    IQueryable<Song> songs = (from entity in tableServiceContext.CreateQuery<Song>(“Songs”) select entity).Take(10);
    List<Song> songsList = songs.ToList<Song>();
}

As with other LINQ implementations the query is not submitted to the Storage Service until the query results are enumerated by ToList<Song>().

A close reading of the example shows that the query retrieves 10 entities of type Song from a table named Songs. The performance thread on the MSDN Azure forum indicates there is a small performance degradation when the name of the CLR class for the entity, and the name of the table in Windows Azure Table are different as is the case here. The solution is to use the DataServiceContext.ResolveType delegate property to override the default type resolution used by the library. In the sample code above ResolveType is set to a delegate, ResolveEntityType, defined as follows:

public Type ResolveEntityType(String name)
{
    return typeof(Song);
}

A simple delegate such as this can profitably be replaced with a lambda expression as follows:

tableServiceContext.ResolveType = (unused) => typeof(Song);

The lambda expression inlines the ResolveEntityType delegate through more elegant if less obvious code.

The MSDN Azure documentation has a page showing various example of LINQ queries demonstrating filtering on properties with the various datatypes – String, numbers, Boolean and DateTime – so they will not be repeated here. Instead, this post is focused on the various methods provided to invoke queries.

Contexts

The SimpleQuery example used the TableServiceContext.CreateQuery() method as follows:

tableServiceContext.CreateQuery<Song>(“Songs”)

It is possible to simplify this syntax by deriving a class from TableServiceContext similar to the following:

public class SongContext : TableServiceContext
{
    internal static String TableName = “Songs”;

    public SongContext(String baseAddress, StorageCredentials credentials)
        : base(baseAddress, credentials)
    { }

    public IQueryable<Song> Songs
    {
        get
        {
            return this.CreateQuery<Song>(TableName);
        }
    }

    public void AddSong(Song song)
    {
        this.AddObject(TableName, song);
        this.SaveChanges();
    }
}

This class is specific to the Song model class representing the entities in the Azure table named Songs. The Songs property can be used as the core of any LINQ query instead of the tableServiceContext.CreateQuery<Song>(“Songs”) used previously. Doing this simplifies and improves the readability of the LINQ query. For example, the LINQ query:

from entity in tableServiceContext.CreateQuery<Song>(“Songs”).Take(10) select entity

can be rewritten as:

from entity in songContext.Songs.Take(10) select entity

where songContext is a SongContext object.

Querying on PartitionKey and RowKey

The primary key for an entity in an Azure table comprises PartitionKey and RowKey. The use of the REST API as the definitive way to access Azure leads to an anomalous situation in queries using PartitionKey and RowKey. A LINQ query filtering on various properties is converted into a RESTful request to the table with a filter query string specifying the filter the Azure Table Service should apply to the query against the table. If no records match the filter the query returns an empty result set. However, a LINQ query filtering on only the PartitionKey and RowKey is converted into a RESTful request to the entity identified by the PartitionKey and RowKey. If there is no such entity the Azure Table Service responds with a 404 Not Found which is converted into a DataServiceQueryException. This anomalous behavior applies only to queries specifying both PartitionKey and RowKey and no other properties. Queries specifying only one of PartitionKey or RowKey are treated as normal queries against the table and not a specific entity.

The most performant query in the Azure Table Service is one specifying both PartitionKey and RowKey. A query specifying PartitionKey and not RowKey must scan every entity in the partition while a query specifying RowKey and not PartitionKey must query each partition separately.

Continuation

A query specifying PartitionKey and RowKey is the only query guaranteed to return its entire result set in a single response. A further limit on query results is that no more than 1000 results are ever returned in response to a single request – regardless of how many entities satisfy the query filter. The Azure Table Service inserts a continuation token in the response header to indicate that there are additional results which can be retrieved through an additional request parameterized by the continuation token. The Storage Client library mostly hides the use of continuation tokens from the developer.

DataServiceQuery

DataServiceQuery is the ADO.Net Data Services class representing a query to the Azure Table Service. DataServiceQuery provides the following methods to send queries to the Azure Table Service.

public IAsyncResult BeginExecute(AsyncCallback callback, Object state);
public IEnumerable<TElement> EndExecute(IAsyncResult asyncResult);
public IEnumerable<TElement> Execute();

Execute() is a synchronous method which sends the query to the Azure Table Service and blocks until the query returns.  BeginExecute() and EndExecute() are a matched pair of methods used to implement the AsyncCallback Delegate model for asynchronously accessing the Azure Storage Service.

The following is an example of Execute():

protected void UsingDataServiceQueryExecute(CloudTableClient cloudTableClient)
{
    TableServiceContext tableServiceContext = cloudTableClient.GetDataServiceContext();
    tableServiceContext.ResolveType = (unused) => typeof(Song);
    DataServiceQuery<Song> dataServiceQuery =
      (from entity in tableServiceContext.CreateQuery<Song>(“Songs”)
       select entity).Take(10) as DataServiceQuery<Song>;
    IEnumerable<Song> songs = dataServiceQuery.Execute();
    foreach (Song song in songs)
    {
        String singer= song.Singer;
    }
}

Note that the query must be explicitly cast from an IQueryable<Song> to a DataServiceQuery<Song>.

The asynchronous model is implemented by invoking BeginExecute() passing it the name of a static callback delegate and, optionally, an object providing some invocation context to the callback delegate. In practice, this object must include the DataServiceQuery object on which BeginExecute() was invoked. BeginExecute() initiates query submission and sets up an IO Completion Port to wait for the query to complete. When it does the callback delegate is invoked on a worker thread. EndExecute() must be invoked in the callback delegate to access the results. Furthermore, a failure to invoke EndExecute() could lead to resource leakage. EndExecute() returns an object of type QueryOperationResponse<T> which implements an IEnumerable<T> interface. QueryOperationResponse<T> exposes information about the query request and response including the HTTP status of the response.

The following example, using a model-specific context as described earlier, invokes BeginExecute() with a callback delegate named BeginExecuteIsDone and passes the dataServiceQuery as the invocation context:

protected void QuerySongsExecuteAsync()
{
    SongContext songsContext = new SongContext(
        AzureStorageConstants.TableEndPoint.ToString(),
        new StorageCredentialsAccountAndKey(AzureStorageConstants.Account, AzureStorageConstants.Key));
    songsContext.ResolveType = (unused) => typeof(Song);

    DataServiceQuery<Song> dataServiceQuery =
       (from entity in songsContext.Songs
        select entity).Take(10) as DataServiceQuery<Song>;

    IAsyncResult iAsyncResult = dataServiceQuery.BeginExecute(BeginExecuteIsDone, dataServiceQuery);
}

static void BeginExecuteIsDone(IAsyncResult result)
{
    DataServiceQuery<Song> dataServiceQuery = result.AsyncState as DataServiceQuery<Song>;
    IEnumerable<Song> songs = dataServiceQuery.EndExecute(result);
    foreach (Song song in songs)
    {
        String singer = song.Singer;
    }
}

AzureStorageConstants is a trivial utility class exposing the Azure Storage account and key. Note that exception handling is even more important in these callback delegates than it is in normal code because they are not invoked from user code and errors can not be caught outside the function. Consequently, all errors must be caught and handled inside the callback delegate.

The callback delegate can be replaced with a lambda expression as was done with ResolveType. The above two methods can be replaced with the following method:

protected void QuerySongsExecuteAsync()
{
    SongContext songsContext = new SongContext(
        AzureStorageConstants.TableEndPoint.ToString(),
        new StorageCredentialsAccountAndKey(AzureStorageConstants.Account, AzureStorageConstants.Key));
    songsContext.ResolveType = s => typeof(Song);

    DataServiceQuery<Song> dataServiceQuery = (from entity in songsContext.Songs select entity).Take(10) as DataServiceQuery<Song>;

    IAsyncResult iAsyncResult = dataServiceQuery.BeginExecute((result) =>
    {
        QueryOperationResponse<Song> queryOperationResponse =
            dataServiceQuery.EndExecute(result) as QueryOperationResponse<Song>;
        List<Song> listSongs = queryOperationResponse.ToList<Song>();
    }
    , null);
}

In the parameter list to BeginExecute() the name of the callback delegate has been replaced by an inlined lambda expression. One of the benefits of using a lambda expression is that the outer variables, such as dataServiceQuery, are visible inside the lambda expression and do not need to be passed in – hence the null value for the second parameter to BeginExecute(). Note that the return value from invoking EndExecute() has been cast to a QueryOperationResponse<T> merely for demonstration.

UPDATE 2/10/2009

Note that version of ADO.Net Data Services currently used in Azure does not support server-side paging so that a DataServiceQuery is not able to process the continuation tokens required to retrieve more than 1,000 entities. The next version does but is not yet released in the Azure environment. Consequently, you cannot rely on DataServiceQuery.Execute() to retrieve all the entities requested if there are more than 1,000 of them – or, indeed, if there is a need for continuation tokens which can happen on any query not including PartitionKey and RowKey.

CloudTableQuery

The DataServiceQuery class does not provide any methods to support continuation of queries when the Azure Table Service returns continuation tokens indicating that there are additional query results awaiting retrieval. The CloudTableQuery<T> class provides that support. A CloudTableQuery<T> object is created using one of the two constructors:

public CloudTableQuery<TElement>(DataServiceQuery<TElement> query, RetryPolicy policy);
public CloudTableQuery<TElement>(DataServiceQuery<TElement> query);

or the (arguably misnamed) AsTableServiceQuery() extension method of the TableServiceExtensionMethods class:

public static CloudTableQuery<TElement> AsTableServiceQuery<TElement> ( IQueryable<TElement> query )

The CloudTableQuery<T> class has the following synchronous methods to handle query submission to the Azure Table Service:

public IEnumerable<TElement> Execute(ResultContinuation continuationToken);
public IEnumerable<TElement> Execute();

Execute() handles continuation automatically and will continue to submit queries to the Azure Table Service until all the results have been returned. Execute(ResultContinuation) starts the request with a previously acquired ResultContinuation object encapsulating a continuation token and continues the query until all results have been retrieved. Note that care should be taken when using either form of Execute() since large amounts of data might be returned when the query is enumerated.

The following example shows Execute() being used to retrieve all the records from a table through the automatic handling of continuation tokens:

protected void UsingCloudTableQueryExecute(CloudTableClient cloudTableClient)
{
    TableServiceContext tableServiceContext = cloudTableClient.GetDataServiceContext();
    tableServiceContext.ResolveType = (unused) => typeof(Song);
    CloudTableQuery<Song> cloudTableQuery =
        (from entity in tableServiceContext.CreateQuery<Song>(“Songs”)
         select entity).AsTableServiceQuery<Song>();
    IEnumerable<Song> songs = cloudTableQuery.Execute();
    foreach (Song song in songs)
    {
        String singer = song.Singer;
    }
}

The CloudTableQuery<T> class has an equivalent set of asynchronous methods declared:

public IAsyncResult BeginExecuteSegmented(ResultContinuation continuationToken, AsyncCallback callback, Object state);
public IAsyncResult BeginExecuteSegmented(AsyncCallback callback, Object state);
public ResultSegment<TElement> EndExecuteSegmented(IAsyncResult asyncResult);

These follow the method-naming style used elsewhere in the Storage Client library whereby the suffix Segmented indicates that the methods bring data back in batches – in this case from one continuation token to the next. This provides a convenient method of paging through results in batches of size specified by the Take() query decoration operator or the 1000 records that is the maximum number of records retrievable in a single request. As with the synchronous Execute() methods the difference between the two BeginExecuteSegmented() methods is that one starts the retrieval at the beginning of the query result set while the other starts at the entity indicated by the continuation token in the ResultContinuation parameter.

The following is an example of BeginExecuteSegmented() and EndExecuteSegmented() to page through the result set of a query in pages of 10 entities at a time:

protected void QuerySongsExecuteSegmentedAsync(CloudTableClient cloudTableClient)
{
    TableServiceContext tableServiceContext = cloudTableClient.GetDataServiceContext();
    tableServiceContext.ResolveType = (unused) => typeof(Song);

    CloudTableQuery<Song> cloudTableQuery =
       (from entity in tableServiceContext.CreateQuery<Song>(“Songs”).Take(10)
        select entity ).AsTableServiceQuery<Song>();
    IAsyncResult iAsyncResult =
        cloudTableQuery.BeginExecuteSegmented(BeginExecuteSegmentedIsDone, cloudTableQuery);
}

static void BeginExecuteSegmentedIsDone(IAsyncResult result)
{
    CloudTableQuery<Song> cloudTableQuery = result.AsyncState as CloudTableQuery<Song>;
    ResultSegment<Song> resultSegment = cloudTableQuery.EndExecuteSegmented(result);

    List<Song> listSongs = resultSegment.Results.ToList<Song>();

    if (resultSegment.HasMoreResults)
    {
        IAsyncResult iAsyncResult =
           cloudTableQuery.BeginExecuteSegmented(
              resultSegment.ContinuationToken, BeginExecuteSegmentedIsDone, cloudTableQuery);
    }
}

It is also possible to iterate through subsequent results using the GetNext() method of the ResultSegment<T> class rather than using BeginExecuteSegmented() with a ResultContinuation parameter. Although I’m not sure why one would do so.

UPDATE 10/5/2010: Jim O’Neil has an excellent post on using Asynchronous Table Storage Pagination with IIS that uses GetNext() for synchronous retrieval of result sets following an initial asynchronous call./UPDATE

It is worth noting the difference made by replacing the cloudTableQuery in the above example with:

CloudTableQuery<Song> cloudTableQuery =
    (from entity in tableServiceContext.CreateQuery<Song>(“Songs”)
     select entity).Take(10).AsTableServiceQuery<Song>();

Here, the Take(10) is outside the LINQ query definition. This query results in the retrieval of only 10 records and does not page through the table in pages of 10 entities as in the example.

Parallel Queries

A significant benefit of using asynchronous queries is that several queries can be done at once. These could be against different tables or the same table. In the latter case it is possible that they could contend with each other and be throttled – although this is possibly less likely if the queries are against different partitions. This is discussed in an interesting thread on the MSDN Azure forum.

About Neil Mackenzie

Azure Architect at Satory Global.
This entry was posted in Storage Service, Windows Azure. Bookmark the permalink.

2 Responses to Queries in Azure Tables

  1. Ramesh says:

    Thanks for the detail post. I came through your site when I was searching for getting the row count of a table and used cloudTableQuery.Execute() to solve my issue.

  2. Pingback: Exploring Windows Azure Storage APIs By Building a Storage Explorer Application - Paolo Salvatori's Blog - Site Home - MSDN Blogs

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s