Queries in the Live Framework API

The Live Framework API uses the LINQ query model to support querying the data stored in the local and cloud operating environments. This post focuses on data stored in the cloud and ways to query it using the API. As with the rest of the Live Framework, the API actually retrieves data by making REST calls against the cloud, as documented here. In particular, queries are implemented as GET requests with appropriately constructed URLs.

There are limits to how much data can be returned from the cloud in a single GET request. Furthermore, the cloud does not provide a backing store for paging through large query results, so the developer is responsible for any paging needed when the requested data is too large for a single request.
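Because the cloud keeps no server-side cursor, client-side paging amounts to issuing one request per page and tracking the position yourself. The sketch below shows that loop in plain C#. FetchPage, the 23-item data set and the page size of 10 are hypothetical stand-ins for a GET carrying $skip and $top; they are not Live Framework APIs, and the real per-request limit is not stated in this post. Since each page is an independent request, objects added or removed between requests may shift the page boundaries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical data held by the cloud; 23 objects is an arbitrary example size.
var serverData = Enumerable.Range(1, 23).Select(i => $"Object {i}").ToList();

// Hypothetical per-request limit; the real limit is not given here.
const int PageSize = 10;

// Hypothetical stand-in for one GET request carrying $skip and $top (NOT a Live Framework API).
List<string> FetchPage(int skip, int top) => serverData.Skip(skip).Take(top).ToList();

var allObjects = new List<string>();
int requestCount = 0;
while (true)
{
    // Each page is a separate request: the client tracks its own position.
    var page = FetchPage(allObjects.Count, PageSize);
    requestCount++;
    allObjects.AddRange(page);

    // A short (or empty) page means there is nothing left to fetch.
    if (page.Count < PageSize) break;
}

Console.WriteLine($"{allObjects.Count} objects in {requestCount} requests"); // 23 objects in 3 requests
```

When the total is an exact multiple of the page size, this loop makes one extra request that returns an empty page, which is the price of not knowing the total in advance.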

LINQ

LINQ provides different infrastructures for queries against local objects and remote data. Local queries use IEnumerable<T> to query in-memory collections. Remote queries use IQueryable<T>, which is designed for efficient queries against remote collections of data. IEnumerable<T> can be used against remote data, but it is inefficient because it retrieves all the data and then filters it locally. IQueryable<T>, on the other hand, filters the data remotely before retrieving it. Performing queries remotely using IQueryable<T> has the double benefit of transferring the query processing effort to the remote server and limiting the data received from the server to the results of the query.
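The distinction can be seen with LINQ to Objects alone. In this sketch, AsQueryable() (a standard LINQ method) stands in for a remote provider: with IEnumerable<T> the predicate is a compiled delegate, while with IQueryable<T> it is captured as an expression tree that a provider, such as the one in the Live Framework, could inspect and translate into a request. Here both forms are still evaluated in-process, and the sample titles are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

var titles = new List<string> { "Morrison", "Morissette", "Sinatra" };

// IEnumerable<T>: the lambda is compiled to a delegate and executed in-process.
Func<string, bool> localPredicate = t => t.StartsWith("Mor");
IEnumerable<string> localQuery = titles.Where(localPredicate);

// IQueryable<T>: the same lambda is captured as an expression tree (data, not code),
// which a LINQ provider can inspect and translate, for example into a URL.
Expression<Func<string, bool>> remotePredicate = t => t.StartsWith("Mor");
IQueryable<string> remoteQuery = titles.AsQueryable().Where(remotePredicate);

// The query itself is only a description: a call to Queryable.Where over the source.
var call = (MethodCallExpression)remoteQuery.Expression;
Console.WriteLine(call.Method.Name);     // Where
Console.WriteLine(remotePredicate.Body); // the predicate as inspectable data

// LINQ to Objects runs both forms locally, so the results agree.
Console.WriteLine(localQuery.Count());   // 2
Console.WriteLine(remoteQuery.Count());  // 2
```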

Local and Remote Queries in the Live Framework

The Live Framework query documentation describes three ways to write basic queries:

var query1 = (from mo in mesh.CreateQuery<MeshObject>() select mo);

var query2 = (from mo in mesh.CreateQuery<MeshObject>().Execute() select mo);

var query3 = (from mo in mesh.CreateQuery<MeshObject>() select mo).ToList();

There are significant differences in the behavior of query1 and query2, and they indicate that the form used by query2 should never be used. query3 is really just query1 with ToList() applied, which enumerates the remote query immediately and exposes the results as a list, so it is ignored in the rest of this discussion.

query1 is a remote query using IQueryable while query2 is a local query using IEnumerable. The LINQ query of query1 is deferred and is sent across the wire to the cloud only when query1 is actually enumerated. The LINQ query of query2 is sent across the wire immediately, and the results are returned regardless of whether or not they are ever actually enumerated. Both query1, when enumerated, and query2 invoke the following HTTP call:

GET /V0.1/Mesh/MeshObjects

For the simple query used in this example, query1 offers little advantage over query2 other than deferred execution.
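The timing difference between the two forms can be mimicked with a plain iterator. In this sketch, GetMeshObjects is a hypothetical stand-in for the GET above that counts how often it runs, and ToList() plays the role of Execute(); neither is the Live Framework's implementation.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int getCount = 0;

// Hypothetical stand-in for GET /V0.1/Mesh/MeshObjects: the "request" happens
// only when the sequence is actually iterated.
IEnumerable<string> GetMeshObjects()
{
    getCount++;
    yield return "Singer";
    yield return "Album";
}

// Deferred, like query1: defining the query sends nothing.
var deferred = from mo in GetMeshObjects() select mo;
Console.WriteLine(getCount); // 0

// The request is made only when the query is enumerated.
var results = deferred.ToList();
Console.WriteLine(getCount); // 1

// Immediate, like query2: ToList() here stands in for Execute(), so the request
// is made at once, whether or not the results are ever used.
var immediate = from mo in GetMeshObjects().ToList() select mo;
Console.WriteLine(getCount); // 2
```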

The following classes all provide support for using CreateQuery to query data stored in the Live Operating Environment:

  • Mesh
  • MeshObject
  • DataFeed

Note that a CreateQuery query, regardless of whether it is a remote or local query, will always access the network to retrieve data even if the data is already present in the calling object.

Filter Queries

The disadvantage of the local query becomes apparent when we add filters to the queries as follows:

var query1 = (from mo in mesh.CreateQuery<MeshObject>()
                     where mo.Resource.Type == "Singer"
                     select mo);

var query2 = (from mo in mesh.CreateQuery<MeshObject>().Execute()
                     where mo.Resource.Type == "Singer"
                     select mo);

query1 is not transmitted across the wire to the cloud until it is actually enumerated, while query2 is transmitted immediately. More importantly, query1 transmits the filter along with the request, and the server filters the data returned in response to the query. query2, on the other hand, does not transmit the filter and receives all the data back before filtering it locally. If, for example, there were 1,000 mesh objects, only one of which was of type "Singer", query1 would receive precisely one mesh object from the cloud while query2 would receive 1,000. This is immediately apparent when the HTTP calls are compared.

The HTTP for query1 is:

GET /V0.1/Mesh/MeshObjects?$filter=(Type)%20eq%20('Singer')

The HTTP for query2 is:

GET /V0.1/Mesh/MeshObjects

The filter is never transmitted over the network for local queries like query2. The same holds true for any of the string methods that may be used in a filter:

  • Contains
  • EndsWith
  • StartsWith
  • Substring
  • ToLower
  • ToUpper

These are all transmitted over the network for remote queries like query1 and performed locally for local queries like query2.
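The cost of the 1,000-object example can be simulated. In this sketch, Get is a hypothetical server that applies a filter only when one is sent with the request; the data and names are invented for illustration and are not Live Framework types.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical mesh of 1,000 objects, exactly one of which is a "Singer".
var store = Enumerable.Range(0, 1000)
    .Select(i => (Type: i == 500 ? "Singer" : "Album", Title: $"Item {i}"))
    .ToList();

// Hypothetical server: applies the filter, if one is sent, before returning results.
List<(string Type, string Title)> Get(Func<(string Type, string Title), bool>? filter) =>
    filter == null ? store.ToList() : store.Where(filter).ToList();

// Remote style (query1): the filter travels with the request.
var remoteReceived = Get(o => o.Type == "Singer");

// Local style (query2): everything comes back, then the filter runs on the client.
var localReceived = Get(null);
var localResult = localReceived.Where(o => o.Type == "Singer").ToList();

Console.WriteLine($"query1-style received {remoteReceived.Count} object(s)"); // 1
Console.WriteLine($"query2-style received {localReceived.Count} object(s)");  // 1000
Console.WriteLine($"both yield {localResult.Count} matching object(s)");      // 1
```

Both styles end up with the same single match; the difference is entirely in how much data crosses the wire.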

Data Paging

Paging of query results exhibits the same behavior.

Consider the following queries:

var query1 = (from mo in mesh.CreateQuery<MeshObject>()
                      where mo.Resource.Type == "Singer"
                      && mo.Resource.Title.Contains("Mor")
                      select mo).OrderBy(mo => mo.Resource.Title).Skip(1).Take(2);

var query2 = (from mo in mesh.CreateQuery<MeshObject>().Execute()
                      where mo.Resource.Type == "Singer"
                      && mo.Resource.Title.Contains("Mor")
                      select mo).OrderBy(mo => mo.Resource.Title).Skip(1).Take(2);

These queries select only those mesh objects whose resource type is "Singer" and whose resource title contains the string "Mor", order them alphabetically by resource title, skip the first mesh object, and return the next two.
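To make these semantics concrete, here is the same Where, OrderBy, Skip and Take chain run with LINQ to Objects over a few hypothetical resources. An ordinal comparer is used so the ordering is deterministic in this sketch; the ordering rules applied by the cloud service are not covered here.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sample resources as (Type, Title) pairs.
var resources = new List<(string Type, string Title)>
{
    ("Singer", "Morrison"),
    ("Singer", "Morgan"),
    ("Singer", "Sinatra"),
    ("Album", "Morning Glory"),
    ("Singer", "Morissette"),
};

var page = resources
    .Where(r => r.Type == "Singer" && r.Title.Contains("Mor")) // Morrison, Morgan, Morissette
    .OrderBy(r => r.Title, StringComparer.Ordinal)            // Morgan, Morissette, Morrison
    .Skip(1)                                                   // drops Morgan
    .Take(2)                                                   // Morissette, Morrison
    .Select(r => r.Title)
    .ToList();

Console.WriteLine(string.Join(", ", page)); // Morissette, Morrison
```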

Once again, the filtering and paging are performed in the cloud for query1, and only the filtered, paged data is returned. With query2, all the data is returned to the client for filtering and paging.

The HTTP for query1 is:

GET /V0.1/Mesh/MeshObjects?$filter=((Type)%20eq%20('Singer'))%20and%20(contains(Title,'Mor'))&$orderby=Title&$skip=1&$top=2

The HTTP for query2 is:

GET /V0.1/Mesh/MeshObjects

It should be apparent by now that the local query, identified by the use of Execute() in the LINQ expression, always makes the same HTTP request, whatever filtering or paging is specified. This is why it is always better to use a remote query, as in query1 of these examples.

Aside on Data Paging

The API documentation describes several methods for controlling paging, including:

  • OrderBy
  • Skip
  • Take

The IQueryable implementation of the remote query converts these into the appropriate URL before the request is transmitted across the network. The API documentation also mentions a Top function, but it does not, in fact, exist: the other three are available as LINQ extension methods for IQueryable<T> and IEnumerable<T>, but Top is not. Nevertheless, the remote query converted Skip(1).Take(2) into $skip=1&$top=2 in the HTTP request, since the REST interface does have a $top parameter.
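To show how an IQueryable<T> provider can turn Skip and Take into $skip and $top, here is a toy translator that walks the expression tree built by System.Linq.Queryable and produces a query string in the same shape as the query1 paging request in this post. It is a hypothetical sketch that handles only the operators used here (Where with ==, && and Contains, then OrderBy, Skip and Take); it is not the Live Framework's actual implementation, and the anonymous-type data merely stands in for mesh objects.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

// Toy data using an anonymous type, so member names such as Title survive in the expression tree.
var meshObjects = new[]
{
    new { Type = "Singer", Title = "Morrison" },
    new { Type = "Singer", Title = "Morgan" },
    new { Type = "Singer", Title = "Morissette" },
    new { Type = "Album", Title = "Moondance" },
}.AsQueryable();

var query = meshObjects
    .Where(mo => mo.Type == "Singer" && mo.Title.Contains("Mor"))
    .OrderBy(mo => mo.Title)
    .Skip(1)
    .Take(2);

// Only spaces are percent-encoded, to match the request format in this post.
string url = "/V0.1/Mesh/MeshObjects?" +
    string.Join("&", ToQueryOptions(query.Expression)).Replace(" ", "%20");
Console.WriteLine(url);

// Walks the chain of Queryable calls from the data source outwards,
// emitting one query option per operator.
List<string> ToQueryOptions(Expression e)
{
    if (e is not MethodCallExpression call)
        return new List<string>(); // reached the constant that wraps the data source

    var options = ToQueryOptions(call.Arguments[0]);
    switch (call.Method.Name)
    {
        case "Where":
            options.Add("$filter=" + FilterText(Unquote(call.Arguments[1]).Body));
            break;
        case "OrderBy":
            options.Add("$orderby=" + ((MemberExpression)Unquote(call.Arguments[1]).Body).Member.Name);
            break;
        case "Skip":
            options.Add("$skip=" + ((ConstantExpression)call.Arguments[1]).Value);
            break;
        case "Take": // there is no Top operator; Take is what maps onto $top
            options.Add("$top=" + ((ConstantExpression)call.Arguments[1]).Value);
            break;
        default:
            throw new NotSupportedException(call.Method.Name);
    }
    return options;
}

// Queryable operators wrap each lambda argument in a Quote node.
LambdaExpression Unquote(Expression e) => (LambdaExpression)((UnaryExpression)e).Operand;

// Translates only the small subset of predicates used in this post.
string FilterText(Expression e) => e switch
{
    BinaryExpression { NodeType: ExpressionType.AndAlso } b => $"({FilterText(b.Left)}) and ({FilterText(b.Right)})",
    BinaryExpression { NodeType: ExpressionType.Equal } b => $"({FilterText(b.Left)}) eq ({FilterText(b.Right)})",
    MethodCallExpression { Method: { Name: "Contains" } } m => $"contains({FilterText(m.Object!)},{FilterText(m.Arguments[0])})",
    MemberExpression m => m.Member.Name,
    ConstantExpression { Value: string s } => $"'{s}'",
    _ => throw new NotSupportedException(e.ToString()),
};
```

The translator never sees a Top call: Take is the operator recorded in the expression tree, and mapping it onto $top is a choice made by the provider.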

Queries on Entries

The Live Framework supports another method for performing queries that does not use CreateQuery. This is another local query that has not been optimized for use across the wire to the cloud. An example is:

var query4 = (from df in meshObject.DataFeeds.Entries
                      where df.Resource.Title == "Songs"
                      select df);

The HTTP for query4 is:

GET /V0.1/Mesh/MeshObjects/{OBJECT_IDENTIFIER}/DataFeeds

Note that this form of query is optimized to query the data locally if it is already contained in the object. For example, there will be no network traffic when querying for a specific mesh object if this form of query is performed on a Mesh object into which all the mesh objects have already been loaded.
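The difference between answering from data already loaded and going back to the service on every query can be sketched with Lazy<T>. FetchDataFeeds is a hypothetical stand-in for the DataFeeds GET that counts its calls; this illustrates the caching behavior described above rather than how the Live Framework implements it. The trade-off, as one of the comments on this post points out, is that a cached copy can briefly fall out of sync with the service.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int networkCalls = 0;

// Hypothetical stand-in for GET /V0.1/Mesh/MeshObjects/{OBJECT_IDENTIFIER}/DataFeeds.
List<string> FetchDataFeeds()
{
    networkCalls++;
    return new List<string> { "Songs", "Albums" };
}

// Entries-style: load once, then answer later queries from the in-memory copy.
var entries = new Lazy<List<string>>(FetchDataFeeds);
var songs = entries.Value.Where(title => title == "Songs").ToList();
var albums = entries.Value.Where(title => title == "Albums").ToList();
Console.WriteLine(networkCalls); // 1

// CreateQuery-style: every query goes back to the source, even if the data is already held locally.
var songsAgain = FetchDataFeeds().Where(title => title == "Songs").ToList();
Console.WriteLine(networkCalls); // 2
```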


About Neil Mackenzie

Cloud Solutions Architect. Microsoft

3 Responses to Queries in the Live Framework API

  1. Vikas says:

    Thanks Neil for your contributions to the Live Framework Community. As always, this has been tagged and indexed, so anyone reading this blog entry should be able to find other Live Framework resources and blogs: http://social.msdn.microsoft.com/Forums/en-US/liveframework/thread/828d9a48-239a-4af8-8239-35931e514d37#page:3 and http://delicious.com/LiveFramework

  2. Jamie says:

    more great stuff Neil!

  3. Ben says:

    Great post Neil! One key thing to note, though, is that you should not mix CreateQuery queries with Entities queries. It's not just that they are optimized for local or non-local use; they are querying different copies of the data (the in-process cache vs. the LOE), which can get out of sync for short periods of time.
