SQL Azure Federations is a forthcoming technology that supports elastic scalability of SQL Azure databases. It provides Transact SQL support for managing the automated federation of data across multiple SQL Azure databases. It also supports the automatic routing of SQL statements to the appropriate SQL Azure database.
This post differs from other posts on the blog since I have not used SQL Azure Federations. Essentially, the post represents my notes of the many excellent posts on SQL Azure Federations that Cihan Biyikoglu has published on his blog. I also learned a lot from the PDC 10 presentation by Lev Novik on Building Scale-out Database Applications with SQL Azure. Cihan Biyikoglu also did a presentation at Tech Ed 11 on Building Scalable Database Solutions using Microsoft SQL Azure Database Federations.
UPDATE 5/12/2011. You can now nominate yourself for the upcoming SQL Azure Federations product evaluation program.
Scalability with State
Elastic scalability is an important driver behind cloud computing. Elasticity, the ability to scale services up and down, allows resource provision to be better matched to resource demand. This allows services to be provided less expensively.
Windows Azure supports elasticity through the provision of role instances. This elasticity is simplified by the requirement that an instance be stateless. Each new instance of a role is identical to every other instance of the role. In a Windows Azure service, state is at the role level and is kept in a durable store such as Windows Azure Table or SQL Azure.
There is, however, a need to support the elastic scalability of durable stores. A common technique is sharding – in which multiple copies of the data store are created, and data distributed to a specific copy or shard of the data store. A simple distribution algorithm is used to allocate all data for which some key is within a given range to the same shard.
Although sharding solves the elastic scalability problem from a theoretical perspective it introduces two practical problems: the routing of data requests to a specific shard; and shard maintenance. An application using a sharded datastore must discover to which shard a particular request is to be routed. The datastore must manage the partitioning of the data among the different shards.
The Windows Azure Table Service solves these problems through auto-sharding. It allocates entities to shards based on the value of the PartitionKey for an entity. The Windows Azure Table Service manages the distribution of these shards (or partitions) across different storage nodes in a manner that is totally transparent to applications using it. These applications route all requests to the same RESTful endpoint regardless of the shard containing the data.
SQL Azure currently provides no built-in support for sharding. Data can be sharded across multiple SQL Azure databases but shard maintenance is the responsibility of the SQL Azure DBA. Furthermore, an application must manage request routing since there is no support for it in SQL Azure. The lack of support for request routing causes problems with connection management since the routing of requests to many different databases fragments the connection pool thereby diminishing its utility.
However, Microsoft is developing SQL Azure Federations to provide native sharding support to SQL Azure. SQL Azure Federations provides simple Transact SQL extensions supporting shard management and provides a single connection endpoint for requests against a SQL Azure database. This feature will bring stateful elastic scalability to a relational environment just as the Windows Azure Table Service provides stateful elastic scalability in a NoSQL environment.
SQL Azure Federations introduces the concept of a federation which serves as the scalability unit for data in the database. A federation is identified by name, and specified by the datatype of the federation key used to allocate individual rows to different federation members (or shards) of the database. Each federation member is a distinct SQL Azure database. However, SQL Azure Federations exposes a single connection endpoint for the database and all federation members in it. Consequently, applications route all requests to the database regardless of which federation member data resides in. Note that a database can have more than one federation allowing it to support different scalability requirements for different types of data.
A federation is created using a CREATE FEDERATION statement parameterized by the federation name and the datatype of the federation key, as follows:
CREATE FEDERATION FederationName (federationKey RANGE BIGINT)
In SQL Azure Federations v1, the supported datatypes for federation key are: INT, BIGINT, UNIQUEIDENTIFIER, and VARBINARY(900). This statement creates a single federation member into which all federated data is inserted.
An application indicates that subsequent statements should be routed to a particular federation member through the USE FEDERATION statement, as follows:
USE FEDERATION FederationName (federationKey = 1729) WITH RESET, FILTERING = ON
(Your guess as to the meaning of WITH RESET is as good as mine.) Setting FILTERING to ON, as opposed to OFF, causes all SQL statements routed subsequently to the partition to be filtered implicitly by the specified value of the federation key. This is useful when migrating a single tenant database to a multi-tenant, federated database since it allows a federation member to mimic a single tenant database.
A federated database comprises various types of tables:
- federation tables – large-scale data that scales elastically in federation members.
- reference tables – small-scale lookup data copied to each federation member.
- central tables – resident in the root database that is not distributed to federation members.
Federation tables are those to be sharded across different federation members. A table is declared to be federated when it is created, as follows:
CREATE TABLE tableName
( … )
FEDERATED ON (federationKey = columnName)
Reference tables are distributed in toto to each federation member and are not sharded. They are created in a federation member by a normal CREATE TABLE statement that has no decoration for SQL Azure Federation. This means that the connection must currently be USING a federation member.
Central tables exist in the root database and are not distributed to federation members. They are created in the federation root by a normal CREATE TABLE statement that has no decoration for SQL Azure Federation. The USE FEDERATION statement can be used to route connections to the root database as follows:
USE FEDERATION ROOT
Each federation table in a federation must have a column matching the federation key. Every unique index on the table must also contain this federation key. All federation tables in a federation member contain data only for those records in the federation key range allocated to that federation member.
The records with a specific value of federation key in the different federation tables of a federation form an atomic unit since they form an indivisible set of records that are always allocated to the same federation member. In practice, these records are likely related since they likely represent, for example, all the data for a single customer. Furthermore, it may be worth adding the federation key to denormalize a table precisely so that its records belong to the same atomic unit as related data. An obvious example is adding a customer Id to an OrderDetail table so that the order details are federated along with the order and customer information in a federation where the federation key is customer Id.
Cihan Biyikoglu goes much deeper into the rules surrounding federation tables in a post describing Considerations When Building Database Schema with Federations in SQL Azure.
Elastic Scalability of a Federation
A single federation member adds little to scalability, so SQL Azure Federations provides the ALTER FEDERATION statement to support elastic scalability of a federation. This statement has a SPLIT keyword indicating that a federation member should be split into two new federation members with the data being divided at a specified value of federation key. Similarly a MERGE parameter can be used to merge all the data in two federation members into a single federation member. For example, the following statement splits a federation member:
ALTER FEDERATION federationName SPLIT AT (federationKey = 50000)
The ALTER FEDERATION SPLIT statement is central to the elastic scalability provided by SQL Azure Federations. As individual federation members fill up, this statement can be used repeatedly to distribute federation data among more and more federation members. This is not automated as it is with Windows Azure Table, but it is a relatively simple administrative process.
Cihan Biyikoglu has a post – Federation Repartitioning Operations: Database SPLIT in Action – in which he provides a detailed description of the split process. To maintain consistency and high-availability while a federation member is split, the data in it is fully migrated into two new federation members before they are made active federation members. The original federation member is deleted after the split federation members become active. Once the ALTER FEDERATION statement has been issued, the split process is fully automated by SQL Azure Federations.
SQL Azure Federations is not yet available and, as stated at the beginning, this post is based primarily on blog posts written over a period of months. There appear to have been slight syntax changes during that period so it is quite likely that the statements described above may not be completely correct.
In his MIX 11 presentation, David Robinson listed SQL Azure Federations as one of the “CY11 Investment Themes” for the SQL Azure team. Hopefully, this gives some idea of the timeframe when we will finally get our hands on at least a test version of SQL Azure Federations.