Handling Transient Connection Failures in SQL Azure

This post is one of the recipes in my book Microsoft Windows Azure Development Cookbook. The recipe describes how to use the Transient Fault Handling Framework to handle transient connection failures when using SQL Azure.

The Windows Azure Customer Advisory Team, which supports the framework, describes it as follows:

The Transient Fault Handling Framework solution provides a reusable framework for building extensible retry policies capable of handling different types of transient conditions in applications leveraging SQL Azure, Windows Azure storage (queues, blobs, tables), Windows Azure AppFabric Service Bus and Windows Azure AppFabric Caching Service.

Although the post is specifically concerned with SQL Azure, the general idea can be implemented when using the Windows Azure Storage service, the Windows Azure AppFabric Service Bus and the Windows Azure AppFabric Caching Service.

I highly recommend the Windows Azure Customer Advisory Team blog. It has many posts showing real-world best practices for using the various features of the Windows Azure Platform.

Handling connection failures to SQL Azure

SQL Azure database is a distributed system in which each physical server hosts many databases. This sharing of resources leads to capacity constraints on operational throughput. SQL Azure handles these capacity constraints by throttling operations and closing connections that are using too many resources. SQL Azure also closes connections when it alleviates operational hot spots by switching from a primary SQL Azure database to one of its two backup copies. Furthermore, connectivity to a SQL Azure database is likely to be less reliable than connectivity to a Microsoft SQL Server database on a corporate LAN. It is imperative therefore that applications using SQL Azure be designed to withstand the connection failures that are far more likely to occur than with Microsoft SQL Server.

One of the mantras of cloud development is design for failure. It is important that applications using SQL Azure be designed to handle failures appropriately. There are two kinds of error: permanent errors indicating a general failure of part of the system and transient errors existing only for a brief time. Permanent errors perhaps indicate a logical problem with the application—and handling them may require code changes. However, an application should handle transient errors gracefully by retrying the operation that led to the error in the hope that it does not recur. A dropped connection should be regarded as transient, and an application should respond to a dropped connection by opening a new connection and retrying the operation.

There remains the problem of distinguishing permanent from transient errors. This can be done by comparing the error returned from a failed operation with a known list of transient errors. An application can therefore include a retry mechanism that checks the status of operations and retries any operations that experienced a transient error.

The Windows Azure AppFabric Customer Advisory Team has made available on the MSDN Code Gallery the source code and pre-compiled assemblies for the Transient Fault Handling Framework for Azure Storage, Service Bus, and SQL Azure. This comprises a set of classes that can be used to detect transient failures and retry SQL operations. It contains an extensible way to identify transient failures, with various examples including one that compares an error with a list of known transient failures. The Transient Fault Handling Framework provides various built-in retry backoff techniques that specify how often and frequently an operation should be retried following a transient failure. These include both a fixed interval and an exponential delay between retries. The classes in the Transient Fault Handling Framework include various extension methods that simplify the use of the framework, thereby minimizing the work required to add the handling of dropped connections and other transient failures to an application using SQL Azure.

In this recipe, we will learn how to use the Transient Fault Handling Framework for Azure Storage, Service Bus, and SQL Azure to handle dropped connections and other transient failures when using SQL Azure.

Getting ready

The recipe uses the Transient Fault Handling Framework for Azure Storage, Service Bus, and SQL Azure. It can be downloaded from the following URL:

http://code.msdn.microsoft.com/Transient-Fault-Handling-b209151f 

This download is a Visual Studio solution with precompiled output assemblies that are referenced in the project used in the recipe.

How to do it…

We are going to connect to SQL Azure using ADO.NET and perform various DDL and DML operations taking advantage of the transient-error handling provided by the Transient Fault Handling library. We do this as follows:

1. On the Project Properties dialog in Visual Studio, set the Target Framework to .NET Framework 4.

2. Add the following assembly references to the project:

Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling.dll
System.configuration.dll

3. Add a new class named RetryConnectionExample to the project.

4. Add the following using statements to the top of the class file:

using System.Data;
using System.Data.SqlClient;
using Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling;
using Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling.SqlAzure;
using Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling.Configuration;

5. Add the following private members to the class:

String connectionString;
RetryPolicy connectionRetryPolicy;
RetryPolicy commandRetryPolicy;

6. Add the following constructor to the class:

public RetryConnectionExample(String server, String database,
   String login, String password)
{
   SqlConnectionStringBuilder connStringBuilder;
   connStringBuilder = new SqlConnectionStringBuilder();
   connStringBuilder.DataSource = String.Format(“{0}.database.windows.net”, server);
   connStringBuilder.InitialCatalog = database;
   connStringBuilder.Encrypt = true;
   connStringBuilder.TrustServerCertificate = false;
   connStringBuilder.UserID = String.Format(“{0}@{1}”, login, server);
   connStringBuilder.Password = password;
   connectionString = connStringBuilder.ToString();

   connectionRetryPolicy = new RetryPolicy<SqlAzureTransientErrorDetectionStrategy>(
      5, TimeSpan.FromMilliseconds(100));

   connectionRetryPolicy.RetryOccurred += RetryConnectionCallback;

   RetryPolicyConfigurationSettings retryPolicySettings =
      ApplicationConfiguration.Current.GetConfigurationSection<
         RetryPolicyConfigurationSettings>(RetryPolicyConfigurationSettings.SectionName);

   RetryPolicyInfo retryPolicyInfo = retryPolicySettings.Policies.Get(“FixedIntervalDefault”);
   commandRetryPolicy =
      retryPolicyInfo.CreatePolicy<SqlAzureTransientErrorDetectionStrategy>();
   commandRetryPolicy.RetryOccurred += RetryCallbackCommand;
}

7. Add the following callback methods to the class:

private void RetryConnectionCallback(
   Int32 currentRetryCount, Exception lastException, TimeSpan delay)
{
   Int32 retryCount = currentRetryCount;
}

private void RetryCallbackCommand(
  Int32 currentRetryCount, Exception lastException, TimeSpan delay)
{
   Int32 retryCount = currentRetryCount;
}

8. Add the following method, retrieving the session tracing Id, to the class:

public String GetSessionTracingId()
{
  String commandText = “SELECT CONVERT(NVARCHAR(36), CONTEXT_INFO())”;

  String sessionTracingId;

  using (ReliableSqlConnection connection = new ReliableSqlConnection(connectionString))
 
{
     connection.Open();
     using (SqlCommand sqlCommand = connection.CreateCommand())
    
{
       
sqlCommand.CommandText = commandText;
       
sessionTracingId = sqlCommand.ExecuteScalarWithRetry() as String;
     }
   }  

   return sessionTracingId;
}

9. Add the following method, creating the Writer table, to the class:

public void CreateTable()
{
  String commandText = @”CREATE TABLE Writer (
    
Id int PRIMARY KEY NOT NULL,
    
Name nvarchar(20) NOT NULL,
    
CountBooks int NULL)”;

  using (ReliableSqlConnection connection =
    new ReliableSqlConnection(connectionString, connectionRetryPolicy))
  {
    
connection.Open();

     using (SqlCommand sqlCommand = connection.CreateCommand())
    
{
      
sqlCommand.CommandText = commandText;
      
sqlCommand.ExecuteNonQueryWithRetry();
    
}
   }
}

10. Add the following method, dropping the Writer table, to the class:

public void DropTable()
{
   String commandText = “DROP TABLE Writer”;

   using (ReliableSqlConnection connection = new ReliableSqlConnection(connectionString))
  
{
     
connection.Open(connectionRetryPolicy);

      using (SqlCommand sqlCommand = connection.CreateCommand())
     
{
        
sqlCommand.CommandText = commandText;
        
sqlCommand.ExecuteNonQueryWithRetry(commandRetryPolicy);
      }
   }
}

11. Add the following method, querying the Writer table, to the class:

public void QueryTable()
{
   String commandText = “SELECT * FROM Writer”;

   using (ReliableSqlConnection connection =
      new ReliableSqlConnection(connectionString, connectionRetryPolicy, commandRetryPolicy))
   {
      connection.Open();

      using (SqlCommand sqlCommand = new SqlCommand(commandText, connection.Current))
     
{
         using (IDataReader reader = connection.ExecuteCommand<IDataReader>(sqlCommand))
        
{
            Int32 idColumn = reader.GetOrdinal(“Id”);
           
Int32 nameColumn = reader.GetOrdinal(“Name”);
           
Int32 countBooksColumn = reader.GetOrdinal(“CountBooks”);
           
while (reader.Read())
           
{
              
Int32 id = (Int32)reader[idColumn];
              
String name = reader[nameColumn] as String;
              
Int32? countBooks = reader[countBooksColumn] as Int32?;
           
}
         }
      }
   }
}

12. Add the following method, inserting rows in the Writer table, to the class:

public Int32 InsertRows()
{
   String commandText = @”INSERT INTO Writer (Id, Name, CountBooks)
     VALUES
        (1, N’Cervantes’, 2),
       
(2, N’Smollett’, null),
       
(3, ‘Beyle’, 4)”;

   Int32 rowsAffected;
  
using (SqlConnection connection = new SqlConnection(connectionString))
  
{
     
connection.OpenWithRetry();

      using (SqlCommand sqlCommand = new SqlCommand(commandText, connection))
     
{
         rowsAffected = sqlCommand.ExecuteNonQueryWithRetry();
     
}
   }

   return rowsAffected;
}

13. Add the following method, updating a row in the Writer table, to the class:

public Int32 UpdateRow()
{
   RetryPolicy exponentialRetryPolicy = RetryPolicy.DefaultExponential;

   String commandText = @”UPDATE Writer
     
SET Name=@Name
     
WHERE Id=3″;

   Int32 rowsAffected;
  
using (SqlConnection connection = new SqlConnection(connectionString))
  
{
     
connection.OpenWithRetry(exponentialRetryPolicy);

      using (SqlCommand sqlCommand = new SqlCommand(commandText, connection))
     
{
        
SqlParameter sqlParameter = new SqlParameter()
        
{
           
ParameterName = “@Name”,
           
Value = “Stendhal”,
           
SqlDbType = SqlDbType.NVarChar,
           
Size = 20
        
};

         sqlCommand.Parameters.Add(sqlParameter);

         rowsAffected = sqlCommand.ExecuteNonQueryWithRetry(exponentialRetryPolicy);
     
}
   }
   return rowsAffected;
}

14. Add the following method, using the methods added earlier, to the class:

public static void UseRetryConnectionExample()
{
   String server = “SERVER_NAME”;
  
String database = “DATABASE_NAME”;
  
String login = “LOGIN”;
  
String password = “PASSWORD”;

   RetryConnectionExample example = new RetryConnectionExample(
      server, database, login, password);
  
example.GetSessionTracingId();
   example.CreateTable();
   example.InsertRows();
   example.QueryTable();
   example.UpdateRow();
   example.QueryTable();
   example.DropTable();
}

15. Add the following Transient Fault Handling Framework configuration to the app.config file for the project:

<configSections>
   <section name=”RetryPolicyConfiguration”
type=”Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling.Configuration.RetryPolicyConfigurationSettings,
Microsoft.AppFabricCAT.Samples.Azure.TransientFaultHandling”/>
</configSections>

<RetryPolicyConfiguration defaultPolicy=”FixedIntervalDefault”>
   <add name=”FixedIntervalDefault” maxRetryCount=”10″ retryInterval=”100″/>
</RetryPolicyConfiguration>

How it works…

In Step 1, we modify the output target of the project to make it consistent with the requirements of the Transient Fault Handling Framework. In Step 2, we add references to the Transient Fault Handling Framework assembly and to the System.configuration assembly used to access the Transient Fault Handling configuration in the app.config file.

In Steps 3 and 4, we set up the class. In Step 5, we add private members for the connection string and two RetryPolicy instances. In the constructor, we add in Step 6, we initialize the connection string using a SqlConnectionStringBuilder instance. Configuring a connection string for SQL Azure is precisely the same as for Microsoft SQL Server apart from the way in which the DataSource is specified—with the fully qualified host name. We turn encryption on, as this is required, and set TrustServerCertificate to false, so that the server certificate is validated. Instead of building the connection string like this, we could have loaded it from a configuration file.

For demonstration purposes, we initialize the RetryPolicy private members using different techniques. We create the connectionRetryPolicy member directly by providing initialization values in its constructor. We associate the RetryOccurred callback method with the connectionRetryPolicy member. We create the commandRetryPolicy member by retrieving a FixedIntervalDefault policy from the app.config file. We associate the RetryOccurred callback method with the commandRetryPolicy member. In both cases, we use SqlAzureTransientErrorDetectionStrategy to identify transient errors. This compares an error with a list of pre-defined transient errors.

In Step 7, we add two RetryOccurred callback methods the class. These have a trivial implementation that in a real application could be replaced by logging that a retry had occurred.

In Step 8, we create and open a ReliableSqlConnection which we use to create a SqlCommand. The connection is closed automatically when we exit the using block. We use SqlCommand to retrieve the session tracing ID for the connection. This is a GUID, identifying a particular connection, which can be provided to SQL Azure Support when its help is sought in debugging a problem. We use the default RetryPolicy when we open the connection and when we invoke the ExecuteScalarWithRetry() extension method. Note that the default RetryPolicy identifies all errors as being transient.

In Step 9, we invoke a CREATE TABLE operation on SQL Azure to create a table named Writer. The table has three columns: the Primary Key is the Id column; the remaining columns store the name of a writer and the number of books they wrote. We use the connectionRetryPolicy, configured in the constructor, when the connection is opened and the default RetryPolicy when we invoke the ExecuteNonQueryWithRetry() extension method.

In Step 10, we invoke a DROP TABLE operation on SQL Azure to drop the Writer table. We use the default RetryPolicy when the connection is opened and the commandRetryPolicy when we invoke the ExecuteNonQueryWithRetry() extension method.

In Step 11, we retrieve all rows from the Writer table and then iterate over them to examine the content of each column. We use the connectionRetryPolicy when the connection is opened and the commandRetryPolicy when we invoke the ExecuteComman<IDataReader>() extension method.

We insert three rows in the Writer table in Step 12. We invoke the OpenWithRetry() and ExecuteNonQueryWithRetry() extension methods to use the default RetryPolicy when we open and use the connection respectively. In Step 13, we use the same extension methods when we update a row in the Writer table. In this case, however, we parameterize them, so that we use the DefaultExponential retry policy when we open and use the connection. This default policy identifies all errors as transient.

In Step 14, we add a method that invokes the methods added earlier. We need to provide the server name, the database name, the login, and the password.

In Step 15, we add the configuration used to configure a RetryPolicy instance in Step 6. In doing so, we need to add a configSection element specifying the assembly used to access the configuration and then we add a RetryPolicyConfiguration element in which we specify a configuration we name FixedIntervalDefault.

Note that, with an appropriately configured connection string, all the code in this recipe can be run against Microsoft SQL Server—with the exception of the retrieval of the session tracing ID in Step 8.

There’s more…

The Transient Fault Handling Framework for Azure Storage, Service Bus, and SQL Azure can also be used for retrying operations against the Windows Azure Storage Service and the Windows Azure Service Bus.

See also

Valery Mizonov (@TheCATerminator) of the Windows Azure AppFabric Customer Advisory Team has written a blog post on Best practices for handling transient conditions in SQL Azure client applications. He explains how to use the Transient Fault Handling Framework. The post is available at the following URL:

http://windowsazurecat.com/2010/10/best-practices-for-handling-transient-conditions-in-sql-azure-client-applications/

About Neil Mackenzie

Cloud Solutions Architect. Microsoft
This entry was posted in Azure AppFabric, Service Bus, SQL Azure, Storage Service, Windows Azure and tagged , , . Bookmark the permalink.

2 Responses to Handling Transient Connection Failures in SQL Azure

  1. We are planning to use Entity Framework for accessing SQL Azure. Can this be used with EF or NHibernate as well?

  2. Valery Mizonov suggests in comments to this post that the CAT is working with the EF team to get a consistent story for handling transient connection failures for EF:
    http://bit.ly/rd00zA

    He recommends this post for transient connection handling with EF:
    http://bit.ly/r4jKPw

    Valery seems to respond to comments posted to the following post:
    http://bit.ly/qktkgZ

    I suggest you follow up there with additional questions.

Leave a reply to Rohit Gupta (@rbg) Cancel reply