Tuesday, July 6, 2010

Table-Valued Parameters

This post is initiated by Ken Simmons (Blog | Twitter). He asked to write a blog post that describes our favorite feature of SQL Server. Ken's post here is not only interesting due to the MSDN package :-) but also caused me thinking about what I changed in my database solutions since SQL Server 2008.

For sure, this is not my first post about SQL Server 2008 feature User Defined Table-Valued types. I'd not call them as the greatest feature of SQL Server in general, however they are one of the greatest new features in version 2008.

Bulk Load by Ids

Often it is required to load many rows of one table by their ids or any other attribute. A sample is reloading of already known data.

One solution is to send one query for each item to be (re-)loaded. However, this causes many small queries, stressing server and network.

Another approach is to create one dynamic SQL query, containing an IN statement for all ids to be reloaded. Though, this has several issues. Dynamic creation of SQL statements is always a potential open door for SQL injection. Those IN statements make it impossible for SQL Server to reuse existing execution plans. IN statements with many items require many resources on server side and become slow. This works only if there is one unique column to filter.

The - in my opinion - best solution, before SQL Server 2008, is to create a concatenated string of all IDs to be loaded, send the string to the server, split them into a temp table and use this table joined to the real table to return the requested rows. I wrote a huge post about SQL split functions here. Though, as cool as this approach is, it stays a little hack. When working with more than one column, this approach becomes awkward since you have to handle hierarchical data (lines and items) within the string.

User-Defined Table-Valued types are a new and clean way to bulk load rows from database. Let's start with the definition of a new type.
CREATE TYPE IntIdTable AS TABLE (
   Id INT NOT NULL PRIMARY KEY CLUSTERED
)
Now, we are able to write a stored procedure which gets this type as parameter. (I don't show the referenced table "TestData100K" because it doesn't really matter. Out of others, it contains one column, called "Id". Please, no bashing for the "SELECT *", it's a sample.).
CREATE PROCEDURE GetTestDataByIds
   @ids IntIdTable READONLY
AS
   SELECT td.*
   FROM TestData100K td
      JOIN @ids ids ON td.Id = ids.Id;

Okay, we're done on server side, let's switch to the client.
First thing we need to be able to address User-Defined Table-Types is a class, implementing the IEnumerable<SqlDataRecord>. (Another way would be a DataTable, but I don't like this class due to its huge overhead.)
class IntRecordEnumerator : IEnumerable<SqlDataRecord> {
   public IntRecordEnumerator(IEnumerable<int> values) {
      _values = values;
   }

   IEnumerable<int> _values;

   public IEnumerator<SqlDataRecord> GetEnumerator() {
      SqlMetaData metaData = new SqlMetaData("Id", SqlDbType.Int);

      foreach (int id in _values) {
         SqlDataRecord record = new SqlDataRecord(metaData);
         record.SetInt32(0, id);
         yield return record;
      }
   }

   System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() {
      return this.GetEnumerator();
   }
}
Now, we can use this class as parameter of a SqlParameter. (Notice, the parameter requires SqlDbType.Structured and TypeName to address the name of the table type.)
[TestMethod]
public void SearchIds() {
   string sql = "GetTestDataByIds";
   using (SqlConnection connection = CreateOpenConnection())
   using (SqlCommand command = new SqlCommand(sql, connection)) {
      command.CommandType = CommandType.StoredProcedure;
      SqlParameter param = command.Parameters.Add("@Ids", SqlDbType.Structured);
      param.TypeName = "IntIdTable";

      int[] idsToLoad = new int[] { 1, 2, 3, 4, 5 };
      param.Value = new IntRecordEnumerator(idsToLoad);

      using (SqlDataReader reader = command.ExecuteReader()) {
         while (reader.Read()) {
            int id = (int)reader["Id"];
            Console.Write(id);
            Assert.IsTrue(idsToLoad.Contains(id));
         }
      }
   }
}

Streaming Data To The Server

In distributed systems it is important to synchronize data between different databases.

If possible to use database features like BCP or SSIS, this is often the best solution. However, sometimes it is required to do some additional work, containing business logic, those solutions require a duplication of business logic into the database. In addition, if there are data to be updated (not only inserted) the data have to be packed into a staging table before they can be handled by a procedure. Staging tables are a nice temporary storage but can become tricky if there are concurrent synchronization jobs.

The classic and, as far as I know, most common solution today is using single INSERT statements to load those bulk data row by row. As denoted above, this becomes a little stress test for database server and network.

Another, a bit esoteric, approach is to use SQL Server 2005's XML functionality to create one huge XML package, containing all data to be written into database server. Problem is XML is a quiet chatty language and causes a huge memory usage on client and server.

Here, User-Defined Table-Valued types can be used to push all data in one step into a procedure. Business logic can be applied while streaming through all data to be synchronized. The procedure finally updates and inserts the data into their destination table. I first wrote about this approach here.

Again, we start with a table type.
CREATE TYPE ImportSampleTable AS TABLE (
   FirstName VARCHAR(30)
   ,LastName VARCHAR(30)
);
A simple procedure, that gets the type as parameter and inserts the rows into a table.
CREATE PROCEDURE ImportSampleFile
   @importData ImportSampleTable READONLY
AS
   INSERT INTO ImportSample
   SELECT *
   FROM @importData;

Heating to the client...
An implementation of a IEnumerable<T>.
class FileToLoad : IEnumerable<SqlDataRecord> {
   public FileToLoad(string fileName, SqlConnection connection) {
      _fileName = fileName;
      _connection = connection;
   }

   string _fileName;
   SqlConnection _connection;
   SqlMetaData[] _metaData;

   public IEnumerator<SqlDataRecord> GetEnumerator() {
      CreateMetaData();

      using (StreamReader reader = new StreamReader(_fileName)) {
         while (!reader.EndOfStream) {
            SqlDataRecord record = CreateRecord(reader);
            yield return record;
         }
      }
   }

   private void CreateMetaData() {
      _metaData = new SqlMetaData[] {
         new SqlMetaData("FirstName", SqlDbType.VarChar, 30),
         new SqlMetaData("LastName", SqlDbType.VarChar, 30),
      };
   }

   private SqlDataRecord CreateRecord(StreamReader reader) {
      string line = reader.ReadLine();
      SqlDataRecord record = new SqlDataRecord(_metaData);
      string[] lineItems = line.Split('\t');
      record.SetString(0, lineItems[0]);
      record.SetString(1, lineItems[1]);
      return record;
   }

   System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() {
      return this.GetEnumerator();
   }
}
And again, we are ready to use the table valued type as parameter for another test.
[TestMethod]
public void StreamingLoad() {
   using (SqlConnection connection = CreateOpenConnection())
   using (SqlCommand command = new SqlCommand("ImportSampleFile", connection)) {
      command.CommandType = CommandType.StoredProcedure;
      SqlParameter param = command.Parameters.Add("@importData", SqlDbType.Structured);
      param.TypeName = "ImportSampleTable";
      // get a sample file
      string fileName = GetType().Assembly.Location;
      fileName = Path.GetDirectoryName(fileName);
      fileName = Path.Combine(fileName, "FileToLoad.txt");
      
      param.Value = new FileToLoad(fileName, connection);
      command.ExecuteNonQuery();
   }
}

Conclusion

User-Defined Table-Valued types a nice new feature of SQL Server 2008! I did not find many reasons to use them in plain T-SQL, but they provide a native support of set-based solutions between client and server and this is a really great enhancement of SQL Server.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.