Monday, November 16, 2009

Table-Valued Parameters - A Performance Comparison

As I wrote in some of my previous posts, table-valued parameters - which were introduced with SQL Server 2008 - appear to be a great feature to enable client-side bulk operations when working with SQL Server. They enable client data providers like ADO.NET to send a complete set of data to the server instead of calling several single-row operations. Today we'll do some comparisons.

I'll start with a short description of the test environment. To get the best possible performance for both approaches I'll use stored procedures.

The first test results focus on the performance of single statements vs. bulk operations in a single-client environment - that is, without any other concurrent activities. The second, more important tests cover a performance comparison in a multi-client scenario. The last part shows a comparison for some larger bulk operations.

Test Environment

If you have already read my previous posts you can skip this section. I try to keep this post uncoupled from my previous posts since it covers another general topic (TVPs), so I have to repeat the test environment and the general way to use table-valued parameters from ADO.NET.

For my tests I'll use the following simple table to INSERT/UPDATE rows.

CREATE TABLE TestMany
(
   Id int IDENTITY(1,1) NOT NULL
      PRIMARY KEY CLUSTERED 
   ,SomeInt int NULL
   ,SomeDate datetime NULL
   ,SomeText varchar(100) NULL
);
Whenever we intend to use table-valued parameters in SQL Server 2008, we need a table-valued type defined in SQL Server which can be used as a collection of Data Transfer Objects (DTOs) to transfer .NET native data to SQL Server.
CREATE TYPE TestManyMergeType AS TABLE(
   Id int NOT NULL
   ,Action char(1) NOT NULL
      CHECK (Action IN ('I', 'D', 'U'))
   ,SomeInt int NULL
   ,SomeDate datetime NULL
   ,SomeText varchar(100) NULL
   
   ,PRIMARY KEY CLUSTERED 
      (Id, Action)
);
This type covers our destination table with some slight exceptions. Since we will use table-valued parameters to push all changed client data in one package to SQL Server, we need a column which tells SS2k8 how to handle each distinct row. Therefore, we use the Action column, which specifies whether the current row is to be Inserted, Updated or Deleted. We don't use an IDENTITY specification for our "Id" column since we need to be able to specify a destination "Id" to be handled. To insert new rows we use negative, unique "Id" values, which are required to map newly inserted rows on the client side to their new server ids. To get the best selectivity we use a primary key over the "Id" and "Action" columns.

In a stored procedure which takes a table-valued parameter, the parameter has to be specified as READONLY in SS2k8.

CREATE PROCEDURE SaveTestMany
   @data TestManyMergeType READONLY
AS
   SET NOCOUNT ON;
   
   -- update
   UPDATE trg SET 
         SomeInt = src.SomeInt
         ,SomeDate = src.SomeDate
         ,SomeText = src.SomeText
      FROM TestMany trg 
         JOIN @data src 
         ON trg.Id = src.Id AND src.Action = 'U';

   -- insert
   INSERT INTO TestMany (SomeInt, SomeDate, SomeText)
      SELECT SomeInt, SomeDate, SomeText 
      FROM @data 
      WHERE Action = 'I' 
      ORDER BY Id;

   -- return last IDENTITY and count of added rows
   SELECT SCOPE_IDENTITY(), @@ROWCOUNT;
GO
The most important part of this procedure is the ORDER BY clause in the INSERT statement. It ensures that we can use the last generated identity value (from SCOPE_IDENTITY()) and the count of inserted rows (from @@ROWCOUNT) to map the provided surrogate ids to the generated server ids. For example, if three new rows with the surrogate ids -3, -2 and -1 are inserted and SCOPE_IDENTITY() returns 1,003, they received the server ids 1,001, 1,002 and 1,003 in exactly that order. Maybe you think this might be dangerous without a MAXDOP option or anything else, and maybe you are right. I can only tell you that I tried different order conditions and all rows have always been inserted in exact order. So feel free to trust this or not. If you don't trust it (which is a good thing), do your own tests (which is a good thing, too) or just feel free to leave this post (which would be a really bad thing).

To provide a table-valued parameter to a stored procedure from ADO.NET you have two possibilities. You can either provide your data as a System.Data.DataTable (which appears to be too heavyweight just to pass simple data records, in my opinion) or as a custom implementation of IEnumerable<SqlDataRecord>. I'll use the enumerator in my tests since it is way faster and does not require copying all domain objects into a DataTable.
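
Just for illustration, a DataTable-based version would look roughly like the following sketch (the cmd variable and the sample values are placeholders only, not part of my test code):

DataTable table = new DataTable();
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Action", typeof(string));
table.Columns.Add("SomeInt", typeof(int));
table.Columns.Add("SomeDate", typeof(DateTime));
table.Columns.Add("SomeText", typeof(string));
// a negative Id marks a new row, a positive Id an existing row
table.Rows.Add(-1, "I", 123, DateTime.Now, "a new row");
table.Rows.Add(4711, "U", 456, DateTime.Now, "an updated row");

SqlParameter p = cmd.Parameters.Add("@data", SqlDbType.Structured);
p.TypeName = "TestManyMergeType";
p.Value = table;   // a DataTable can be passed directly as a TVP

The custom enumerator I actually use in my tests looks like this: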

// Streams TestMany domain objects to SQL Server as the rows of the TVP.
// (Requires: using System.Collections.Generic; using System.Data;
//  using Microsoft.SqlServer.Server;)
class TestManyEnumerator : IEnumerable<SqlDataRecord>
{
   public TestManyEnumerator(IEnumerable<TestMany> data) {
      _data = data;
   }

   private IEnumerable<TestMany> _data;

   public IEnumerator<SqlDataRecord> GetEnumerator() {
      SqlMetaData[] metaData = {
         new SqlMetaData("Id", SqlDbType.Int),
         new SqlMetaData("MergeAction", SqlDbType.Char, 1),
         new SqlMetaData("SomeInt", SqlDbType.Int),
         new SqlMetaData("SomeDate", SqlDbType.DateTime),
         new SqlMetaData("SomeText", SqlDbType.VarChar, 100),
      };

      foreach (var item in _data) {
         SqlDataRecord record = new SqlDataRecord(metaData);

         record.SetValue(0, item.Id);
         record.SetValue(1, item.ChangeState);
         record.SetValue(2, item.SomeInt);
         record.SetValue(3, item.SomeDate);
         record.SetValue(4, item.SomeText);

         yield return record;
      }
   }

   System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator() {
      return this.GetEnumerator();
   }
}
This enumerator handles domain objects of type TestMany which map exactly to the table-valued T-SQL type specified above. ChangeState specifies the state ('I', 'D', 'U') of a particular object.
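
The TestMany domain class itself is not shown in this post. A minimal sketch, with the property names inferred from the SetValue calls above (the types, especially the nullable ones and ChangeState as a one-character string, are just my assumption), could look like this:

class TestMany
{
   public int Id { get; set; }              // negative surrogate id for new rows, server id otherwise
   public string ChangeState { get; set; }  // "I", "U" or "D"
   public int? SomeInt { get; set; }
   public DateTime? SomeDate { get; set; }
   public string SomeText { get; set; }
}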

To call our procedure with a set of data to be saved, we have to specify a SqlParameter with SqlDbType.Structured, set the name of the destination table-valued type as TypeName, and provide the previous enumerator as Value.

using (SqlConnection cn = GetConnection(true))
using (SqlTransaction tran = cn.BeginTransaction()) {
   // get test data
   IList<TestMany> data = GetTestData();

   // create a new command as stored procedure
   SqlCommand cmd = new SqlCommand("SaveTestMany", cn);
   cmd.CommandType = CommandType.StoredProcedure;
   cmd.Transaction = tran;
   // create a new parameter ("Structured" defines a TVP)
   SqlParameter p = cmd.Parameters.Add("@data", SqlDbType.Structured);
   // specify the name of the TVP
   p.TypeName = "TestManyMergeType";
   p.Value = new TestManyEnumerator(data);

   // catch the returned IDENTITY information
   using (SqlDataReader reader = cmd.ExecuteReader()) {
      reader.Read();
      int lastId = reader.IsDBNull(0) ? -1 : (int)reader.GetDecimal(0);
      int count = reader.GetInt32(1);
      // synchronize all newly added keys
      SyncAddedIds(data, count, lastId);
   }

   tran.Commit();
}
For a more detailed explanation and the implementation of GetTestData():IList<TestMany> and GetConnection(open):SqlConnection please have a look here. For a generic approach, have a look at A Generic BDP Approach for SQL Server 2008.
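
The implementation of SyncAddedIds(data, count, lastId) is not shown here either. Based on the mapping rule described above (the new rows were inserted ordered by their surrogate Id, so the generated identities run from lastId - count + 1 up to lastId in exactly that order), a minimal sketch might look like this (LINQ used for brevity, requires using System.Linq):

static void SyncAddedIds(IList<TestMany> data, int count, int lastId)
{
   if (count == 0 || lastId < 0)
      return;

   // the first inserted row got (lastId - count + 1), the last one got lastId
   int nextId = lastId - count + 1;
   foreach (var item in data.Where(i => i.ChangeState == "I").OrderBy(i => i.Id))
      item.Id = nextId++;   // replace the negative surrogate id with the new server id
}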

The procedures to insert and update single rows in our TestMany table are as simple as they can be.

CREATE PROCEDURE [dbo].[InsertOneTestMany]
   @id INT OUTPUT
   ,@someInt INT
   ,@someDate DATETIME
   ,@someText VARCHAR(100)
AS
   SET NOCOUNT ON;
   
   INSERT INTO TestMany (SomeInt, SomeDate, SomeText)
      SELECT @someInt, @someDate, @someText;
   
   SELECT @id = SCOPE_IDENTITY();
GO

CREATE PROCEDURE [dbo].[UpdateOneTestMany]
   @id INT
   ,@someInt INT
   ,@someDate DATETIME
   ,@someText VARCHAR(100)
AS
   SET NOCOUNT ON;
   
   UPDATE TestMany SET 
         SomeInt = @someInt
         ,SomeDate = @someDate
         ,SomeText = @someText
      WHERE Id = @id;
GO
I think there is nothing which needs to be explained.

We're done with our test environment.

Not part of this Post

Calling a single-statement procedure from .NET is straightforward and part of many tutorials, so I left that source code out of this post. I also don't show how to use multi-threading to call the above procedures in a concurrency scenario.

I kept DELETE operations out of the party since they are often done by special purge processes rather than by the usual business applications.

Since I don't have a home network, I did my tests with a SQL Server running on a VM on my local host. Therefore I cannot show you any comparison differences depending on network latency.

Single Client Tests

Let's compare the performance of single INSERT/UPDATE statements vs. bulk operations when using a single client connection. To get (almost) stable test results I executed each test configuration 1,000 times. One important thing: I always called the specified count of INSERT and UPDATE operations in one round trip. So "Count Insert/Update" with value "1" calls one INSERT and one UPDATE operation, and count "50" affects one hundred rows. I did my tests with a destination table containing just 100,000 rows and with a table containing 10,000,000 rows. Both tables were recreated before each test configuration and the buffers were flushed by calling "DBCC DROPCLEANBUFFERS() WITH NO_INFOMSGS;". To cause as many index read operations as possible I spread all operations over the complete id range of the table.
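
The timing harness itself is not shown in this post; conceptually it is just a stopwatch around the repeated calls, roughly like this sketch (RunInsertUpdateRound and countInsertUpdate are hypothetical placeholders for the code that executes the configured count of INSERT and UPDATE operations in one round trip):

var watch = System.Diagnostics.Stopwatch.StartNew();
for (int i = 0; i < 1000; i++)
{
   // one round trip containing "countInsertUpdate" INSERTs and UPDATEs
   RunInsertUpdateRound(countInsertUpdate);
}
watch.Stop();
Console.WriteLine("{0:N0} ms", watch.ElapsedMilliseconds);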

Duration on a destination table with 100,000 rows after 1,000 executions.

Insert/Update Count   Single Statements   Bulk Operation
1                     1,299 ms            3,285 ms
3                     2,319 ms            2,995 ms
5                     3,201 ms            3,007 ms
10                    5,553 ms            3,121 ms
20                    11,923 ms           3,449 ms
50                    27,644 ms           4,769 ms
Duration on a destination table with 10,000,000 rows after 1,000 executions.

Insert/Update Count   Single Statements   Bulk Operation
1                     2,631 ms            4,076 ms
3                     3,614 ms            4,200 ms
5                     4,608 ms            4,470 ms
10                    7,541 ms            4,381 ms
20                    12,308 ms           4,846 ms
50                    26,173 ms           6,336 ms
As you can see, there is a break-even point at about five INSERT/UPDATE operations (meaning 10 rows!) in one single transaction. Below this count of rows for the same table, single statements are faster than bulk operations with table-valued parameters. Once we exceed this count of rows to be changed, table-valued parameters become way faster than classical single-row operations.

Moving the work load from one to fifty INSERT and UPDATE tasks shows that the duration of single-row operations grows by a factor of about 10 to 20 (which is not such a good thing), while the duration of bulk operations grows only by a factor of about 1.5 (which appears to be a much better thing ;-) ).

Multi-Client Tests

As initially written, SS2k8 is a server, and no geeky feature is worth anything if it only works in a single-client scenario. Therefore, we'll now have a look at some multi-client tests.

Since concurrent executions take more time, I ran the following tests only 100 times. I also changed the count of rows in the destination table to 500,000, which was a convenient value to spread the threads over the whole table (to get the highest count of reads).

The following results show the overall duration of 100 executions.

Insert/Update Count   Single Statements   Bulk Operation

1 Thread
1                     159 ms              298 ms
3                     237 ms              358 ms
5                     344 ms              371 ms
10                    562 ms              391 ms
20                    1,040 ms            427 ms
50                    2,457 ms            558 ms

2 Threads
1                     387 ms              348 ms
3                     679 ms              368 ms
5                     854 ms              438 ms
10                    1,597 ms            421 ms
20                    3,513 ms            575 ms
50                    6,839 ms            830 ms

5 Threads
1                     1,466 ms            936 ms
3                     2,528 ms            943 ms
5                     3,542 ms            1,007 ms
10                    5,837 ms            1,097 ms
20                    10,995 ms           1,293 ms
50                    26,092 ms           2,251 ms

10 Threads
1                     4,820 ms            1,604 ms
3                     13,290 ms           1,936 ms
5                     12,624 ms           1,838 ms
10                    28,829 ms           2,012 ms
20                    45,414 ms           2,789 ms
50                    97,498 ms           4,610 ms
Again, remember "1" INSERT/UPDATE means two operations within one transaction. Since single-row operations usually scale almost linearly, they are still the faster solution for single operations. However, as we can see, bulk operations with table-valued parameters perform really well in a multi-threaded environment.

Some Larger Bulks

Okay, we are talking about bulk operations, so finally, let's do some real bulk operations.

The following results show the complete duration of 10 executions.

Insert/Update Count   Single Statements   Bulk Operation
1,000                 4,845 ms            370 ms
5,000                 23,190 ms           2,033 ms
10,000                46,539 ms           4,854 ms
20,000                95,086 ms           7,209 ms
In my opinion, 400,000 changed rows (remember, a count of 20,000 means 20,000 INSERT and 20,000 UPDATE operations, and we executed 10 times!) in about 7 seconds appears to be really great performance. ;-)

Conclusion

Apparently, table-valued parameters can be a great feature for systems which work with bulk data manipulation. This might be some web or rich client applications, but especially backend processes like data loaders or synchronization processes.

What does this mean? Probably nothing as a general statement. If you think bulk operations with table-valued parameters might be a nice feature for your project, always do your own tests! We did no tests over a real network and we did not test DELETE operations.

As general advice, always try to stay innovative but validate everything. My intention was not to show you that you have to change all your running code, but to get you thinking about some new features which might be helpful.

1 comment:

  1. Regarding identity orders, I found this interesting:

    INSERT queries that use SELECT with ORDER BY to populate rows guarantees how identity values are computed but not the order in which the rows are inserted

    http://blogs.msdn.com/b/sqltips/archive/2005/07/20/441053.aspx
