I'll start with a short overview. The first test results focus on the performance of single statements vs. bulk operations in a single-client environment - that means without any other concurrent activities. The second, more important tests cover a performance comparison in a multi-client scenario. The last part shows a comparison for some larger bulk operations.
Test Environment
If you already read my previous posts you can skip this section. I try to keep this post uncoupled from my previous posts since it covers another general topic (TVPs), so I have to repeat the test environment and the general way to use table-valued parameters from ADO.NET.

For my tests I used the following simple table to INSERT/UPDATE rows.
```sql
CREATE TABLE TestMany (
   Id int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED
   ,SomeInt int NULL
   ,SomeDate datetime NULL
   ,SomeText varchar(100) NULL
);
```

Whenever we intend to use table-valued parameters in SQL Server 2008, we need a table-valued type defined in SQL Server which can be used as a collection of Data Transfer Objects (DTOs) to transfer .NET native data to SQL Server.
```sql
CREATE TYPE TestManyMergeType AS TABLE (
   Id int NOT NULL
   ,Action char(1) NOT NULL CHECK (Action IN ('I', 'D', 'U'))
   ,SomeInt int NULL
   ,SomeDate datetime NULL
   ,SomeText varchar(100) NULL
   ,PRIMARY KEY CLUSTERED (Id, Action)
);
```

This type covers our destination table with some slight exceptions. Since we will use table-valued parameters to push all changed client data in one package to SQL Server, we need a column which tells SS2k8 how to handle each distinct row. Therefore, we use the Action column which specifies whether the current row is to be Inserted, Updated or Deleted. We don't use an IDENTITY specification for our "Id" column since we need to be able to specify a destination "Id" to be handled. To insert new rows we can use negative, unique "Id" values which are required to map newly inserted rows on the client side to their new server ids. To get the best selectivity we use a primary key over the "Id" and "Action" columns.
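Just to illustrate the shape of the data (the ids and values below are made up for this example), a T-SQL batch could fill a variable of this type like this:

```sql
-- Hypothetical example data: one new row with a negative surrogate id
-- (which the client will later map to the generated server id) and two
-- updates to existing rows.
DECLARE @data TestManyMergeType;

INSERT INTO @data (Id, Action, SomeInt, SomeDate, SomeText)
VALUES (-1, 'I', 42, GETDATE(), 'new row')
      ,(10, 'U', 7,  GETDATE(), 'changed row')
      ,(11, 'U', 3,  GETDATE(), 'another changed row');
```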
To provide a stored procedure which gets a table-valued parameter, the parameter has to be specified as READONLY in SS2k8.
```sql
CREATE PROCEDURE SaveTestMany
   @data TestManyMergeType READONLY
AS
SET NOCOUNT ON;

-- update
UPDATE trg SET
   SomeInt = src.SomeInt
   ,SomeDate = src.SomeDate
   ,SomeText = src.SomeText
FROM TestMany trg
   JOIN @data src ON trg.Id = src.Id AND src.Action = 'U';

-- insert
INSERT INTO TestMany (SomeInt, SomeDate, SomeText)
SELECT SomeInt, SomeDate, SomeText
FROM @data
WHERE Action = 'I'
ORDER BY Id;

-- return last IDENTITY and count of added rows
SELECT SCOPE_IDENTITY(), @@ROWCOUNT;
GO
```

The most important part of this procedure is the ORDER BY clause in the INSERT statement. It ensures that we can use the information about the last generated identity value (from SCOPE_IDENTITY()) and the count of inserted rows (from @@ROWCOUNT) to map the provided surrogate ids to the generated server ids. Maybe you think this might be dangerous without a MAXDOP option or anything else, and maybe you are right; I can only tell you that I tried different order conditions and all rows have always been inserted in exact order. So feel free to trust this or not. If you don't trust it (which is a good thing), do your own tests (which is a good thing, too) or just feel free to leave this post (which would be a really bad thing).
To provide a table-valued parameter to a stored procedure from ADO.NET you have two possibilities. You can either provide your data by passing a System.Data.DataTable (which appears to be too bloated just to provide simple data records, in my opinion) or a custom implementation of IEnumerable<SqlDataRecord>. I'll use the enumerator in my tests since it is way faster and does not require copying all domain objects into a data table.
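For completeness, here is a rough sketch of the DataTable variant (my own example, not from the post): every domain object has to be copied into a table row first, which is exactly the extra work the enumerator avoids.

```csharp
using System;
using System.Data;
using System.Data.SqlClient;

// Sketch of the DataTable alternative: the columns mirror TestManyMergeType
// (TVP columns are matched by position, so only the order matters).
DataTable table = new DataTable();
table.Columns.Add("Id", typeof(int));
table.Columns.Add("Action", typeof(string));
table.Columns.Add("SomeInt", typeof(int));
table.Columns.Add("SomeDate", typeof(DateTime));
table.Columns.Add("SomeText", typeof(string));

// every object has to be copied into a row first
table.Rows.Add(-1, "I", 42, DateTime.Now, "new row");
table.Rows.Add(10, "U", 7, DateTime.Now, "changed row");

// the table is passed as the parameter value instead of the enumerator
SqlCommand cmd = new SqlCommand("SaveTestMany") { CommandType = CommandType.StoredProcedure };
SqlParameter p = cmd.Parameters.Add("@data", SqlDbType.Structured);
p.TypeName = "TestManyMergeType";
p.Value = table;
```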
```csharp
class TestManyEnumerator : IEnumerable<SqlDataRecord>
{
    public TestManyEnumerator(IEnumerable<TestMany> data)
    {
        _data = data;
    }

    private IEnumerable<TestMany> _data;

    public IEnumerator<SqlDataRecord> GetEnumerator()
    {
        SqlMetaData[] metaData =
        {
            new SqlMetaData("Id", SqlDbType.Int),
            new SqlMetaData("Action", SqlDbType.Char, 1),
            new SqlMetaData("SomeInt", SqlDbType.Int),
            new SqlMetaData("SomeDate", SqlDbType.DateTime),
            new SqlMetaData("SomeText", SqlDbType.VarChar, 100),
        };

        foreach (var item in _data)
        {
            SqlDataRecord record = new SqlDataRecord(metaData);
            record.SetValue(0, item.Id);
            record.SetValue(1, item.ChangeState);
            record.SetValue(2, item.SomeInt);
            record.SetValue(3, item.SomeDate);
            record.SetValue(4, item.SomeText);
            yield return record;
        }
    }

    System.Collections.IEnumerator System.Collections.IEnumerable.GetEnumerator()
    {
        return this.GetEnumerator();
    }
}
```

This enumerator handles domain objects of type TestMany which map exactly to the table-valued T-SQL type specified above. ChangeState specifies the state ('I', 'D', 'U') of a particular object. (Note that TVP columns are matched by position, not by name, so the metadata names only need to follow the type's column order.)
To call our procedure with a set of data to be updated we have to specify a SqlParameter with SqlDbType.Structured, the name of the destination table-valued type as TypeName and provide the previous enumerator as Value.
```csharp
using (SqlConnection cn = GetConnection(true))
using (SqlTransaction tran = cn.BeginTransaction())
{
    // get test data
    IList<TestMany> data = GetTestData();

    // create a new command as stored procedure
    SqlCommand cmd = new SqlCommand("SaveTestMany", cn);
    cmd.CommandType = CommandType.StoredProcedure;
    cmd.Transaction = tran;

    // create a new parameter ("Structured" defines a TVP)
    SqlParameter p = cmd.Parameters.Add("@data", SqlDbType.Structured);
    // specify the name of the TVP
    p.TypeName = "TestManyMergeType";
    p.Value = new TestManyEnumerator(data);

    // catch the returned IDENTITY information
    using (SqlDataReader reader = cmd.ExecuteReader())
    {
        reader.Read();
        int lastId = reader.IsDBNull(0) ? -1 : (int)reader.GetDecimal(0);
        int count = reader.GetInt32(1);
        // synchronize all newly added keys
        SyncAddedIds(data, count, lastId);
    }

    tran.Commit();
}
```

For a more detailed explanation and the implementation of GetTestData():IList<TestMany> and GetConnection(open):SqlConnection please have a look here. For a generic approach, have a look at A Generic BDP Approach for SQL Server 2008.
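The implementation of SyncAddedIds is not shown in this post, so here is a hedged sketch of how the id mapping could work. The assumption (backed by the ORDER BY in the procedure) is that the server inserted the new rows in ascending order of their negative surrogate ids, so the generated identities are the contiguous range ending at lastId. The TestMany stand-in class below is a minimal assumption, not the original domain object.

```csharp
using System.Collections.Generic;
using System.Linq;

// Minimal stand-in for the TestMany domain object (the real class is in the
// referenced post); only the members used here are sketched.
class TestMany
{
    public int Id;
    public char ChangeState;   // 'I', 'U' or 'D'
}

static class IdSync
{
    // Hypothetical implementation of SyncAddedIds: map the negative surrogate
    // ids of inserted objects to the generated server ids, assuming the rows
    // were inserted in ascending order of their surrogate ids, so the new
    // identities form the contiguous range [lastId - count + 1, lastId].
    public static void SyncAddedIds(IList<TestMany> data, int count, int lastId)
    {
        // collect the newly inserted objects in the order the server used
        var added = data.Where(d => d.ChangeState == 'I')
                        .OrderBy(d => d.Id)
                        .ToList();

        int firstId = lastId - count + 1;
        for (int i = 0; i < added.Count; i++)
            added[i].Id = firstId + i;   // replace surrogate id with server id
    }
}
```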
The procedures to insert and update single rows in our TestMany table are as simple as they can be.
```sql
CREATE PROCEDURE [dbo].[InsertOneTestMany]
   @id INT OUTPUT
   ,@someInt INT
   ,@someDate DATETIME
   ,@someText VARCHAR(100)
AS
SET NOCOUNT ON;

INSERT INTO TestMany (SomeInt, SomeDate, SomeText)
SELECT @someInt, @someDate, @someText;

SELECT @id = SCOPE_IDENTITY();
GO

CREATE PROCEDURE [dbo].[UpdateOneTestMany]
   @id INT
   ,@someInt INT
   ,@someDate DATETIME
   ,@someText VARCHAR(100)
AS
SET NOCOUNT ON;

UPDATE TestMany SET
   SomeInt = @someInt
   ,SomeDate = @someDate
   ,SomeText = @someText
WHERE Id = @id;
GO
```

I think there is nothing here which needs to be explained.
We're done with our test environment.
Not part of this Post
Calling a single statement procedure from .NET is straightforward and part of many tutorials, so I left that source code out of this post. I also don't show how to use multi-threading to call the above procedures in a concurrency scenario.

I kept DELETE operations out of the party since they are often done by special purge processes instead of usual business applications.
Since I don't have a home network, I did my tests with SQL Server running in a VM on my local machine. Therefore I cannot show you any comparison differences depending on network latency.
Single Client Tests
Let's compare the performance of single INSERT/UPDATE statements vs. bulk operations when using a single client connection. To get (almost) linear test results I executed each test configuration 1,000 times. One important thing: I always called the specified count of INSERT and UPDATE operations in one round trip. So "Count Insert/Update" with value "1" calls one INSERT and one UPDATE operation; count "50" affects one hundred rows. I did my tests with a destination table with just 100,000 rows and with a table with 10,000,000 rows. Both tables were recreated between test configurations and flushed by calling "DBCC DROPCLEANBUFFERS() WITH NO_INFOMSGS;". To ensure as many index read operations as possible I arranged all operations over the complete id range of the table.

Duration on a destination table with 100,000 rows after 1,000 executions:
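The measuring harness itself is not shown in this post; as a rough sketch (my assumption, not the original test code, including the hypothetical executeTestRun delegate), each configuration could be timed like this:

```csharp
using System;
using System.Diagnostics;

// Sketch of a timing loop for one test configuration: execute all runs
// and report the total duration in milliseconds.
static void RunConfiguration(int executions, Action executeTestRun)
{
    Stopwatch watch = Stopwatch.StartNew();
    for (int i = 0; i < executions; i++)
    {
        // one round trip with the configured count of INSERT/UPDATE operations
        executeTestRun();
    }
    watch.Stop();
    Console.WriteLine("{0:N0} ms", watch.ElapsedMilliseconds);
}
```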
Count | Single Statements | Bulk Operation |
---|---|---|
1 | 1,299 ms | 3,285 ms |
3 | 2,319 ms | 2,995 ms |
5 | 3,201 ms | 3,007 ms |
10 | 5,553 ms | 3,121 ms |
20 | 11,923 ms | 3,449 ms |
50 | 27,644 ms | 4,769 ms |
Duration on a destination table with 10,000,000 rows after 1,000 executions:

Count | Single Statements | Bulk Operation |
---|---|---|
1 | 2,631 ms | 4,076 ms |
3 | 3,614 ms | 4,200 ms |
5 | 4,608 ms | 4,470 ms |
10 | 7,541 ms | 4,381 ms |
20 | 12,308 ms | 4,846 ms |
50 | 26,173 ms | 6,336 ms |
Increasing the work load from one to fifty INSERT and UPDATE tasks per round trip makes the single row operations take about 10 to 20 times longer, while the duration of the bulk operations increases only by a factor of about 1.5 (what appears to be the better thing ;-) ).
Multi-Client Tests
As initially written, SS2k8 is a server, and no geeky feature has any worth if it only works in a single-client scenario. Therefore, we'll now have a look at some multi-client tests.

Since concurrent executions take more time I ran the following tests only 100 times. I also changed the count of rows within the destination table to 500,000, which was a convenient value to arrange the threads over the whole table (to get the highest count of reads).
The following results show the overall duration of 100 executions.
Count | Single Statements | Bulk Operation |
---|---|---|
1 Thread | ||
1 | 159 ms | 298 ms |
3 | 237 ms | 358 ms |
5 | 344 ms | 371 ms |
10 | 562 ms | 391 ms |
20 | 1,040 ms | 427 ms |
50 | 2,457 ms | 558 ms |
2 Threads | ||
1 | 387 ms | 348 ms |
3 | 679 ms | 368 ms |
5 | 854 ms | 438 ms |
10 | 1,597 ms | 421 ms |
20 | 3,513 ms | 575 ms |
50 | 6,839 ms | 830 ms |
5 Threads | ||
1 | 1,466 ms | 936 ms |
3 | 2,528 ms | 943 ms |
5 | 3,542 ms | 1,007 ms |
10 | 5,837 ms | 1,097 ms |
20 | 10,995 ms | 1,293 ms |
50 | 26,092 ms | 2,251 ms |
10 Threads | ||
1 | 4,820 ms | 1,604 ms |
3 | 13,290 ms | 1,936 ms |
5 | 12,624 ms | 1,838 ms |
10 | 28,829 ms | 2,012 ms |
20 | 45,414 ms | 2,789 ms |
50 | 97,498 ms | 4,610 ms |
Some Larger Bulks
Okay, we're speaking about bulk operations, so finally let's do some real bulk operations.

The following results show the complete duration of 10 executions.
Count | Single Statements | Bulk Operation |
---|---|---|
1,000 | 4,845 ms | 370 ms |
5,000 | 23,190 ms | 2,033 ms |
10,000 | 46,539 ms | 4,854 ms |
20,000 | 95,086 ms | 7,209 ms |
Conclusion
Apparently, table-valued parameters can be a great feature for systems which work with bulk data manipulation. This might be some web or rich client applications, but especially backend processes like data loaders or synchronization processes.

What does this mean? Probably nothing as a general statement. If you think bulk operations with table-valued parameters might be a nice feature for your project: always do your own tests! We did no tests with a real network and we did not test DELETE operations.
As general advice, always try to stay innovative, but validate everything. My intention was not to show you that you have to change all your running code, but to make you think about some new features which might be helpful.
Regarding identity orders, I found this interesting:
> INSERT queries that use SELECT with ORDER BY to populate rows guarantees how identity values are computed but not the order in which the rows are inserted
http://blogs.msdn.com/b/sqltips/archive/2005/07/20/441053.aspx