Friday, September 25, 2009

Numbers table

The proposal for this article was Jeff Moden's great article The "Numbers" or "Tally" Table: What it is and how it replaces a loop. at SQLServerCentral.com. This post should show some additional tasks which can be done with Numbers tables.

A "Numbers" table is a powerful SQL tool. Basically it's just a table containing incremental numbers but this table can be used for many cases, especially to avoid the usage of SQL cursors. Cursors are a common performance problem in many databases. Database systems are optimized for bulk operations, cursors are a row based approach, so you should always try to avoid them.

Let's start with the creation of a Numbers table. The table contains one column and the numbers from 1 to 11000. Why 11000? I don't know :-D. The table can be used for many approaches as we will see in this article. If the table contains too less rows it cannot be used for some tasks, if the table contains too many rows, the selectivity becomes worse. The idea behind 11000 is, 10000 is a common number of "many items", 11000 can handle those tasks and contains 10% puffer for slightly larger operations.

CREATE TABLE Numbers
(
   Num INT NOT NULL PRIMARY KEY CLUSTERED WITH (FILLFACTOR = 100)
);

INSERT INTO Numbers
   SELECT TOP(11000)
      ROW_NUMBER() OVER (ORDER BY (SELECT 1))
   FROM master.sys.all_columns c1
      CROSS JOIN master.sys.all_columns c2;

Well now we have "Numbers", so now we are cool, but why? Let's have a look into some solutions which can be done with a numbers table.

Transform columns into rows
Sometimes it can be needed to transform data of two or more columns into rows. Since this smells like a violation of Normalization Form 1 (NF1) we will never need this for our databases, which are always correctly normalized. Presume we get data from another - not normalized - data source and have to transform those data into our - well normalized - data structure.

We start with a table containing persons and their phone numbers.
DECLARE @BadData TABLE
(
   Id INT NOT NULL IDENTITY
      PRIMARY KEY CLUSTERED
   ,FirstName VARCHAR(30)
   ,LastName VARCHAR(30)
   ,Phone1 VARCHAR(100)
   ,Phone2 VARCHAR(100)
   ,Phone3 VARCHAR(100)
);

INSERT INTO @BadData 
      (FirstName, LastName, Phone1, Phone2, Phone3)
   VALUES
      ('Pete', 'Porter', '1111', '1112', '1113')
      ,('Rush', 'Bush', '2221', '2222', '2223');

Now let's normalize those bad data, using a Numbers table. This can be done by a CROSS JOIN to combine all rows of both tables and a restriction to only three numbers within the WHERE clause.
SELECT
      Id
      ,FirstName
      ,LastName
      ,CASE 
         WHEN Num = 1 THEN Phone1
         WHEN Num = 2 THEN Phone2
         ELSE Phone3
         END AS Phone
      ,Num
   FROM @BadData
      CROSS JOIN Numbers
   WHERE Num <= 3
The result of this statement is shown here.

Id   FirstName  LastName   Phone   Num
---- ---------- ---------- ------- -----------
1    Pete       Porter     1111    1
1    Pete       Porter     1112    2
1    Pete       Porter     1113    3
2    Rush       Bush       2221    1
2    Rush       Bush       2222    2
2    Rush       Bush       2223    3


Build data ranges
Some reports show calculated data on date level. Therefore we have to create a table containing all the dates for a specified date range.
DECLARE
   @start   DATETIME = '2009-01-01'
   ,@end    DATETIME = '2009-02-01'

SELECT
      DATEADD(DAY, Num - 1, @start)
   FROM Numbers
   WHERE
      DATEADD(DAY, Num - 1, @start) < @end

Split strings
Here you can find a dedicated post of me concerning string split methods which also covers how to use a Numbers table to do this:
High Performance String Split Functions

Generate Test Data
Sometimes it's useful to have some test data.

A reason might be the start of a new project. Usually you start with an empty database. Customer and/or analysts did not yet provide real data. Now you can start to write your own row by row or you can use a tool (in our case a script) and let it generate the data. To tell you the truth, for tests of some specific use cases you still have to write the data manually. But to show a grid or aggregate some numbers you might be happy with some scrap data.

On the other side you need an environment with many data for load tests. Feel lucky, if you have a large backup from production but this is not guaranteed. (E.g. you are in a new project, the database is in another location and too large to be copied or you are just not allowed to see the real data for some reason.) For load tests it's almost impossible to write data manually.

Here's a little sample which shows how to use a "Numbers" table to generate hundreds of thousands of rows. As you can see it's relatively simple to configure the output data with T-SQL build in functions like NEWID(), REPLICATE(), ASCII(), NCHAR() or ABS.
SET NOCOUNT ON;
GO
---==============================================
-- table TestDataFR

-- drop an existing test table
IF (OBJECT_ID('TestDataFR') IS NOT NULL)
   DROP TABLE TestDataFR;
-- create a new table
CREATE TABLE TestDataFR
(
   Id INT NOT NULL IDENTITY(1,1)
      PRIMARY KEY CLUSTERED
   ,SomeInt INT
   ,SomeVarChar VARCHAR(50)
   ,SomeNVarChar NVARCHAR(50)
   ,SomeDateTime DATETIME
   ,SomeDecimal DECIMAL(15,5)
   ,SomeFloat FLOAT
);
GO

---==============================================
-- configuration
DECLARE
   @count            INT            = 100000
   ,@done            INT            = 0
   ,@batchSize       INT            = 10000
   ,@top             INT            = 0
   ,@maxLenNVarChar  INT            = 15
   ,@maxSomeDecimal  DECIMAL(15,5)  = 800000000
   ,@maxSomeFloat    BIGINT         = 900000000000000
   ;


-- get the current count of rows (should be zero)
SELECT @done = COUNT(*)
   FROM TestDataFR

-- create the test data
WHILE (@done < @count)
BEGIN
   -- work in batch size transactions
   BEGIN TRANSACTION;
   
   -- determine the count of rows for this batch
   SELECT @top = 
      CASE 
         WHEN @count - @done < @batchSize THEN @count - @done 
         ELSE @batchSize 
         END;

   -- insert the rows
   INSERT INTO TestDataFR (
         SomeInt
         ,SomeVarChar
         ,SomeNVarChar
         ,SomeDateTime
         ,SomeDecimal
         ,SomeFloat
         )
      SELECT TOP(@top)
         SomeInt        = Num
         ,SomeVarChar   = CONVERT(VARCHAR(50), NEWID())
         ,SomeNVarChar  = 
            CONVERT
            (
               -- return type
               NVARCHAR(50), 
               -- replicate a char between A and Z up to 
               -- max configured length
               REPLICATE
               (
                  NCHAR(ASCII('a') + Num % 26), 
                  Num % @maxLenNVarChar + 1
               ) 
               -- add a space
               + N' ' 
               -- add a number at the end
               + CONVERT(NVARCHAR(10), Num)
            )
         ,SomeDateTime  = DATEADD(HOUR, Num * -1, GETDATE())
         ,SomeDecimal   = 
            CONVERT
            (
               DECIMAL(15,5), 
               -- a non negative number up to the max devided by Num
               ABS(CHECKSUM(NEWID()) % @maxSomeDecimal) / Num
            )
         ,SomeFloat     = 
            -- a possibly negative number up to specified max
            CONVERT(FLOAT, CHECKSUM(NEWID()) % @maxSomeFloat) 
            -- devide by 1.xxx 
            / (1 + (CONVERT(FLOAT, 1) / Num))
      FROM Numbers
   
   -- calculate the count of done rows
   SELECT @done += @@ROWCOUNT;
   
   COMMIT TRANSACTION;
END;
GO

---==============================================
-- show result and some of the data

SELECT 'Count of rows: ' + CONVERT(VARCHAR(20), COUNT(*))
   FROM TestDataFR;

SELECT TOP(100)
      'Samples'
      ,*
   FROM TestDataFR;
Binary data as hex-string
The transformation of binary data into a hexadecimal string can be used to export data into SQL statements or to export the data to any other destination.
IF (OBJECT_ID('ufn_VarBinToHexString') IS NULL)
   EXECUTE ('CREATE FUNCTION ufn_VarBinToHexString() 
             RETURNS INT AS BEGIN RETURN 1 END');
GO
ALTER FUNCTION ufn_VarBinToHexString(@binary VARBINARY(MAX))
   RETURNS VARCHAR(MAX)
AS
BEGIN
   --DECLARE @binary VARBINARY(100) = NEWID();

   -- Mapping for binary values
   DECLARE @HexMap VARCHAR(550)
   SELECT @HexMap = 
       '000102030405060708090A0B0C0D0E0F101112131415161718191A1B1C1D1E1F'
     + '202122232425262728292A2B2C2D2E2F303132333435363738393A3B3C3D3E3F' 
     + '404142434445464748494A4B4C4D4E4F505152535455565758595A5B5C5D5E5F'
     + '606162636465666768696A6B6C6D6E6F707172737475767778797A7B7C7D7E7F' 
     + '808182838485868788898A8B8C8D8E8F909192939495969798999A9B9C9D9E9F'
     + 'A0A1A2A3A4A5A6A7A8A9AAABACADAEAFB0B1B2B3B4B5B6B7B8B9BABBBCBDBEBF' 
     + 'C0C1C2C3C4C5C6C7C8C9CACBCCCDCECFD0D1D2D3D4D5D6D7D8D9DADBDCDDDEDF'
     + 'E0E1E2E3E4E5E6E7E8E9EAEBECEDEEEFF0F1F2F3F4F5F6F7F8F9FAFBFCFDFEFF'
     ;

   RETURN 
      '0x' 
      + 
      (
      SELECT
         SUBSTRING(
            @HexMap
            ,CONVERT(
               TINYINT
               ,SUBSTRING(@binary, Num, 1)) * 2 + 1
            ,2)
      FROM Numbers t2
      WHERE Num <= DATALENGTH(@binary)
      ORDER BY Num
      FOR XML PATH('')
      );
END
GO
The numbers table is used to loop through all bytes within the specified binary stream. SUBSTRING is used to extract the current byte. Now the @HexMap is used to get the characters which represent the current byte. The result of this is a table with one row for each byte within the specified binary value. Finally the are concatenated by a common hack with FOR XML PATH(''). This function can be helpful for SQL Server 2005, in SQL Server 2008 it became obsolete. The build-in CONVERT function has been extended with an optional style-parameter which can be used to generate the same result. The following sample shows the usage of this function and the new CONVERT feature of SQL Server 2008.
DECLARE @binary VARBINARY(100) = NEWID();
SELECT
   @binary
   ,dbo.ufn_VarBinToHexString(@binary)
   ,CONVERT(VARCHAR(MAX), @binary, 1)
Conclusion As you saw, a "Numbers" table can be used for many different things. If you start using it you will find more and more tasks which can be optimized with it.

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.