c# - Slow insert performance with large amounts of data (SQL Server / C#)

Question

I'm working with electronic equipment that digitizes waveforms in real-time (each device generates around 1000 512 byte arrays per second - we have 12 devices). I've written a client for these devices in C# that for the most part works fine and has no performance issues.

However, one of the requirements for the application is archival, and Microsoft SQL Server 2010 was mandated as the storage mechanism (outside of my control). The database layout is very simple: there is one table per device per day ("Archive_Dev02_20131015" etc). Each table has an Id column, a timestamp column, a Data column (varbinary) and 20 more integer columns with some metadata. There's a clustered primary key on Id and timestamp, and another separate index on timestamp. My naive approach was queue all data in the client application, and then inserting everything into the database in 5 second intervals using SqlCommand.

The basic mechanism looks like this:

using (SqlTransaction transaction = connection.BeginTransaction()
{
    //Beginning of the insert sql statement...
    string sql = "USE [DatabaseName]\r\n" +
                 "INSERT INTO [dbo].[Archive_Dev02_20131015]\r\n" + 
                 "(\r\n" +
                 "   [Timestamp], \r\n" +
                 "   [Data], \r\n" +
                 "   [IntField1], \r\n" +
                 "   [...], \r\n" +                         
                 ") \r\n" +
                 "VALUES \r\n" +
                 "(\r\n" +
                 "   @timestamp, \r\n" + 
                 "   @data, \r\n" + 
                 "   @int1, \r\n" +
                 "   @..., \r\n" +  
                 ")";

    using (SqlCommand cmd = new SqlCommand(sql))
    {
        cmd.Connection = connection;
    cmd.Transaction = transaction;

    cmd.Parameters.Add("@timestamp", System.Data.SqlDbType.DateTime);
    cmd.Parameters.Add("@data", System.Data.SqlDbType.Binary);
    cmd.Parameters.Add("@int1", System.Data.SqlDbType.Int);

    foreach (var sample in samples)
    {
            cmd.Parameters[0].Value = amples.ReceiveDate;
            cmd.Parameters[1].Value = samples.Data;       //Data is a byte array
            cmd.Parameters[1].Size  = samples.Data.Length;
            cmd.Parameters[2].Value = sample.IntValue1;
             ...

            int affected = cmd.ExecuteNonQuery();

            if (affected != 1)
            {
                throw new Exception("Could not insert sample into the database!");
            }
          }
       }
   }

   transaction.Commit();                
}

To summarize: a batch of 1 transaction with a loop that generates insert statements and executes them.

This method turned out to be very, very slow. On my machine (i5-2400 @ 3.1GHz, 8GB RAM, using .NET 4.0 and SQL Server 2008, 2 internal HDs in mirror, everything runs locally), it takes about 2,5 seconds to save the data from 2 devices, so saving 12 devices each 5 seconds is impossible.

To compare, I've written a small SQL script (actually I extracted the code C# runs with the sql server profiler) that does the same directly on the server (still running on my own machine):

set statistics io on
go

begin transaction
go

declare @i int = 0;

while @i < 24500 begin
SET @i = @i + 1

exec sp_executesql N'USE [DatabaseName]                                                                
INSERT INTO [dbo].[Archive_Dev02_20131015]                                                      
(                                                                                      
   [Timestamp],                                                                        
   [Data],                                                                             
   [int1],                                                                       
    ...                                                    
   [int20]                                                                                
)                                                                                      
VALUES                                                                                 
(                                                                                      
   @timestamp,                                                                         
   @data,                                                                              
   @compressed,                                                                        
   @int1,                                                                           
   ...                                                                  
   @int20,                                                                   

)',N'@timestamp datetime,@data binary(118),@int1 int,...,@int20 int,',
@timestamp='2013-10-14 14:31:12.023',
@data=0xECBD07601C499625262F6DCA7B7F4AF54AD7E074A10880601324D8904010ECC188CDE692EC1D69472329AB2A81CA6556655D661640CCED9DBCF7DE7BEFBDF7DE7BEFBDF7BA3B9D4E27F7DFFF3F5C6664016CF6CE4ADAC99E2180AAC81F3F7E7C1F3F22FEEF5FE347FFFDBFF5BF1FC6F3FF040000FFFF,
@int=0,
...
@int20=0
end

commit transaction

This does (imo, but I'm probably wrong ;) ) the same thing, only this time I'm using 24500 iterations, to simulate the 12 devices at once. The query takes about 2 seconds. If I use the same amount of iterations as the C# version, the query runs in less than a second.

So my first question is: why does it run way faster on SQL server than in C#? Does this have anything to do with the connection (local tcp)?

To make matters more confusing (to me) this code runs twice as slow on the production server (IBM bladecenter, 32GB ram, fiber connection to SAN, ... filesystem operations are really fast). I've tried looking at the sql activity monitor and write performance never goes above 2MB/sec, but this might as well be normal. I'm a complete newbie to sql server (about the polar opposite of a competent DBA in fact).

Any ideas on how I can make the C# code more performant?

score 12 · Accepted Answer

到目前为止，加载此类数据的最佳方法是使用表值参数和获取数据的存储过程。使用它的表类型和过程的一个非常简单的示例是：

CREATE TYPE [dbo].[StringTable]
AS TABLE ([Value] [nvarchar] (MAX) NOT NULL)
GO

CREATE PROCEDURE [dbo].[InsertStrings]
  @Paths [dbo].[StringTable] READONLY
AS
INSERT INTO [dbo].[MyTable] ([Value])
SELECT [Value] FROM @Paths
GO

然后 C# 代码将类似于以下内容（请记住，我已将其输入到 S/O 编辑器中，因此可能存在拼写错误）：

private static IEnumerable<SqlDataRecord> TransformStringList(ICollection<string> source)
{
     if (source == null || source.Count == 0)
     {
         return null;
     }
     return GetRecords(source, 
                       () => new SqlDataRecord(new SqlMetaData("Value", SqlDbType.NVarChar, -1)), 
                       (record, value) => record.SetString(0, value));
}

private static IEnumerable<SqlDataRecord> GetRecords<T>(IEnumerable<T> source, Func<SqlDataRecord> factory, Action<SqlDataRecord, T> hydrator)
{
    SqlDataRecord dataRecord = factory();
    foreach (var value in source)
    {
        hydrator(dataRecord, value);
        yield return dataRecord;
    }
}

private InsertStrings(ICollection<string> strings, SqlConnection connection)
{
    using (var transaction = connection.BeginTransaction())
    {
        using (var cmd = new SqlCommand("dbo.InsertStrings"))
        {
            cmd.Connection = connection;
            cmd.Transaction = transaction;
            cmd.CommandType = CommandType.StoredProcedure;
            cmd.Parameters.Add(new SqlParameter("@Paths", SqlDbType.Structured) { Value = TransformStringList(strings) };
            cmd.ExecuteNonQuery();
        }
    }
}

这种方法具有可与竞争对手媲美的速度SqlBulkCopy，但它还可以通过运行您通过过程更新的内容的能力产生更好的控制，并且还可以更轻松地处理并发性。

编辑 -> 出于完整性考虑，此方法适用于 SQL Server 2008 及更高版本。鉴于没有 SQL Server 2010 这样的东西，我想我最好提一下。

score 2 · Accepted Answer

在 sql server 中，

CREATE TYPE [dbo].[ArchiveData]
AS TABLE (
    [Timestamp] [DateTime] NOT NULL,
    [Data] [VarBinary](MAX) NOT NULL,
    [IntField1] [Int] NOT NULL,
    [...] [Int] NOT NULL,
    [IntField20] NOT NULL)
GO

那么您的代码应该类似于下面的代码。此代码使用表值参数一次插入所有待处理数据，是单个事务。

请注意省略了缓慢和不必要USE DATABASE的内容，并使用逐字字符串 ( @"") 使代码更具可读性。

// The insert sql statement.
string sql =
@"INSERT INTO [dbo].[Archive_Dev02_20131015] (
    [Timestamp],
    [Data],
    [IntField1],
    [...],                         
    [IntField20])
 SELECT * FROM @data;";

using (SqlCommand cmd = new SqlCommand(sql))
{
    using (SqlTransaction transaction = connection.BeginTransaction()
    {
        cmd.Connection = connection;
        cmd.Transaction = transaction;
        cmd.Parameters.Add(new SqlParameter("@data", SqlDbType.Structured)
            {
                Value = TransformSamples(samples);
            });

        int affected = cmd.ExecuteNonQuery();
        transaction.Commit();
    }
}

...

private static IEnumerable<SqlDataRecord> TransformSamples(
        {YourSampleType} samples)
{
    var schema = new[]
    {
        new SqlMetaData("Timestamp", SqlDbType.DateTime),
        new SqlMetaData("Timestamp", SqlDbType.VarBinary, -1),
        new SqlMetaData("IntField1", SqlDbType.Int),
        new SqlMetaData("...", SqlDbType.Int),
        new SqlMetaData("IntField20", SqlDbType.Int)
    };

    foreach (var sample in samples)
    {
        var row = new SqlDataRecord(schema);
        row.SetSqlDate(0, sample.ReceiveDate);
        row.SetSqlBinary(1, sample.Data);
        row.SetSqlInt(2, sample.Data.Length);
        row.SetSqlInt(..., ...);
        row.SetSqlInt(24, sample.IntValue19);
        yield return row;
    }
}

score 2 · Accepted Answer

我已经设法通过使用 SqlBulkInsert 来解决我的问题，正如 juharr 在上述评论之一中所建议的那样。

我主要根据这篇文章将我的数据转换为可以批量插入数据库的 DataTable：

将通用列表/可枚举转换为数据表？

感谢您的所有回答！

c# - Slow insert performance with large amounts of data (SQL Server / C#)

3 回答 3

Related

Reference