
Due to memory issues, I decided to replace/convert the DataTable to an IDataReader.

After a while of Googling and MSDN searching, I came across http://www.michaelbowersox.com/2011/12/22/using-a-custom-idatareader-to-stream-data-into-a-database/ and "Bulk Insert Sql Server millions of record".

Since I'm using the LumenWorks Fast CSV Reader, I haven't yet figured out how to tell the CsvReader to have the IDataReader use 2 different field versions. :-( csvReader.FieldCount is the key here, but I don't see how to tell the CsvReader to use either of the two new classes having the IDataReader interface. See the original script and the modified script below... Thank you...

//Original script...

var dbConnection = new SqlConnection(_dbConnectionString);

using (var dbBulkCopy = new SqlBulkCopy(dbConnection))
{
   using (CsvReader csvReader = new CsvReader(new StreamReader(filePath), false, '|', '"', '\\', '#', ValueTrimmingOptions.UnquoteOnly))
   {
       while(csvReader.ReadNextRecord())
       {
           if (csvReader.FieldCount == 48)
           {
               //Version 1...
               dataRow["DealerId"] = csvReader[0];
               dataRow["DealerName"] = csvReader[1];
               //Etc...
           }
           else if (csvReader.FieldCount == 51)
           {
               //Version 2...
               dataRow["DealerId"] = csvReader[0];
               dataRow["DealerName"] = csvReader[1];
               //Etc...
           }
           else { throw new Exception("Field Total Count Mismatched"); }

           dataTable.Rows.Add(dataRow);
       }

       dbConnection.Open();

       dbBulkCopy.WriteToServer(dataTable);
   }
}

//New script...

 var dbConnection = new SqlConnection(_dbConnectionString);

using (var dbBulkCopy = new SqlBulkCopy(dbConnection))
{
   dbConnection.Open();

   using (CsvReader csvReader = new CsvReader(new StreamReader(filePath), false, '|', '"', '\\', '#', ValueTrimmingOptions.UnquoteOnly))
   {
       csvReader.ReadNextRecord();

       dbBulkCopy.WriteToServer(
           if (csvReader.FieldCount == 48)
           {
               //Version 1...

               csvReader....???  //Assign a custom class implementing IDataReader...
           }
           else if (csvReader.FieldCount == 51)
           {
               //Version 2...
               csvReader....???  //Assign a custom class implementing IDataReader...
           }
           else { throw new Exception("Field Total Count Mismatched"); }
        );
   }
}

//Sample script...

using (var file = new StreamReader(path))
using (var csv = new CsvReader(file, true)) // true = has header row
using (var bcp = new SqlBulkCopy(connection)) {
    bcp.DestinationTableName = "TableName";
    bcp.WriteToServer(csv);
}

1 Answer


Since it'll be a bit long, I am writing it as an answer.

*I am assuming that, despite the fact that you have two kinds of CSV file coming with different field orders, the target table is the same.* [EDIT] No need to assume; you confirmed it in your comment.

To be able to understand your context, I got some sample data from here.

Let's say that the first type of file is something like the following:

Rk,Year,Age,Tm,Lg,Id,G,W,L,W-L,Finish,
1,1978,37,Atlanta Braves,NL,,162,69,93,.426,6

And the second type is like the following (some columns are swapped, Age <-> Finish, and there are additional fields):

Rk,Year,Finish,Tm,Lg,Id,G,W,L,W-L,Age,Test1,Test2,Test3
1,1978,Very good year,Atlanta Braves,NL,,162,69,93,.426,96,,,,

So the target table would look something like this (columns only):

Rk,Year,Finish,Tm,Lg,Id,G,W,L,W-L,Age,Test1,Test2,Test3

I see two options (+1 option at the end) here:

OPTION 1

  1. Add a Step 0 to make all the input files uniform at the field level by defining a field format. This can be done by creating the same fields that you have in the database.

[Let's imagine that Test4 and Test5 are columns that exist in the target table but are missing in both CSV files]

Rk,Year,Finish,Tm,Lg,Id,G,W,L,W-L,Age,Test1,Test2,Test3,Test4,Test5
  2. Parse all the files you have (both types) and rewrite them into one file (or multiple, that's up to you) respecting the format you defined. This way, you only have files in a single, unique format (a quick sketch of this rewrite step follows this list).

  3. You can now parse this file to insert it into the database using the CSV reader, since the field-incompatibility problem is handled by the file you obtained in the unique format.
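For what it's worth, here is a minimal sketch of the rewrite in step 2, assuming the LumenWorks CsvReader and the sample column names above; the CsvNormalizer class, the TargetColumns array and the NormalizeFile helper are illustrative names of mine, not something from your code:

using System;
using System.IO;
using System.Linq;
using LumenWorks.Framework.IO.Csv;

static class CsvNormalizer
{
    // Target layout, in the same order as the database table (sample columns from above).
    static readonly string[] TargetColumns =
    {
        "Rk", "Year", "Finish", "Tm", "Lg", "Id", "G", "W", "L", "W-L", "Age",
        "Test1", "Test2", "Test3", "Test4", "Test5"
    };

    public static void NormalizeFile(string inputPath, StreamWriter output)
    {
        using (var csv = new CsvReader(new StreamReader(inputPath), true)) // true = has header row
        {
            string[] headers = csv.GetFieldHeaders();

            while (csv.ReadNextRecord())
            {
                // For each target column, take the file's value if that column
                // exists in this file, otherwise leave it empty (NULL on import).
                var fields = TargetColumns
                    .Select(column => Array.IndexOf(headers, column))
                    .Select(index => index >= 0 ? csv[index] : string.Empty);

                output.WriteLine(string.Join(",", fields));
            }
        }
    }
}

A caller would write the header row (string.Join(",", TargetColumns)) once to the output file and then feed every input file, of either type, through NormalizeFile. Note that this naive string.Join does not re-quote fields containing commas, so harden it for real data.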

OPTION 2

You will do the SqlBulkCopy operation twice. In the first round you'll read the files having 48 fields, and in the next round you'll read the files having 51 fields.

var FilesWith48Fields = Directory.GetFiles(@"D:\Data\48Fields", "*.csv");

foreach (var fileName in FilesWith48Fields)
{
    using (var file = new StreamReader(fileName))
    using (var csv = new CsvReader(file, true)) // true = has header row
    using (var bcp = new SqlBulkCopy(connection, SqlBulkCopyOptions.KeepNulls))
    {
        bcp.DestinationTableName = "fletchsodTable";

        // map the first field of your CSV to the column in the database
        var mapping1 = new SqlBulkCopyColumnMapping(0, "FirstColumnName");
        bcp.ColumnMappings.Add(mapping1);

        var mapping2 = new SqlBulkCopyColumnMapping(1, "SecondColumnName");
        bcp.ColumnMappings.Add(mapping2);
        // ... and so on for the remaining fields ...

        bcp.WriteToServer(csv);
    }
}

And repeat the same with the files having 51 fields:

var FilesWith51Fields = Directory.GetFiles(@"D:\Data\51Fields", "*.csv");
......
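Purely as an illustration of that second round, mirroring the loop above but using the sample columns rather than your real 51-field layout, the mappings would simply point at different source ordinals (in the sample, index 2 is Finish instead of Age):

foreach (var fileName in FilesWith51Fields)
{
    using (var file = new StreamReader(fileName))
    using (var csv = new CsvReader(file, true)) // true = has header row
    using (var bcp = new SqlBulkCopy(connection, SqlBulkCopyOptions.KeepNulls))
    {
        bcp.DestinationTableName = "fletchsodTable";

        // Same destination table, different source ordinals for this layout.
        bcp.ColumnMappings.Add(new SqlBulkCopyColumnMapping(0, "Rk"));
        bcp.ColumnMappings.Add(new SqlBulkCopyColumnMapping(2, "Finish")); // swapped vs. the 48-field layout
        bcp.ColumnMappings.Add(new SqlBulkCopyColumnMapping(10, "Age"));
        // ... one mapping per remaining source field ...

        bcp.WriteToServer(csv);
    }
}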

More information on SqlBulkCopyColumnMapping can be found here.

OPTION 3

In case you want to venture into creating your own data reader, here are some links:

Daniel Wertheim's blog

A sample implementation on CodeProject

Another one

And finally MSDN
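If you do decide to go this way, a bare-bones wrapper could look like the sketch below. It wraps the LumenWorks CsvReader, exposes the full target column list, and pads the trailing columns missing from the shorter files with DBNull (assuming, as in your original script, that the 48-field files share the same leading columns). The PaddedCsvDataReader name and the padding idea are mine, not from the linked articles, and I am assuming SqlBulkCopy only needs FieldCount, Read, GetValue and (for name-based mappings) GetOrdinal; the remaining interface members simply throw.

using System;
using System.Data;
using LumenWorks.Framework.IO.Csv;

// Hypothetical adapter: presents a fixed, wider column list to SqlBulkCopy
// and fills the columns the current file doesn't have with DBNull.
public sealed class PaddedCsvDataReader : IDataReader
{
    private readonly CsvReader _csv;
    private readonly string[] _targetColumns;

    public PaddedCsvDataReader(CsvReader csv, string[] targetColumns)
    {
        _csv = csv;
        _targetColumns = targetColumns;
    }

    // The members SqlBulkCopy is assumed to need:
    public int FieldCount { get { return _targetColumns.Length; } }
    public bool Read() { return _csv.ReadNextRecord(); }
    public int GetOrdinal(string name) { return Array.IndexOf(_targetColumns, name); }
    public string GetName(int i) { return _targetColumns[i]; }

    public object GetValue(int i)
    {
        // Columns beyond what the current file contains come back as DBNull.
        return i < _csv.FieldCount ? (object)_csv[i] : DBNull.Value;
    }

    public bool IsDBNull(int i) { return GetValue(i) == DBNull.Value; }

    // Plumbing with conventional defaults.
    public void Dispose() { _csv.Dispose(); }
    public void Close() { _csv.Dispose(); }
    public bool IsClosed { get { return false; } }
    public int Depth { get { return 0; } }
    public int RecordsAffected { get { return -1; } }
    public bool NextResult() { return false; }
    public DataTable GetSchemaTable() { throw new NotSupportedException(); }

    // Everything below is not needed for a plain WriteToServer call.
    public object this[int i] { get { return GetValue(i); } }
    public object this[string name] { get { return GetValue(GetOrdinal(name)); } }
    public string GetString(int i) { return GetValue(i) as string; }
    public Type GetFieldType(int i) { return typeof(string); }
    public int GetValues(object[] values) { throw new NotSupportedException(); }
    public bool GetBoolean(int i) { throw new NotSupportedException(); }
    public byte GetByte(int i) { throw new NotSupportedException(); }
    public long GetBytes(int i, long fieldOffset, byte[] buffer, int bufferOffset, int length) { throw new NotSupportedException(); }
    public char GetChar(int i) { throw new NotSupportedException(); }
    public long GetChars(int i, long fieldOffset, char[] buffer, int bufferOffset, int length) { throw new NotSupportedException(); }
    public IDataReader GetData(int i) { throw new NotSupportedException(); }
    public string GetDataTypeName(int i) { throw new NotSupportedException(); }
    public DateTime GetDateTime(int i) { throw new NotSupportedException(); }
    public decimal GetDecimal(int i) { throw new NotSupportedException(); }
    public double GetDouble(int i) { throw new NotSupportedException(); }
    public float GetFloat(int i) { throw new NotSupportedException(); }
    public Guid GetGuid(int i) { throw new NotSupportedException(); }
    public short GetInt16(int i) { throw new NotSupportedException(); }
    public int GetInt32(int i) { throw new NotSupportedException(); }
    public long GetInt64(int i) { throw new NotSupportedException(); }
}

Usage would then be roughly bcp.WriteToServer(new PaddedCsvDataReader(csvReader, targetColumns)) for both file versions, with targetColumns listing the destination table's columns in order.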

PERSONAL NOTE Due to lack of time, for a similar problem of mine I gave up on the 3rd option, because with all the unit tests, optimization and other tweaks you'll have to do, it can take quite some time (at least it did for me).

OPTION 4 Perhaps, with the column mapping I pointed out in OPTION 2, you could implement your approach by testing field counts. But instinctively, I would suggest not relying on hard-coded integer field counts.
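A small sketch of what I mean, assuming (and this is my assumption, not something you stated) that the CSV header names match the destination column names:

using (var file = new StreamReader(fileName))
using (var csv = new CsvReader(file, true)) // true = has header row
using (var bcp = new SqlBulkCopy(connection))
{
    bcp.DestinationTableName = "fletchsodTable";

    // Whatever columns this particular file has, map each one to the
    // identically named column in the destination table.
    foreach (var header in csv.GetFieldHeaders())
    {
        bcp.ColumnMappings.Add(header, header);
    }

    bcp.WriteToServer(csv);
}

This way a file with 48 or 51 headers maps exactly what it contains, and the unmapped destination columns are left to the database defaults/NULLs.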

answered 2013-07-24T00:09:58.167