When I used OLEDB, reading 3,200 rows from an Excel sheet took only 2 to 3 seconds. Since I switched to the OpenXML format, reading the same 3,200 rows takes more than 1 minute.

Here is my code:

public static DataTable ReadExcelFileDOM(string filename) 
{ 
    DataTable table; 

    using (SpreadsheetDocument myDoc = SpreadsheetDocument.Open(filename, true)) 
    { 
        WorkbookPart workbookPart = myDoc.WorkbookPart; 
        Sheet worksheet = workbookPart.Workbook.Descendants<Sheet>().First(); 
        WorksheetPart worksheetPart = 
         (WorksheetPart)(workbookPart.GetPartById(worksheet.Id)); 
        SheetData sheetData = 
            worksheetPart.Worksheet.Elements<SheetData>().First(); 
        List<List<string>> totalRows = new List<List<string>>(); 
        int maxCol = 0; 

        foreach (Row r in sheetData.Elements<Row>()) 
        { 
            // Add the empty row. 
            string value = null; 
            while (totalRows.Count < r.RowIndex - 1) 
            { 
                List<string> emptyRowValues = new List<string>(); 
                for (int i = 0; i < maxCol; i++) 
                { 
                    emptyRowValues.Add(""); 
                } 
                totalRows.Add(emptyRowValues); 
            } 


            List<string> tempRowValues = new List<string>(); 
            foreach (Cell c in r.Elements<Cell>()) 
            { 
                #region get the cell value of c. 
                if (c != null) 
                { 
                    value = c.InnerText; 

                    // If the cell represents a numeric value, you are done.  
                    // For dates, this code returns the serialized value that  
                    // represents the date. The code handles strings and Booleans 
                    // individually. For shared strings, the code looks up the  
                    // corresponding value in the shared string table. For Booleans,  
                    // the code converts the value into the words TRUE or FALSE. 
                    if (c.DataType != null) 
                    { 
                        switch (c.DataType.Value) 
                        { 
                            case CellValues.SharedString: 
                                // For shared strings, look up the value in the shared  
                                // strings table. 
                                var stringTable = workbookPart. 
                                    GetPartsOfType<SharedStringTablePart>().FirstOrDefault(); 

                                // If the shared string table is missing, something is  
                                // wrong. Return the index that you found in the cell. 
                                // Otherwise, look up the correct text in the table. 
                                if (stringTable != null) 
                                { 
                                    value = stringTable.SharedStringTable. 
                                        ElementAt(int.Parse(value)).InnerText; 
                                } 
                                break; 

                            case CellValues.Boolean: 
                                switch (value) 
                                { 
                                    case "0": 
                                        value = "FALSE"; 
                                        break; 
                                    default: 
                                        value = "TRUE"; 
                                        break; 
                                } 
                                break; 
                        } 
                    } 

                    Console.Write(value + "  "); 
                } 
                #endregion 

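                // Add the cell to the row list. Convert the column letters of the 
                // cell reference (e.g. "AB12") to a zero-based index, so that 
                // columns beyond Z are handled as well. 
                int i = 0; 
                foreach (char ch in c.CellReference.Value) 
                { 
                    if (!char.IsLetter(ch)) break; 
                    i = i * 26 + (ch - 'A' + 1); 
                } 
                i--; 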

                // Add the blank cell in the row. 
                while (tempRowValues.Count < i) 
                { 
                    tempRowValues.Add(""); 
                } 
                tempRowValues.Add(value); 
            } 

            // add the row to the totalRows. 
            maxCol = processList(tempRowValues, totalRows, maxCol); 

            Console.WriteLine(); 
        } 

        table = ConvertListListStringToDataTable(totalRows, maxCol); 
    } 
    return table; 
} 

/// <summary> 
/// Add each row to the totalRows. 
/// </summary> 
/// <param name="tempRows"></param> 
/// <param name="totalRows"></param> 
/// <param name="MaxCol">the max column number in rows of the totalRows</param> 
/// <returns></returns> 
private static int processList(List<string> tempRows, List<List<string>> totalRows, int MaxCol) 
{ 
    if (tempRows.Count > MaxCol) 
    { 
        MaxCol = tempRows.Count; 
    } 

    totalRows.Add(tempRows); 
    return MaxCol; 
} 

private static DataTable ConvertListListStringToDataTable(List<List<string>> totalRows, int maxCol) 
{ 
    DataTable table = new DataTable(); 
    for (int i = 0; i < maxCol; i++) 
    { 
        table.Columns.Add(); 
    } 
    foreach (List<string> row in totalRows) 
    { 
        while (row.Count < maxCol) 
        { 
            row.Add(""); 
        } 
        table.Rows.Add(row.ToArray()); 
    } 
    return table; 
} 

Is there an efficient way to change this code so that the read is faster? What should I change to speed it up? Thanks.

2 Answers

I tried your code and noticed that, on a very simple example, it took about 4 seconds to finish.

After editing my xls file to match the details you gave (columns: area prefix, city, date, function, ...) and adding about 3,600 rows, your code took about 10 seconds.

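I think you should remove every Console.WriteLine statement, because they slow down the processing of your xls file. After removing all of them, my stopwatch shows 1.26 seconds for the same number of rows.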

Even here on SO you can find some of the reasons why Console.WriteLine is so slow: Console.WriteLine slow. One answer to that question points to OutputDebugString...
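To see the effect in isolation, here is a minimal sketch that times the same loop with and without console output. It does not touch Excel at all; the class name `ConsoleCostDemo`, the method `Run`, and the 3,600 x 10 grid are assumptions made only for illustration, and the actual timings will depend on the machine and the terminal.

```csharp
using System;
using System.Diagnostics;
using System.Text;

static class ConsoleCostDemo
{
    // Builds rows x cols cell strings. When echo is true, each cell is also
    // written to the console, mimicking the Console.Write call in the
    // question's inner loop.
    public static TimeSpan Run(int rows, int cols, bool echo)
    {
        var sink = new StringBuilder();
        Stopwatch watch = Stopwatch.StartNew();
        for (int r = 0; r < rows; r++)
        {
            for (int c = 0; c < cols; c++)
            {
                string value = "R" + r + "C" + c;
                sink.Append(value);
                if (echo) Console.Write(value + "  ");
            }
            if (echo) Console.WriteLine();
        }
        watch.Stop();
        return watch.Elapsed;
    }

    static void Main()
    {
        TimeSpan quiet = Run(3600, 10, false);
        TimeSpan noisy = Run(3600, 10, true);
        // Report on stderr so the summary is not lost in the echoed cells.
        Console.Error.WriteLine("without console output: " + quiet.TotalMilliseconds + " ms");
        Console.Error.WriteLine("with console output:    " + noisy.TotalMilliseconds + " ms");
    }
}
```

If the output is needed for debugging, collecting it in a StringBuilder and writing it once at the end is one way to keep it without paying the per-cell cost.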

answered 2012-10-09T14:12:03.987

I found a few weaknesses in your code.

  1. When adding a large number of rows to a DataTable, use BeginLoadData and EndLoadData.
  2. You need to cache the SharedStringTable.
  3. You should use OpenXmlReader (the SAX approach). Memory consumption will be lower.
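The three points above can be combined roughly as follows. This is only a sketch under stated assumptions, not the code of the library linked below: it assumes the DocumentFormat.OpenXml NuGet package is referenced, the names `FastExcelReader`, `ColumnIndex`, and `Read` are hypothetical, every cell is assumed to carry a cell reference, and the Boolean conversion and the padding of missing rows from the question's code are left out for brevity.

```csharp
using System;
using System.Data;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Spreadsheet;

public static class FastExcelReader
{
    // Hypothetical helper: converts the letters of a cell reference such as
    // "AB12" into a zero-based column index (A -> 0, Z -> 25, AA -> 26).
    public static int ColumnIndex(string cellReference)
    {
        int index = 0;
        foreach (char ch in cellReference)
        {
            if (!char.IsLetter(ch)) break;
            index = index * 26 + (char.ToUpperInvariant(ch) - 'A' + 1);
        }
        return index - 1;
    }

    public static DataTable Read(string filename)
    {
        var table = new DataTable();
        // Open read-only; nothing is written back to the file.
        using (SpreadsheetDocument doc = SpreadsheetDocument.Open(filename, false))
        {
            WorkbookPart workbookPart = doc.WorkbookPart;
            Sheet sheet = workbookPart.Workbook.Descendants<Sheet>().First();
            var worksheetPart = (WorksheetPart)workbookPart.GetPartById(sheet.Id);

            // Point 2: copy the shared strings into an array once, so each
            // lookup is an array index instead of a walk over XML children.
            SharedStringTablePart sstPart = workbookPart.SharedStringTablePart;
            string[] sharedStrings = sstPart == null
                ? new string[0]
                : sstPart.SharedStringTable.Elements<SharedStringItem>()
                      .Select(item => item.InnerText).ToArray();

            // Point 1: suspend notifications and index maintenance while loading.
            table.BeginLoadData();

            // Point 3: stream the sheet row by row instead of loading the whole DOM.
            using (OpenXmlReader reader = OpenXmlReader.Create(worksheetPart))
            {
                while (reader.Read())
                {
                    if (reader.ElementType != typeof(Row) || !reader.IsStartElement) continue;

                    var row = (Row)reader.LoadCurrentElement();
                    var cells = row.Elements<Cell>().ToList();
                    if (cells.Count == 0) continue;

                    // Grow the table when a row is wider than anything seen so far.
                    int width = cells.Max(c => ColumnIndex(c.CellReference.Value)) + 1;
                    while (table.Columns.Count < width) table.Columns.Add();

                    object[] values = Enumerable.Repeat((object)"", table.Columns.Count).ToArray();
                    foreach (Cell c in cells)
                    {
                        string text = c.CellValue != null ? c.CellValue.Text : c.InnerText;
                        if (c.DataType != null && c.DataType.Value == CellValues.SharedString)
                        {
                            text = sharedStrings[int.Parse(text)];
                        }
                        values[ColumnIndex(c.CellReference.Value)] = text;
                    }
                    table.LoadDataRow(values, true);
                }
            }
            table.EndLoadData();
        }
        return table;
    }
}
```

A call such as `DataTable dt = FastExcelReader.Read("data.xlsx");` would then replace ReadExcelFileDOM. Note that in this sketch a column added by a later, wider row holds DBNull for the rows loaded before it, whereas the original code pads those cells with empty strings.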

You can try my ExcelDataReader, which does not have these weaknesses. See https://github.com/gSerP1983/OpenXml.Excel.Data

Example of reading into a DataTable:

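using System;
using System.Data;
// Also import the namespace that defines ExcelDataReader in the linked repository.

class Program
{
    static void Main(string[] args)
    {
        var dt = new DataTable();
        using (var reader = new ExcelDataReader(@"data.xlsx"))
        {
            // DataTable.Load copies every row from the reader into the table.
            dt.Load(reader);
        }

        Console.WriteLine("done: " + dt.Rows.Count);
        Console.ReadKey();
    }
}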
answered 2016-01-17T11:34:21.043