
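I'm trying to read an HTML-based .xls file supplied by a vendor and convert it to CSV so it can be imported into a different process. I've found plenty of solutions for reading and converting such files, the most popular being to read them through OLEDB. I had this working last week in VS2010, but after I installed VS2012/.NET 4.5 it suddenly stopped recognizing the source file, and nothing I've tried has made it work again. I even tried installing VS2010 on another machine, but it won't run there either (so I'm not sure how it ever worked on the original machine). If I run the code as-is, cnn.Open() throws an exception stating "External table is not in the expected format." If I change the connection string to the commented-out line, it reads the file, but not correctly (not everything is read, and the data is not populated properly).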

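So, in summary: what is the best way to read the file at the bottom of this post using C# (preferably without third-party libraries/applications)?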

Here is the code:

string excelFilePath = @"C:\Users\Dan\test.xls";
string csvOutputFile = @"C:\Users\Dan\output.csv";
int worksheetNumber = 1;
// connection string
var cnnStr = String.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0;IMEX=1;HDR=NO\"", excelFilePath);
//var cnnStr = String.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"HTML Import;IMEX=1;HDR=NO\"", excelFilePath);

var cnn = new OleDbConnection(cnnStr);
// get schema, then data
var dt = new DataTable();
try
{
    cnn.Open();
    var schemaTable = cnn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
    if (schemaTable.Rows.Count < worksheetNumber) throw new ArgumentException("The worksheet number provided cannot be found in the spreadsheet");
    string worksheet = schemaTable.Rows[worksheetNumber - 1]["table_name"].ToString().Replace("'", "");
    string sql = String.Format("select * from [{0}]", worksheet);
    var da = new OleDbDataAdapter(sql, cnn);
    da.Fill(dt);
}
catch (Exception e) { }
finally { cnn.Close(); }

// write out CSV data
using (var wtr = new StreamWriter(csvOutputFile))
{
    foreach (DataRow row in dt.Rows)
    {
        bool firstLine = true;
        foreach (DataColumn col in dt.Columns)
        {
            if (!firstLine) { wtr.Write(","); } else { firstLine = false; }
            var data = row[col.ColumnName].ToString().Replace("\"", "\"\"");
            wtr.Write(String.Format("\"{0}\"", data));
        }
        wtr.WriteLine();
    }
}

Here is the file I'm reading, which was sent to us with an .xls extension.

<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
<meta name="ProgId" content="Excel.Sheet"/>
<meta name="Generator" content="Microsoft Excel 10"/>
<!--[if !mso]>
<style>
v\\:* {behavior:url(#default#VML);}");
o\\:* {behavior:url(#default#VML);}");
x\\:* {behavior:url(#default#VML);}");
.shape {behavior:url(#default#VML);}");
</style>");
<![endif]-->
<!--[if gte mso 9]><xml>
<x:ExcelWorkbook>
<x:ExcelWorksheets>
<x:ExcelWorksheet>
<x:Name>report</w:Name>
<x:WorksheetOptions>
<x:ProtectContents>False</w:ProtectContents>
<x:ProtectObjects>False</w:ProtectObjects>
<x:ProtectScenarios>False</w:ProtectScenarios>
</w:WorksheetOptions>
</w:ExcelWorksheet>
</w:ExcelWorksheets>
<x:ProtectStructure>False</w:ProtectStructure>
<x:ProtectWindows>False</w:ProtectWindows>
</w:ExcelWorkbook>");
</xml><![endif]-->
<head>

<style>
br {mso-data-placement:same-cell;}
</style>
</head>
<body>

<style>
table {
mso-displayed-decimal-separator:"\.";
mso-displayed-thousand-separator:"\,";
}
</style>
<table width="100%">
<tr>
<td align=center colspan=6 valign=top>
<span class="pageHead">
<nobr><h1>Status</h1></nobr></span>
</td>
</tr>
<tr>
<td align=center colspan=6 valign=top>
<span class="pageHead"><nobr>
Generated by User
</nobr></span>
</td></tr>
<tr>
<td>&nbsp;</td>
</tr>
<tr>
<td>&nbsp;</td>
</tr>
</table>
<table border="1" cellspacing="0" cellpadding="0" width="100%">
<tr>
<th>Owner</th>
<th>Project Id</th>
<th>Event Id</th>
<th>Event Title</th>
<th>Event Status</th>
<th>EventSummary</th>
</tr>
<tr>
<td>User</td>
<td>1</td>
<td>test1</td>
<td>event1</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>2</td>
<td>test2</td>
<td>event2</td>
<td>Pending Selection</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>3</td>
<td>test3</td>
<td>event3</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>4</td>
<td>test4</td>
<td>event4</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>5</td>
<td>test5</td>
<td>event5</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>6</td>
<td>test6</td>
<td>event6</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>7</td>
<td>test7</td>
<td>event7</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>8</td>
<td>test8</td>
<td>event8</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>9</td>
<td>test9</td>
<td>event9</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>10</td>
<td>test10</td>
<td>event10</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>11</td>
<td>test11</td>
<td>event11</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>12</td>
<td>test12</td>
<td>event12</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>13</td>
<td>test13</td>
<td>event13</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>14</td>
<td>test14</td>
<td>event14</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>15</td>
<td>test15</td>
<td>event15</td>
<td>Completed</td>
<td>1</td>
</tr>
</table>

</body></html>

1 Answer


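Aha! After looking at the raw data and some other examples, I realized that the sheet contains two separate tables and that the OLEDB driver interprets them as two separate worksheets. I changed the worksheet variable to 2 and retrieved the second data "table", which is the one I'm actually interested in. So by looping through all of the "worksheets", I should be able to get all of the data out of this sheet.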
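Looping over every "worksheet" the driver reports could look roughly like the sketch below. It is not tested against the vendor file. It assumes the "HTML Import" connection string from the commented-out line in the question is the one in use, and the names HtmlXlsTables, BuildConnectionString and ReadAllTables are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Data;
using System.Data.OleDb;

static class HtmlXlsTables
{
    // Hypothetical helper: builds the "HTML Import" connection string
    // (same text as the commented-out line in the question).
    public static string BuildConnectionString(string path)
    {
        return String.Format(
            "Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};" +
            "Extended Properties=\"HTML Import;IMEX=1;HDR=NO\"", path);
    }

    // Hypothetical helper: returns one DataTable per table ("worksheet")
    // that the OLEDB provider lists in its schema.
    public static List<DataTable> ReadAllTables(string path)
    {
        var tables = new List<DataTable>();
        using (var cnn = new OleDbConnection(BuildConnectionString(path)))
        {
            cnn.Open();
            DataTable schema = cnn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
            foreach (DataRow schemaRow in schema.Rows)
            {
                // Strip quotes from the reported name, as the question's code does.
                string name = schemaRow["TABLE_NAME"].ToString().Replace("'", "");
                var dt = new DataTable(name);
                string sql = String.Format("select * from [{0}]", name);
                using (var da = new OleDbDataAdapter(sql, cnn))
                {
                    da.Fill(dt);
                }
                tables.Add(dt);
            }
        }
        return tables;
    }
}
```

Each returned DataTable can then be passed through the CSV-writing loop from the question, for example one output file per table, or only the table whose first row contains the expected column headers.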

answered 2013-06-26T18:52:09.230