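I am trying to read an HTML-based .xls file supplied by a vendor and convert it to CSV so that it can be imported into a different process. I found plenty of solutions for reading and converting such files, the most popular being to read them with OLEDB. I had this working last week in VS2010, but then I installed VS2012/.NET 4.5 and it suddenly stopped recognizing the source file. Nothing I have tried gets it working again. I even installed VS2010 on another machine, and it does not run there either (so I am not sure how it ever worked on the original machine). If I run the code as is, cnn.Open() throws an exception stating "External table is not in the expected format". If I change the connection string to the commented-out line, it reads the file, but incorrectly (not everything is read and the data is not populated properly).
So, to sum up: what is the best way to read the file at the bottom of this post using C# (preferably without third-party libraries/applications)?
Here is the code: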
using System;
using System.Data;
using System.Data.OleDb;
using System.IO;

string excelFilePath = @"C:\Users\Dan\test.xls";
string csvOutputFile = @"C:\Users\Dan\output.csv";
int worksheetNumber = 1; // 1-based index of the worksheet to export

// connection string
var cnnStr = String.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"Excel 12.0;IMEX=1;HDR=NO\"", excelFilePath);
//var cnnStr = String.Format("Provider=Microsoft.ACE.OLEDB.12.0;Data Source={0};Extended Properties=\"HTML Import;IMEX=1;HDR=NO\"", excelFilePath);
var cnn = new OleDbConnection(cnnStr);

// get schema, then data
var dt = new DataTable();
try
{
    cnn.Open();
    var schemaTable = cnn.GetOleDbSchemaTable(OleDbSchemaGuid.Tables, null);
    if (schemaTable.Rows.Count < worksheetNumber) throw new ArgumentException("The worksheet number provided cannot be found in the spreadsheet");
    string worksheet = schemaTable.Rows[worksheetNumber - 1]["table_name"].ToString().Replace("'", "");
    string sql = String.Format("select * from [{0}]", worksheet);
    var da = new OleDbDataAdapter(sql, cnn);
    da.Fill(dt);
}
catch (Exception e)
{
    // report the failure instead of silently swallowing it
    Console.Error.WriteLine(e);
    throw;
}
finally { cnn.Close(); }

// write out CSV data
using (var wtr = new StreamWriter(csvOutputFile))
{
    foreach (DataRow row in dt.Rows)
    {
        bool firstColumn = true;
        foreach (DataColumn col in dt.Columns)
        {
            // separate fields with commas, but not before the first one
            if (!firstColumn) { wtr.Write(","); } else { firstColumn = false; }
            // quote every field and double any embedded quotes
            var data = row[col.ColumnName].ToString().Replace("\"", "\"\"");
            wtr.Write(String.Format("\"{0}\"", data));
        }
        wtr.WriteLine();
    }
}
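One thing that may help narrow down the "External table is not in the expected format" error: a genuine binary .xls workbook is an OLE2 compound file and begins with the bytes D0 CF 11 E0 A1 B1 1A E1, whereas the vendor file begins with an <html tag. Below is a minimal sketch of a check that could be run before choosing a connection string. The class and method names (XlsSniffer, ReadHeader, LooksLikeBinaryXls, LooksLikeHtml) are hypothetical and not part of any library, and the HTML test is only a heuristic on the first few characters.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Text;

static class XlsSniffer
{
    // Signature of an OLE2 compound file, the container used by binary .xls workbooks.
    static readonly byte[] Ole2Magic = { 0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1 };

    // Reads up to `count` bytes from the start of the file.
    public static byte[] ReadHeader(string path, int count = 64)
    {
        using (var fs = File.OpenRead(path))
        {
            var buffer = new byte[count];
            int read = fs.Read(buffer, 0, count);
            Array.Resize(ref buffer, read);
            return buffer;
        }
    }

    // True when the header starts with the OLE2 signature.
    public static bool LooksLikeBinaryXls(byte[] header)
    {
        return header.Length >= Ole2Magic.Length
            && header.Take(Ole2Magic.Length).SequenceEqual(Ole2Magic);
    }

    // Heuristic (assumption): treat the file as HTML when its text starts with
    // "<html" or "<!DOCTYPE", ignoring a UTF-8 byte order mark and leading whitespace.
    public static bool LooksLikeHtml(byte[] header)
    {
        string text = Encoding.UTF8.GetString(header)
            .TrimStart('\uFEFF', ' ', '\t', '\r', '\n');
        return text.StartsWith("<html", StringComparison.OrdinalIgnoreCase)
            || text.StartsWith("<!DOCTYPE", StringComparison.OrdinalIgnoreCase);
    }
}
```

With something like this, XlsSniffer.LooksLikeHtml(XlsSniffer.ReadHeader(excelFilePath)) could decide whether the "Excel 12.0" extended property is even worth trying for a given file.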
Here is the file I am reading, which is sent to us with an .xls extension.
<html xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns="http://www.w3.org/TR/REC-html40"><meta content="text/html; charset=UTF-8" http-equiv="Content-Type">
<meta name="ProgId" content="Excel.Sheet"/>
<meta name="Generator" content="Microsoft Excel 10"/>
<!--[if !mso]>
<style>
v\\:* {behavior:url(#default#VML);}");
o\\:* {behavior:url(#default#VML);}");
x\\:* {behavior:url(#default#VML);}");
.shape {behavior:url(#default#VML);}");
</style>");
<![endif]-->
<!--[if gte mso 9]><xml>
<x:ExcelWorkbook>
<x:ExcelWorksheets>
<x:ExcelWorksheet>
<x:Name>report</w:Name>
<x:WorksheetOptions>
<x:ProtectContents>False</w:ProtectContents>
<x:ProtectObjects>False</w:ProtectObjects>
<x:ProtectScenarios>False</w:ProtectScenarios>
</w:WorksheetOptions>
</w:ExcelWorksheet>
</w:ExcelWorksheets>
<x:ProtectStructure>False</w:ProtectStructure>
<x:ProtectWindows>False</w:ProtectWindows>
</w:ExcelWorkbook>");
</xml><![endif]-->
<head>
<style>
br {mso-data-placement:same-cell;}
</style>
</head>
<body>
<style>
table {
mso-displayed-decimal-separator:"\.";
mso-displayed-thousand-separator:"\,";
}
</style>
<table width="100%">
<tr>
<td align=center colspan=6 valign=top>
<span class="pageHead">
<nobr><h1>Status</h1></nobr></span>
</td>
</tr>
<tr>
<td align=center colspan=6 valign=top>
<span class="pageHead"><nobr>
Generated by User
</nobr></span>
</td></tr>
<tr>
<td> </td>
</tr>
<tr>
<td> </td>
</tr>
</table>
<table border="1" cellspacing="0" cellpadding="0" width="100%">
<tr>
<th>Owner</th>
<th>Project Id</th>
<th>Event Id</th>
<th>Event Title</th>
<th>Event Status</th>
<th>EventSummary</th>
</tr>
<tr>
<td>User</td>
<td>1</td>
<td>test1</td>
<td>event1</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>2</td>
<td>test2</td>
<td>event2</td>
<td>Pending Selection</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>3</td>
<td>test3</td>
<td>event3</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>4</td>
<td>test4</td>
<td>event4</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>5</td>
<td>test5</td>
<td>event5</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>6</td>
<td>test6</td>
<td>event6</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>7</td>
<td>test7</td>
<td>event7</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>8</td>
<td>test8</td>
<td>event8</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>9</td>
<td>test9</td>
<td>event9</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>10</td>
<td>test10</td>
<td>event10</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>11</td>
<td>test11</td>
<td>event11</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>12</td>
<td>test12</td>
<td>event12</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>13</td>
<td>test13</td>
<td>event13</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>14</td>
<td>test14</td>
<td>event14</td>
<td>Completed</td>
<td>1</td>
</tr>
<tr>
<td>User</td>
<td>15</td>
<td>test15</td>
<td>event15</td>
<td>Completed</td>
<td>1</td>
</tr>
</table>
</body></html>
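For comparison, here is a rough sketch of the kind of OLEDB-free approach I would be happy with: pull the rows straight out of the markup using only the .NET framework (Regex plus System.Net.WebUtility, available from .NET 4.0). This is an illustration under stated assumptions, not a robust HTML parser: it assumes tables are not nested, that every cell has a closing tag, and that colspan/rowspan can be ignored. HtmlTableToCsv, ReadTable and ToCsvLine are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;
using System.Text.RegularExpressions;

static class HtmlTableToCsv
{
    const RegexOptions Opts = RegexOptions.IgnoreCase | RegexOptions.Singleline;

    // Returns the rows of the tableIndex-th <table> (0-based);
    // each row is the list of its <td>/<th> cell texts.
    public static List<List<string>> ReadTable(string html, int tableIndex)
    {
        var tables = Regex.Matches(html, @"<table\b[^>]*>(.*?)</table>", Opts);
        if (tableIndex < 0 || tableIndex >= tables.Count)
            throw new ArgumentOutOfRangeException("tableIndex");

        var rows = new List<List<string>>();
        string tableBody = tables[tableIndex].Groups[1].Value;
        foreach (Match tr in Regex.Matches(tableBody, @"<tr\b[^>]*>(.*?)</tr>", Opts))
        {
            var cells = new List<string>();
            foreach (Match cell in Regex.Matches(tr.Groups[1].Value, @"<t[dh]\b[^>]*>(.*?)</t[dh]>", Opts))
            {
                // Drop any nested tags, decode entities such as &amp;, then trim.
                string text = Regex.Replace(cell.Groups[1].Value, "<[^>]+>", "");
                cells.Add(WebUtility.HtmlDecode(text).Trim());
            }
            rows.Add(cells);
        }
        return rows;
    }

    // Quotes every field and doubles embedded quotes, as the CSV writer above does.
    public static string ToCsvLine(IEnumerable<string> cells)
    {
        return string.Join(",", cells.Select(c => "\"" + c.Replace("\"", "\"\"") + "\""));
    }
}
```

For the file above, HtmlTableToCsv.ReadTable(File.ReadAllText(excelFilePath), 1) would target the second table (the one with the Owner / Project Id headers), and each returned row could then be written out with ToCsvLine. Whether a regex-based reader like this is acceptable for real vendor files is exactly what I am unsure about.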