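I have a strange problem that has me stumped. Maybe a fresh pair of eyes will spot what is wrong!
I am parsing an HTML file with jSoup. The problem is that the set of tables is written to the output file 3-4 times, even when the output goes to a brand-new file. The first copy comes out as one long line in the .csv file, but every copy after that is formatted exactly the way I want. Obviously I would like the data to be written once, correctly formatted, on the first pass.
My code: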
    // Parse the file; with a null charset jsoup works out the encoding itself.
    Document doc = Jsoup.parse(file, null);
    Elements tables = doc.select("table");
    for (Element table : tables) {
        Elements rows = table.select("tr");
        for (Element row : rows) {
            Elements cells = row.getElementsByTag("td");
            StringBuilder values = new StringBuilder();
            for (Element cell : cells) {
                String cellText = cell.text();
                cellText = cellText.replaceAll(",", "");
                cellText = cellText.replaceAll("£", ",£");
                cellText = cellText.replaceAll(",£", "£");
                System.out.println(cellText);
                values.append(cellText).append(",");
            }
            System.out.println(values.toString());
            addToFile(values + ",");
        }
    }

    // Append the newest entry to the .csv file
    private static void addToFile(String myString) {
        try (BufferedWriter out = new BufferedWriter(
                new FileWriter("MyParsedDOMTree.csv", true))) {
            out.write(myString + "\n");
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
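A side note on the comma handling in the loop above: the first `replaceAll` strips the thousands separator from values such as £3,007.00, and the second and third replacements cancel each other out. A possible alternative, sketched below, is to quote any field that contains a comma so the original text survives in the .csv. The class name `CsvLine` and its `quote` and `join` helpers are hypothetical names made up for this sketch, not part of jSoup or of my code, and the quoting rule assumed here is the usual double-quote convention for CSV.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper for illustration only; not part of jSoup.
class CsvLine {

    // Wrap a field in double quotes when it contains a comma, a quote,
    // or a line break; embedded quotes are doubled.
    static String quote(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join the cells of one row into a single CSV line without a trailing comma.
    static String join(List<String> cells) {
        return cells.stream().map(CsvLine::quote).collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        // \u00A3 is the pound sign, escaped so the source compiles in any encoding.
        List<String> row = List.of("Dept 'Food Incl Vat'", "\u00A3688.95", "\u00A33,007.00");
        System.out.println(join(row));
    }
}
```

With this approach the loop would collect `cell.text()` values into a list and pass the result of `CsvLine.join(...)` to `addToFile`, instead of appending a comma after every cell.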
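It may simply be that the HTML file is complex, with various tables nested inside one another, but I don't see how that would cause a table whose numeric data appears only once to be output three times...
Edit
HTML snippet: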
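One way to check the nesting hypothesis is a minimal reproduction. In jSoup, `doc.select("table")` matches nested tables as well as outer ones, and `table.select("tr")` matches every descendant row, not only direct children, so a row inside an inner table is visited once for each table that encloses it. The sketch below counts the rows visited by the loop from my code and compares that with processing only tables that contain no other table, using the selector `table:not(:has(table))`. The class name `NestedTableCheck`, its method names, and the sample HTML are made up for illustration; whether restricting to innermost tables is right for the real file is an assumption.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical reproduction for illustration; requires jsoup on the classpath.
class NestedTableCheck {

    // Same traversal as the loop in the question: every table, then every
    // descendant row of that table.
    static int rowsVisitedNaively(Document doc) {
        int count = 0;
        for (Element table : doc.select("table")) {
            count += table.select("tr").size();
        }
        return count;
    }

    // Only visit tables that have no table nested inside them.
    static int rowsVisitedInLeafTables(Document doc) {
        int count = 0;
        for (Element table : doc.select("table:not(:has(table))")) {
            count += table.select("tr").size();
        }
        return count;
    }

    public static void main(String[] args) {
        // One data row inside a table that is itself inside a layout table.
        String html = "<table><tr><td>"
                + "<table><tr><td>A</td><td>B</td></tr></table>"
                + "</td></tr></table>";
        Document doc = Jsoup.parse(html);
        System.out.println(rowsVisitedNaively(doc));      // prints 3
        System.out.println(rowsVisitedInLeafTables(doc)); // prints 1
    }
}
```

In this sample the outer row is also where a "one long line" would come from: `getElementsByTag("td")` on the outer row returns the outer cell plus both inner cells, so all of them land on the same CSV line.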
<tr bgcolor = "#EEEEEE" height = 20 >
<td width = 15% >
<font face="tahoma" size="1">
Dept '<b>Food Incl Vat</b>'
</td>
<td width = 10% align =
right><font face="tahoma" size="1">
£688.95
</td>
<td width = 10% align =
right><font face="tahoma" size="1">
£642.60
</td>
<td width = 10% align =
right><font face="tahoma" size="1">
£767.95
</td>
<td width = 10% align =
right><font face="tahoma" size="1">
£3,007.00
</td>
<td width = 10% align =
right><font face="tahoma" size="1">
£1,525.60
</td>
<td width = 10% align =
right><font face="tahoma" size="1">
£1,970.40
</td>
<td width = 10% align =
right><font face="tahoma" size="1">
£353.00
</td>
<td width = 1%></td><td width
= 14% align = right bgcolor = "#DFDFDF"><font face="tahoma" size="1" color = '#444444'>
<b>£8,955.50</b></td>
</tr>