
I have a strange problem that has me stumped. Maybe a fresh pair of eyes will spot what is wrong!

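I am using jSoup to parse an HTML file. The problem is that the set of tables gets written to the file 3-4 times, even when I write to a new file. The first time, the output is one straight line in the .csv file, but every other time it is formatted exactly the way I want. Obviously I would like it to come out correctly the first time, and only once!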

My code:

    // Jsoup.parse builds the Document; a null charset lets jSoup detect the encoding
    Document doc = Jsoup.parse(file, null);

    Elements tables = doc.select("table");

    for (Element table: tables) {
        Elements rows = table.select("tr");
        for (Element row: rows) {
            Elements cells = row.getElementsByTag("td");
            StringBuffer values = new StringBuffer();
            for (Element cell: cells) {
                String cellText = cell.text();
                cellText = cellText.replaceAll(",", "");
                cellText = cellText.replaceAll("£", ",£");
                cellText = cellText.replaceAll(",£", "£");
                System.out.println(cellText);
                values.append(cellText + ",");
            }
            System.out.println(values.toString());
            addToFile(values + ",");
        }
    }

// Append the newest entry to the .csv file
private static void addToFile(String myString) {
    try {
        BufferedWriter out = new BufferedWriter(new FileWriter(
                "MyParsedDOMTree.csv", true));
        out.write(myString + "\n");
        out.close();
    } catch (IOException e) {
        e.printStackTrace();
    }
}
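
A side note on the file-writing helper above, offered as a minimal sketch rather than a definitive fix. `new FileWriter("MyParsedDOMTree.csv", true)` opens the file in append mode, so rows from earlier runs stay in the file unless it is deleted first, and the file is reopened for every row. The sketch below opens the file once, truncates it, and quotes any field that contains a comma instead of stripping the comma. The class and method names (`CsvSketch`, `escape`, `toCsvLine`, `writeRows`) are hypothetical and not part of the original code.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: names are illustrative, not from the original post.
final class CsvSketch {

    // Quote a field if it contains a comma, a double quote, or a line break.
    static String escape(String field) {
        boolean needsQuotes = field.contains(",") || field.contains("\"")
                || field.contains("\n") || field.contains("\r");
        if (!needsQuotes) {
            return field;
        }
        return "\"" + field.replace("\"", "\"\"") + "\"";
    }

    // Join the cells of one row into a single CSV line (no trailing comma).
    static String toCsvLine(List<String> cells) {
        return cells.stream().map(CsvSketch::escape).collect(Collectors.joining(","));
    }

    // Open the file once, replacing any previous content, and write every row.
    static void writeRows(Path target, List<List<String>> rows) throws IOException {
        try (BufferedWriter out = Files.newBufferedWriter(target, StandardCharsets.UTF_8)) {
            for (List<String> row : rows) {
                out.write(toCsvLine(row));
                out.newLine();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        List<List<String>> rows = List.of(
                List.of("Dept 'Food Incl Vat'", "688.95", "3,007.00"));
        Path target = Files.createTempFile("MyParsedDOMTree", ".csv");
        writeRows(target, rows);
        System.out.println(Files.readAllLines(target, StandardCharsets.UTF_8));
    }
}
```

Because `Files.newBufferedWriter` without extra options truncates an existing file, running the program twice leaves one copy of the rows rather than two.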

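It may also just be a case of a complicated HTML file, with various tables nested inside one another, but I don't see how that would cause a table whose numeric data appears only once to be output three times...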

Edit

HTML snippet:

<tr bgcolor = "#EEEEEE" height = 20 >
<td width = 15% >
<font face="tahoma" size="1">
Dept '<b>Food Incl Vat</b>'
</td>
<td width = 10% align =
right><font face="tahoma" size="1">
£688.95
</td>
<td width = 10% align =
right><font face="tahoma" size="1">
£642.60
</td>
<td width = 10% align =
right><font face="tahoma" size="1">
£767.95
</td>
<td width = 10% align =
right><font face="tahoma" size="1">
£3,007.00
</td>
<td width = 10% align =
right><font face="tahoma" size="1">
£1,525.60
</td>
<td width = 10% align =
right><font face="tahoma" size="1">
£1,970.40
</td>
<td width = 10% align =
right><font face="tahoma" size="1">
£353.00
</td>
<td width = 1%></td><td width
= 14% align = right bgcolor = "#DFDFDF"><font face="tahoma" size="1" color = '#444444'>
<b>£8,955.50</b></td>
</tr>

1 Answer


Edit: Sorry, there was a mistake in the code. It is fixed now.

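I don't really have enough code to make a solid guess, but I'm not sure why you are getting the size of the tables and then iterating over them; my guess is that .size() is being hit several times (3-4, I would guess). What you want to do is find the root of the tables. Under the root will be the individual tables (they should all share the same class name), and within each table you then search for whatever you are looking for. Maybe some code will help :)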

HTML:

    <ul class="ListOfTables">
        <li class="TABLE">
            <span class="item"></span>
        </li>
        <li class="TABLE">
            <span class="item"></span>
        </li>
        <li class="TABLE">
            <span class="item"></span>
        </li>
        <li class="TABLE">
            <span class="item"></span>
        </li>
    </ul>

Java code:

public void searchForItems(Document doc)
{
    Elements tables = doc.select("li[class=TABLE]");
    for (Element table : tables)
    {

        // Collect the text of every item span inside this table entry
        Elements itemsInTable = table.select("span[class=item]");
        String item = itemsInTable.text();


        //Write the item to file. Depending on what is in your table, you might
        //have to write a more complex scan. Looking for things like attributes
    }
}
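
Applied to the question's own markup, one likely explanation (an assumption, since the full HTML file is not shown) is that the data table sits inside layout tables. `doc.select("table")` then returns the outer tables as well as the inner one, and `table.select("tr")` on an outer table also matches the rows of every table nested inside it, so the same data rows are written once per nesting level. The outer row's `getElementsByTag("td")` call also collects every nested cell, which would give one long line. The sketch below visits each `<tr>` once and reads only its direct `<td>` children. It needs jSoup on the classpath, and the class and method names (`NestedTableRows`, `rowsOnce`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper: names are illustrative, not from the original post.
final class NestedTableRows {

    // Returns one comma-joined line per <tr> that has direct data cells.
    static List<String> rowsOnce(String html) {
        Document doc = Jsoup.parse(html);
        List<String> lines = new ArrayList<>();
        // Selecting rows from the document visits each <tr> exactly once,
        // no matter how deeply its table is nested.
        for (Element row : doc.select("tr")) {
            List<String> cells = new ArrayList<>();
            // children() returns direct children only, so cells that belong
            // to a nested table are not picked up by the outer row.
            for (Element child : row.children()) {
                boolean isCell = "td".equals(child.tagName());
                boolean wrapsTable = !child.select("table").isEmpty();
                if (isCell && !wrapsTable) {
                    // Same comma stripping as in the question's code
                    cells.add(child.text().replace(",", ""));
                }
            }
            if (!cells.isEmpty()) {
                lines.add(String.join(",", cells));
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        String html = "<table><tr><td>"
                + "<table><tr><td>Food</td><td>1,525.60</td></tr></table>"
                + "</td></tr></table>";
        for (String line : rowsOnce(html)) {
            System.out.println(line);
        }
    }
}
```

Skipping cells that wrap a table is a design choice for layout-style pages; if an outer cell mixes its own text with a nested table, that text would be dropped and the filter would need adjusting.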
answered 2013-07-15T16:26:48.243