I am stuck on how to create a proper CSV file from an HTML table. I am using HtmlAgilityPack to read the HTML from a string and create an HtmlDocument, then using XPath to loop through the rows and columns.

The problem is that I am unable to determine the correct row and column (x, y) for a particular cell.

Example HTML:

<html>
<body>
    <table border="1">
        <tr>
            <td rowspan="2">
                100
            </td>
            <td>
                200
            </td>
            <td colspan="2">
                300
            </td>
        </tr>
        <tr>
            <td colspan="2">
                400
            </td>
            <td>
                600
            </td>
        </tr>
        <tr>
            <td>
                400
            </td>
            <td>
                500
            </td>
            <td>
                600
            </td>
        </tr>
    </table>
</body>
</html>

When I open it in Excel and save it as CSV, I get the desired output:

100,200,300,
,400,,600
400,500,600,

Can someone help me create the same output in .NET, respecting rowspan and colspan?

Thanks! Dex

1 Answer

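You don't need to know which row and column you are in. All you need to do is add a "," for every new column you find and a line break every time you reach the end of a row.

If you walk the document as though it were an XML document, all you have to do is iterate over all the TR nodes, adding a line break when you reach the end of each one's child node list, and iterate over all the TD nodes in each TR node, adding a "," where necessary.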
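
The traversal above could be sketched roughly as follows. One adjustment is assumed here: to reproduce the Excel output from the question, a plain "comma per TD" walk is not quite enough, because a cell with rowspan pushes the cells of later rows to the right and a cell with colspan needs extra empty fields. The sketch therefore records which columns each row has already had claimed. The class name `HtmlTableCsv` and the method `ToCsv` are made up for illustration, and the code assumes a single, non-nested table and the HtmlAgilityPack NuGet package.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack; // third-party NuGet package

public static class HtmlTableCsv // hypothetical helper name
{
    public static string ToCsv(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // Assumes one non-nested table; SelectNodes returns null when nothing matches.
        var rows = doc.DocumentNode.SelectNodes("//tr");
        if (rows == null) return string.Empty;

        // One dictionary per row: column index -> field text.
        var grid = rows.Select(_ => new Dictionary<int, string>()).ToList();

        for (int r = 0; r < rows.Count; r++)
        {
            var cells = rows[r].SelectNodes("td|th");
            if (cells == null) continue;

            int c = 0;
            foreach (var cell in cells)
            {
                // Skip columns already claimed by a rowspan from an earlier row.
                while (grid[r].ContainsKey(c)) c++;

                int rowspan = Math.Max(1, cell.GetAttributeValue("rowspan", 1));
                int colspan = Math.Max(1, cell.GetAttributeValue("colspan", 1));

                // Reserve every position the cell covers with an empty field,
                // ignoring any rowspan that runs past the last row.
                for (int dr = 0; dr < rowspan && r + dr < rows.Count; dr++)
                    for (int dc = 0; dc < colspan; dc++)
                        grid[r + dr][c + dc] = "";

                // The text goes only into the top-left position of the span.
                grid[r][c] = HtmlEntity.DeEntitize(cell.InnerText).Trim();
                c += colspan;
            }
        }

        // Pad every row to the widest row so all lines have the same field count.
        int width = grid.Max(row => row.Count == 0 ? 0 : row.Keys.Max() + 1);

        var lines = grid.Select(row => string.Join(",",
            Enumerable.Range(0, width)
                      .Select(i => row.TryGetValue(i, out var text) ? Escape(text) : "")));
        return string.Join(Environment.NewLine, lines);
    }

    // Quote fields that contain a comma, a double quote, or a line break.
    private static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0) return field;
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }
}
```

Calling `HtmlTableCsv.ToCsv(html)` with the example table from the question should give the same three lines Excel produces, including the trailing empty field on the first and third rows. Overlapping spans and nested tables are not handled in this sketch.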

Answered 2011-05-18T17:59:10.277