c# - Looping through elements and defining rows/columns

Question

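I have an HtmlNodeCollection of HTML <td> elements that I collected from a table using HTMLAgilityPack. Normally I would just select the <tr> elements in the table and loop through their <td> elements, but unfortunately the opening <tr> tags are generated by JavaScript rather than rendered by the server. I have no control over how the HTML is rendered, so I have resorted to getting an HtmlNodeCollection from this XPath query: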

HtmlNode table = htmlDoc.DocumentNode.SelectSingleNode("//table[@width='100%' and @cellpadding='1' and @cellspacing='1' and @border='0']");
HtmlNodeCollection tds = table.SelectNodes(".//td[@align and string-length(@width)=0]"); // only select td elements that have the align attribute and have no (or an empty) width attribute

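The table has six columns and an arbitrary number of rows. I want to process each row and parse its columns into an intermediate data structure. I have this code to get each "row" and "column", but it is not quite right: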

int cols = 6; // six columns
int rows = tds.Count / cols;

// loop through the rows
for (int row = 1; row <= rows; row++)
{
    for (int col = 0; col < cols; col++)
    {
        HtmlNode td = tds[col * row]; // get the associated td element from the column index * row index
        MessageBox.Show(td.InnerHtml + "\n" + td.InnerText);
    }
}

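I start at row 1 instead of row 0 and end at the row count because I don't want to multiply by zero six times. I am trying to treat this as a matrix, but I can't work out where one row ends and the next begins. Do you have any suggestions on how to correctly loop through all the rows and columns?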

score 0 · Accepted Answer

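After drawing a grid on paper, it was clear what I was missing. I need to add the column index to the number of columns multiplied by the current row, like so: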

for (int row = 0; row < rows; row++)
{
    for (int col = 0; col < cols; col++)
    {
        HtmlNode td = tds[col + cols * row]; // row-major index: skip (cols * row) cells to reach the row, then add the column index
        MessageBox.Show(td.InnerHtml + "\n" + td.InnerText);
    }
}
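
The same row-major mapping can be pulled into a small helper if the goal is to build an intermediate structure of rows. This is only a sketch: `GridIndexing`, `FlatIndex`, and `ToRows` are hypothetical names, not part of HTMLAgilityPack, and it assumes the cells arrive in document order with exactly `cols` cells per row.

```csharp
using System;
using System.Collections.Generic;

public static class GridIndexing
{
    // Hypothetical helper: maps (row, col) to a position in a flat, row-major list.
    public static int FlatIndex(int row, int col, int cols)
    {
        return row * cols + col;
    }

    // Hypothetical helper: splits a flat list of cells into rows of `cols` items.
    // Trailing cells that do not fill a complete row are ignored, which matches
    // the integer division in `rows = tds.Count / cols`.
    public static List<T[]> ToRows<T>(IList<T> cells, int cols)
    {
        if (cells == null) throw new ArgumentNullException(nameof(cells));
        if (cols <= 0) throw new ArgumentOutOfRangeException(nameof(cols));

        int rows = cells.Count / cols;
        var result = new List<T[]>(rows);

        for (int row = 0; row < rows; row++)
        {
            var current = new T[cols];
            for (int col = 0; col < cols; col++)
            {
                current[col] = cells[FlatIndex(row, col, cols)];
            }
            result.Add(current);
        }

        return result;
    }
}
```

With this in place, something like `GridIndexing.ToRows(tds, 6)` should give one array of six cells per table row, assuming `HtmlNodeCollection` can be passed as an `IList<HtmlNode>`; if it cannot in your version, copying the nodes into a `List<HtmlNode>` first would work as well.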
