I have a table like this:

<table>
  <tbody>
    <tr>
       <td>Header1</td>
       <td>Header2</td>
       <td>Header3</td>
       <td>Header4</td>
    </tr>
    <tr>
       <td>1</td>
       <td>2</td>
       <td>3</td>
       <td>4</td>
    </tr>
    <tr>
       <td>11</td>
       <td>22</td>
       <td>33</td>
       <td>44</td>
    </tr>

  </tbody>
</table>

My code is:

var headersList = xmlDoc.XPathSelectElements("//table//tbody//tr").ToList();

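But headersList gives me all the td values :-(

Now I would like to know how to loop over this table. My expected result is as follows:

Expected result of the first loop: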

Header1 = 1,
Header2 = 2,
Header3 = 3,
Header4 = 4,

Expected result of the second loop:

Header1 = 11
Header2 = 22
Header3 = 33
Header4 = 44

Any help would be greatly appreciated.

2 Answers

Since the markup gives you no way to distinguish the header row from the data rows (there is no thead, and every row sits inside tbody), you will need to determine the number of columns programmatically.

var data = @"<table>
  <tbody>
    <tr>
       <td>Header1</td>
       <td>Header2</td>
       <td>Header3</td>
       <td>Header4</td>
    </tr>
    <tr>
       <td>1</td>
       <td>2</td>
       <td>3</td>
       <td>4</td>
    </tr>
    <tr>
       <td>11</td>
       <td>22</td>
       <td>33</td>
       <td>44</td>
    </tr>

  </tbody>
</table>";

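// Requires: using System.Linq; using System.Xml.Linq; using System.Xml.XPath;
var xDoc = XDocument.Parse(data);
var headerElements = xDoc.XPathSelectElements("//table//tbody//tr");
// Count only the direct <td> children of the first row (the header row).
int headerCount = headerElements.First().Elements("td").Count();
// Flatten every cell value, row by row, into one list.
var nodes = headerElements.SelectMany(x => x.Elements("td"))
                          .Select(x => x.Value)
                          .ToList();
var head = nodes.Take(headerCount).ToList(); // header names
var body = nodes.Skip(headerCount).ToList(); // remaining data cells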

var pairs = new List<Tuple<string,string>>();

for(var i = 0; i < body.Count; i += headerCount)
{
    for(int j = 0; j < head.Count; j++)
    {
        pairs.Add(Tuple.Create(head[j], body[i+j]));
    }
}

foreach(var pair in pairs)
{
    Console.WriteLine("{0} = {1}", pair.Item1, pair.Item2);
}
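
If you would rather keep each row's values together instead of flattening everything into one list of pairs, the same idea can be sketched row by row. This is only an illustration under the same assumptions as above (the first tr holds the headers and the markup is well-formed XML); `TableReader` and `ReadRows` are hypothetical names, not part of any library.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TableReader
{
    // Returns one header -> value dictionary per data row.
    // Assumption: the first <tr> is the header row and cells are direct <td> children.
    public static List<Dictionary<string, string>> ReadRows(string html)
    {
        var rows = XDocument.Parse(html)
                            .Descendants("tr")
                            .Select(tr => tr.Elements("td").Select(td => td.Value).ToList())
                            .ToList();
        var result = new List<Dictionary<string, string>>();
        if (rows.Count == 0)
            return result;

        var headers = rows[0];
        foreach (var cells in rows.Skip(1))
        {
            var row = new Dictionary<string, string>();
            // Stop at the shorter of the two lists so ragged rows do not throw.
            for (int i = 0; i < Math.Min(headers.Count, cells.Count); i++)
                row[headers[i]] = cells[i];
            result.Add(row);
        }
        return result;
    }

    public static void Main()
    {
        const string html = @"<table><tbody>
            <tr><td>Header1</td><td>Header2</td></tr>
            <tr><td>1</td><td>2</td></tr>
            <tr><td>11</td><td>22</td></tr>
        </tbody></table>";

        foreach (var row in ReadRows(html))
        {
            foreach (var header in row.Keys)
                Console.WriteLine("{0} = {1}", header, row[header]);
            Console.WriteLine(); // blank line between rows
        }
    }
}
```

Each iteration of the outer loop then corresponds to one "loop" from the question, and a value can also be looked up directly, for example `ReadRows(html)[1]["Header2"]`.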
answered 2013-06-12T12:34:54.213

Use HtmlAgilityPack (available from NuGet) to parse the HTML document. Here is an example that prints the table data to the console:

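// Requires: using System.Linq; using HtmlAgilityPack;
var doc = new HtmlDocument();
doc.Load(path_to_html); // path_to_html: path to the HTML file on disk
// Each row becomes a list of its cell contents.
// Note: SelectNodes returns null when nothing matches the XPath.
var rows = 
      doc.DocumentNode.SelectNodes("//table/tbody/tr")
         .Select(tr => tr.SelectNodes("td").Select(td => td.InnerHtml).ToList())
         .ToList();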

Output:

var headers = rows[0];

// skip first row which contains headers
foreach (var row in rows.Skip(1))
{
    for (int i = 0; i < row.Count; i++)
        if (headers.Count > i) // you can remove this check if data is valid
            Console.WriteLine("{0} = {1}", headers[i], row[i]);
}

Result:

Header1 = 1
Header2 = 2
Header3 = 3
Header4 = 4

Header1 = 11
Header2 = 22
Header3 = 33
Header4 = 44

If you need to define headers for the columns, I suggest using the thead tag:

<table>
  <thead>
    <tr>
       <td>Header1</td>
       <td>Header2</td>
       <td>Header3</td>
       <td>Header4</td>
    </tr>
  </thead>
  <tbody>
    <tr>
       <td>1</td>
       <td>2</td>
       <td>3</td>
       <td>4</td>
    </tr>
    <tr>
       <td>11</td>
       <td>22</td>
       <td>33</td>
       <td>44</td>
    </tr>
  </tbody>
</table>

In that case, parsing and output would look like this:

var headers = doc.DocumentNode.SelectNodes("//table/thead/tr/td")
                 .Select(td => td.InnerHtml).ToList();
var rows = 
      doc.DocumentNode.SelectNodes("//table/tbody/tr")
         .Select(tr => tr.SelectNodes("td").Select(td => td.InnerHtml).ToList())
         .ToList();

foreach (var row in rows)
{
    for(int i = 0; i < row.Count; i++)
        Console.WriteLine("{0} = {1}", headers[i], row[i]);
}
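
If a row might have more or fewer cells than there are headers, one way to avoid the index check is to pair the two sequences with `Enumerable.Zip`, which stops at the shorter one. The sketch below uses hard-coded lists in place of the parsed `headers` and `row` so that it runs on its own; `ZipExample` and `Format` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ZipExample
{
    // Pairs each header with the cell at the same position.
    // Zip stops at the shorter sequence, so extra headers or cells are ignored.
    public static List<string> Format(IEnumerable<string> headers, IEnumerable<string> cells)
    {
        return headers.Zip(cells, (h, c) => string.Format("{0} = {1}", h, c)).ToList();
    }

    public static void Main()
    {
        var headers = new List<string> { "Header1", "Header2", "Header3", "Header4" };
        var row = new List<string> { "1", "2", "3" }; // deliberately one cell short

        foreach (var line in Format(headers, row))
            Console.WriteLine(line);
    }
}
```

With the short row above, only the first three pairs are printed and no exception is thrown. Whether silently dropping the unmatched header is acceptable depends on your data; if it is not, compare the counts first and report the mismatch.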
answered 2013-06-12T12:05:04.460