c# - Get the cell in a specific row of an HTML table

Question

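I am developing a C# console application. The end goal is to find a specific row in a table and then click a link to download a file generated by a legacy web application. (It is quite old, so there is no API available to me.)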

The table follows a structure like this:

<html>
<head>
    <title>Test Table Page</title>
</head>
<body>
    <table border="1" cellpadding="3" cellspacing="5">
        <tr>
            <td>Test Row One</td>
            <td>Test Content</td>
        </tr>
        <tr>
            <td>Test Row Two</td>
            <td>Test Content</td>
        </tr>
        <tr>
            <td>Test Row Three</td>
            <td>Test Content</td>
        </tr>
    </table>
</body>
</html>

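What I want to do is get the Test Content associated with Test Row Two. I need to use the report name in the neighboring cell to locate the right row.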

score 1 · Accepted Answer

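If you can be sure the HTML will also be well-formed XML (every tag closed, attribute values quoted), you can use an XML parser with XPath, as shown below. Personally, I like to avoid HTML parsers because they are large and complex; using one here is like using a chainsaw to snap a twig in half. Sometimes there is no other option, but if a simpler solution exists, try it first.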
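As an alternative that still relies on an XML parser but skips XPath strings entirely, the same lookup can be written as a LINQ to XML query over the `<tr>` and `<td>` elements. This is a minimal sketch, assuming the markup is well-formed XML and that the label is always in the first cell of its row; the class and method names (`TableLookup`, `FindAdjacentCell`) are illustrative, not part of any library.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class TableLookup {
    // Finds the first <tr> whose first <td> text equals rowLabel and returns the
    // text of the second <td> in that row, or null if no row matches.
    // Assumes well-formed XML: XDocument.Parse throws XmlException otherwise.
    public static string FindAdjacentCell( string markup, string rowLabel ) {
        XDocument document = XDocument.Parse( markup );

        return document
            .Descendants( "tr" )
            .Select( row => row.Elements( "td" ).ToList() )
            .Where( cells => cells.Count >= 2 && cells[0].Value.Trim() == rowLabel )
            .Select( cells => cells[1].Value.Trim() )
            .FirstOrDefault();
    }

    static void Main() {
        string markup =
            "<table>" +
            "<tr><td>Test Row One</td><td>Content One</td></tr>" +
            "<tr><td>Test Row Two</td><td>Content Two</td></tr>" +
            "</table>";

        // Prints "Content Two".
        Console.WriteLine( FindAdjacentCell( markup, "Test Row Two" ) ?? "Row not found." );
    }
}
```

The query form makes the "label in column 0, value in column 1" assumption explicit, which can be easier to adjust than an XPath predicate if the table layout changes.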

Relevant snippet:

var l_contentCell = l_navigator.SelectSingleNode( "//td[preceding-sibling::td/text()='Test Row Two']" );
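The expression above compares the label's text node exactly, so a cell such as `<td> Test Row Two </td>` would not match. A possible variation, sketched below under that assumption, matches the label with `normalize-space(.)` and then steps explicitly to the immediately following cell with `following-sibling::td[1]`. `XPathRowLookup` is a hypothetical helper name, and the sketch assumes the label contains no apostrophe, since it is spliced into a single-quoted XPath literal.

```csharp
using System;
using System.Xml.Linq;
using System.Xml.XPath;

static class XPathRowLookup {
    // Locates the label cell by its whitespace-normalized text, then moves to the
    // next <td> in the same row. Returns null when no cell matches.
    public static string FindAdjacentCell( string markup, string rowLabel ) {
        // CreateNavigator() is the extension method from System.Xml.XPath.Extensions.
        XPathNavigator navigator = XDocument.Parse( markup ).CreateNavigator();

        // Assumption: rowLabel contains no apostrophe, which would end the XPath literal early.
        string xpath = "//td[normalize-space(.)='" + rowLabel + "']/following-sibling::td[1]";

        XPathNavigator cell = navigator.SelectSingleNode( xpath );
        return cell?.Value.Trim();
    }

    static void Main() {
        string markup =
            "<table>" +
            "<tr><td>  Test Row Two  </td><td>Test Content</td><td>Other</td></tr>" +
            "</table>";

        // Prints "Test Content" despite the padding around the label.
        Console.WriteLine( FindAdjacentCell( markup, "Test Row Two" ) ?? "Row not found." );
    }
}
```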

Full source code:

using System;
using System.Xml.Linq;   // XDocument
using System.Xml.XPath;  // CreateNavigator() extension method, XPathNavigator

namespace XmlSandbox {
    class Program {
        static void Main( string[] args ) {

            string l_xmlLiteral =
                "<html>\n" +
                "   <head>\n" +
                "       <title>Test Table Page</title>\n" +
                "   </head>\n" +
                "   <body>\n" +
                "       <table border=\"1\" cellpadding=\"3\" cellspacing=\"5\">\n" +
                "           <tr>\n" +
                "               <td>Test Row One</td>\n" +
                "               <td>Test Content</td>\n" +
                "           </tr>\n" +
                "           <tr>\n" +
                "               <td>Test Row Two</td>\n" +
                "               <td>Test Content</td>\n" +
                "           </tr>\n" +
                "           <tr>\n" +
                "               <td>Test Row Three</td>\n" +
                "               <td>Test Content</td>\n" +
                "           </tr>\n" +
                "       </table>\n" +
                "   </body>\n" +
                "</html>";

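            // XDocument.Parse requires well-formed XML and throws XmlException otherwise.
            var l_document = XDocument.Parse( l_xmlLiteral );
            var l_navigator = l_document.CreateNavigator();

            // Select the <td> whose preceding sibling <td> has the text "Test Row Two".
            var l_contentCell = l_navigator.SelectSingleNode( "//td[preceding-sibling::td/text()='Test Row Two']" );

            // SelectSingleNode returns null when nothing matches.
            if ( l_contentCell != null ) {
                Console.WriteLine( l_contentCell.Value );
            } else {
                Console.WriteLine( "Row not found." );
            }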

        }
    }
}
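Since the end goal in the question is to follow a download link, a natural next step is to read the `href` from the adjacent cell rather than its text. The sketch below assumes, hypothetically, that the cell next to the report name contains an `<a href="...">` element and that relative links should be resolved against the page's own URL; the base URL `http://legacy.example/reports/list` and the names `ReportLinkLookup` and `FindReportLink` are placeholders, not details from the original application.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

static class ReportLinkLookup {
    // Hypothetical markup assumption: the cell next to the report name holds <a href="...">.
    // Returns the absolute download URL, or null if the row or link is missing.
    public static Uri FindReportLink( string markup, string reportName, Uri baseUri ) {
        XDocument document = XDocument.Parse( markup );

        string href = document
            .Descendants( "tr" )
            .Select( row => row.Elements( "td" ).ToList() )
            .Where( cells => cells.Count >= 2 && cells[0].Value.Trim() == reportName )
            .SelectMany( cells => cells[1].Descendants( "a" ) )
            // The explicit string conversion yields null when the attribute is absent.
            .Select( link => (string) link.Attribute( "href" ) )
            .FirstOrDefault( value => value != null );

        // Relative hrefs such as "download?id=2" are resolved against the page URL.
        return href == null ? null : new Uri( baseUri, href );
    }

    static void Main() {
        string markup =
            "<table>" +
            "<tr><td>Test Row Two</td><td><a href=\"download?id=2\">Test Content</a></td></tr>" +
            "</table>";

        // Placeholder address standing in for the legacy application's page.
        Uri pageUri = new Uri( "http://legacy.example/reports/list" );
        Uri link = FindReportLink( markup, "Test Row Two", pageUri );
        Console.WriteLine( link?.AbsoluteUri ?? "Link not found." );
    }
}
```

With the absolute URL in hand, the file could then be fetched with `HttpClient` (for example `GetByteArrayAsync`); if the legacy application requires a login session, that request would also need to carry the session cookies.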
