c# - LIKE statement or removing trailing whitespace in HTML Agility Pack?

Question

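I'm trying to download data from a website into a DataTable. The problem is that I can't reach the right node, apparently because of whitespace. Here is my code so far: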

    public static DataTable downloadtable()
    {
        DataTable dt = new DataTable();
        string htmlCode = "";
        using (WebClient client = new WebClient())
        {
            client.Headers.Add(HttpRequestHeader.UserAgent, "AvoidError");
            htmlCode = client.DownloadString("https://www.eex.com/en/Market%20Data/Trading%20Data/Power/Hour%20Contracts%20%7C%20Spot%20Hourly%20Auction/Area%20Prices/spot-hours-area-table/2013-08-22");
        }
        //this is just to check the file structure from text file
        //(the using block flushes and closes the writer)
        using (System.IO.StreamWriter file = new System.IO.StreamWriter("c:\\temp\\test.txt"))
        {
            file.WriteLine(htmlCode);
        }

        HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();

        doc.LoadHtml(htmlCode);

        dt = new DataTable();

        foreach (HtmlNode table in doc.DocumentNode.SelectNodes("//table[@class='list electricity']/tr/th[@class='title'][.='Market Area']"))
        {
            //This is the problem name where I get the error
            foreach (HtmlNode row in table.SelectNodes("//td[@class='title'][.='            00-01          ']"))
            {
                foreach (var cell in row.SelectNodes("//td"))
                {
                    //this is to check for correct result, final result would be to dump it into datatable
                    Console.WriteLine(cell.InnerText);
                }
            }
        }
        return dt;
    }

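I'm trying to download the hourly prices from the link in the code, but it seems to fail because of trailing whitespace (I think). Is there something like a LIKE statement for matching nodes? Or can the trailing whitespace be removed?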

score 1 · Accepted Answer

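I believe your problem is that you are trying to retrieve td elements from inside a td node that obviously contains no further td elements.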

<tr>
 <td class="title">         00-01           </td>
 <td class="spacer"></td>
 <td class="r">€/MWh</td>
 <td class="spacer"></td>
 <td>35.34</td>
 <td class="spacer"></td>
 <td>34.02</td>
 <td class="spacer"></td>
 <td>34.02</td>
</tr>

So if you try to iterate over the result of table.SelectNodes("//td[@class='title'][.=' 00-01 ']"), it will not contain any td elements.
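
To see the whitespace effect in isolation, here is a minimal sketch that runs both predicates against a shortened copy of the row above. It uses System.Xml rather than HtmlAgilityPack only so that it runs without extra packages (the sample row happens to be well-formed XML); the class and method names are made up for illustration.

```csharp
using System;
using System.Xml;

public static class WhitespaceMatchDemo
{
    // Shortened copy of the sample row; the title cell keeps its padding.
    public const string RowXml =
        "<tr><td class=\"title\">         00-01           </td>" +
        "<td class=\"spacer\"></td><td>35.34</td></tr>";

    // Returns how many nodes the given XPath expression selects in the sample row.
    public static int CountMatches(string xpath)
    {
        var doc = new XmlDocument();
        doc.LoadXml(RowXml);
        // XmlNode.SelectNodes returns an empty list (Count == 0) when nothing matches.
        return doc.SelectNodes(xpath).Count;
    }

    public static void Main()
    {
        // Exact comparison: the padded text is not equal to ' 00-01 '.
        Console.WriteLine(CountMatches("//td[@class='title'][.=' 00-01 ']"));
        // normalize-space trims the padding before comparing.
        Console.WriteLine(CountMatches("//td[@class='title'][normalize-space(.)='00-01']"));
    }
}
```

The exact comparison selects nothing because the string has to match the cell text character for character, padding included, while the normalize-space(.) version selects the title cell.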

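If you want all the rows, starting from 00-01, you can use this (normalize-space(.) strips the leading and trailing whitespace before comparing, so the padded cell text still matches):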

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//td[@class='title'][(normalize-space(.)='00-01')]/ancestor::table"))
{
    foreach (var cell in row.SelectNodes("./tr/td"))
    {
        if (string.IsNullOrEmpty(cell.InnerText.Trim()))
            continue;
        Console.WriteLine(cell.InnerText.Trim());
    }
}

If you only want the 00-01 row, you can use this:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//td[@class='title']"))
{
    if (row.InnerText.Trim() == "00-01")
    {
        foreach (var cell in row.ParentNode.ChildNodes)
        {
            if (string.IsNullOrEmpty(cell.InnerText.Trim()))
                continue;
            Console.WriteLine(cell.InnerText.Trim());
        }
    }
}

Or you can write it like this:

HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(htmlCode);
foreach (HtmlNode row in doc.DocumentNode.SelectNodes("//td[@class='title'][(normalize-space(.)='00-01')]"))
{
    foreach (var cell in row.ParentNode.ChildNodes)
    {
        if (string.IsNullOrEmpty(cell.InnerText.Trim()))
            continue;
        Console.WriteLine(cell.InnerText.Trim());
    }
}
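
Since the stated goal in the question is a DataTable, the last variant could be extended roughly as follows. This is only a sketch: the class name, the method name and the single "Value" column are assumptions, not part of the original code. It also guards against SelectNodes returning null, which is what HtmlAgilityPack does when nothing matches, and which would otherwise make the foreach throw.

```csharp
using System;
using System.Data;
using HtmlAgilityPack;

public static class HourRowToTable
{
    // Collects the non-empty cells of every row whose title cell equals
    // hourLabel (after whitespace normalization) into a one-column DataTable.
    public static DataTable ReadRow(string htmlCode, string hourLabel)
    {
        var dt = new DataTable();
        dt.Columns.Add("Value", typeof(string)); // hypothetical column name

        var doc = new HtmlDocument();
        doc.LoadHtml(htmlCode);

        // hourLabel is pasted into the XPath as-is, so it must not contain a single quote.
        var titleCells = doc.DocumentNode.SelectNodes(
            "//td[@class='title'][normalize-space(.)='" + hourLabel + "']");
        if (titleCells == null)
            return dt; // no matching row: return the empty table

        foreach (HtmlNode titleCell in titleCells)
        {
            // ChildNodes also contains whitespace text nodes and empty spacer cells.
            foreach (HtmlNode cell in titleCell.ParentNode.ChildNodes)
            {
                string text = HtmlEntity.DeEntitize(cell.InnerText).Trim();
                if (text.Length > 0)
                    dt.Rows.Add(text);
            }
        }
        return dt;
    }
}
```

Calling HourRowToTable.ReadRow(htmlCode, "00-01") would then give one row per non-empty cell; splitting the values into named columns depends on the page layout and is left open here.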
