c# - Why can't I parse this element with HtmlAgilityPack?

Question

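I can't figure out how to parse the following:

- Example page I'm trying to parse: http://www.aliexpress.com/item/-/255859073.html

- The information I want: "7 days". This is the processing time shown in the left column of the shipping table.

- The shipping table becomes visible after clicking the "Shipping and Payment" tab (further down the page).

So far I have tried selecting the node with different XPath values: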

 HtmlAgilityPack.HtmlDocument currentHTML = new HtmlAgilityPack.HtmlDocument();
 HtmlWeb webget = new HtmlWeb();
 currentHTML = webget.Load("http://www.aliexpress.com/item/-/255859073.html");

 string processingTime = currentHTML.DocumentNode.SelectSingleNode("/html/body/div[2]/div[4]/div/div/div[2]/div/div/div[3]/div/div/div/div[2]/table/tbody/tr/td[5]").InnerText;

And:

 string processingTime = currentHTML.DocumentNode.SelectSingleNode("//*[contains(concat( \" \", @class, \" \" ), concat( \" \", \"processing\", \" \" ))]").InnerText;

But I get this error:

 System.NullReferenceException was unhandled
 Message=Object reference not set to an instance of an object.

I also tried their mobile site, but this information isn't shown there.

Any idea why this happens and what I need to do?

score 1 · Accepted Answer

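It looks like your XPath expressions don't match anything in the downloaded document. When nothing matches, SelectSingleNode returns null, and reading .InnerText on that null result throws the NullReferenceException you are seeing. The element you are after is easier to select by its Id attribute. I have modified the XPath expression and, as a bonus, added a regular expression that lets you cleanly parse the number of days out of the text.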

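    // Requires the HtmlAgilityPack package (using System; using HtmlAgilityPack;).
    // Captures the number of days from text ending in e.g. "(7 days)".
    System.Text.RegularExpressions.Regex
        dayParseRegex = new System.Text.RegularExpressions.Regex(@"(?<days>\d+)( days\))\s*$");
    HtmlWeb webget = new HtmlWeb();
    HtmlAgilityPack.HtmlDocument currentHTML = webget.Load("http://www.aliexpress.com/item/-/255859073.html");

    // Extract the node by its id attribute
    var handlingTimeNode = currentHTML.DocumentNode.SelectSingleNode("//*[@id=\"product-info-shipping-sub\"]");

    // SelectSingleNode returns null when nothing matches, so check before using the node
    if (handlingTimeNode == null)
        throw new InvalidOperationException("Node 'product-info-shipping-sub' was not found in the downloaded HTML.");

    // Run the regex against the node's text
    var match = dayParseRegex.Match(handlingTimeNode.InnerText);
    if (!match.Success)
        throw new FormatException("Could not find a day count in the node text.");

    // Convert the captured "days" group to an integer
    int shippingDays = Convert.ToInt32(match.Groups["days"].Value);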
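
The day-count parsing can also be tried in isolation, without downloading the page. Below is a minimal sketch, assuming the node text ends in something like "(7 days)". The class name `ProcessingTimeParser`, the method `TryParseDays`, and the sample strings are hypothetical and not taken from the page. It uses `\d+` so that counts of ten days or more are captured, and it tolerates trailing whitespace.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ProcessingTimeParser
{
    // Matches a day count at the end of the text, e.g. "(7 days)" or "(15 days) ".
    private static readonly Regex DayParseRegex =
        new Regex(@"(?<days>\d+)\s+days?\)\s*$", RegexOptions.IgnoreCase);

    // Returns true and sets 'days' when the text ends in "<number> day(s))";
    // returns false (with days = 0) for null, empty, or non-matching text.
    public static bool TryParseDays(string text, out int days)
    {
        days = 0;
        if (string.IsNullOrEmpty(text))
        {
            return false;
        }

        Match match = DayParseRegex.Match(text);
        return match.Success && int.TryParse(match.Groups["days"].Value, out days);
    }
}
```

With the node from the snippet above, this could be called as `ProcessingTimeParser.TryParseDays(handlingTimeNode.InnerText, out int shippingDays)`, which reports a failed match through its return value rather than an exception.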
