I am trying to parse the information inside each `div class="base shortstory"` in the following markup:

 <div id="dle-content">
   <div class="base shortstory">
     <h3 class="btl"><a href="http://someurl.com/htc-jetstream.html">HTC Jetstream</a></h3>
   </div>
   <div class="base shortstory">
     <h3 class="btl"><a href="http://someurl.com/samsung.html">Samsung S4</a></h3>
   </div>
   <div class="base shortstory">
     <h3 class="btl"><a href="http://someurl.com/dell.html">Dell Streak</a></h3>
   </div>
 </div> 

Here is the code:

        const string url = "http://someurl.com/catalogue";
        const string rootUrl = "http://someurl.com";
        HtmlWeb hw = new HtmlWeb();
        HtmlDocument doc = hw.Load(url);
        int dealsCount = 0;
        HtmlNode root = doc.DocumentNode.SelectSingleNode("//div[@id='dle-content']");
        int i = 1;
        //this is for the default page
        while (i<=10)
        {
            try
            {
                string node= String.Format("//div[{0}]", i);
                var link =
                    doc.DocumentNode.SelectSingleNode(node);
                var href = link.SelectSingleNode("//div[@class='mlink']//span[@class='argmore']//a[@href]").Attributes["href"].Value;
                string title = link.SelectSingleNode("//h3[@class='btl']//a[@href]").InnerText.Trim();

                string description = link.SelectSingleNode("//div[@class='maincont']//div[1]").InnerText.Replace("\n", " ").Replace("\r", "").Replace("\t", "").Trim();
                description = RemoveHTMLComments(description);

                var imageURL = link.SelectSingleNode("//div[@class='maincont']//div[1]//a//img").Attributes["src"].Value;

                var price = link.SelectSingleNode("//div[@class='mlink']//span[3]//font").InnerText.Trim();
                price = Regex.Match(price, @"\d+").Value;

                var partnerdealID = href;

                //no information 

                var isActivesStr = link.SelectSingleNode("//div[@class='mlink']//span[2]/font").InnerText.Trim();
                bool isActive;
                if (isActivesStr.Contains("Нет в наличии"))
                {
                    isActive = false;
                }
                else
                {
                    isActive = true;
                }
                var dealUrl = href; //requires login - show the page itself

            }
            catch (Exception)
            {
            }
            i += 1;
        }

But on every iteration of the loop, the selected node is still the first one. What am I doing wrong?

1 Answer

All of your XPath expressions start with '//', which means "start at the root of the document and search recursively". So when you do this:

link.SelectSingleNode("//div[@class='mlink']//span[@class='argmore']//a[@href]")

you are not starting from `link`, but from the root of the document. You probably want to do this instead:

link.SelectSingleNode("div[@class='mlink']...etc...")

which is equivalent to

link.SelectSingleNode("./div[@class='mlink']...etc...")

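'.' means the current node. '/' means search only the direct children, not recursively. If the target is nested deeper beneath the current node, './/' searches recursively but still starts from the current node rather than from the document root.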
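To make the difference concrete, here is a minimal sketch of the relative-path approach applied to the sample markup from the question. It assumes the HtmlAgilityPack NuGet package is referenced; the class name `RelativeXPathDemo` and the helper `ExtractItems` are hypothetical names chosen for illustration, and only the `h3`/`a` fields that actually appear in the sample markup are extracted.

```csharp
using System;
using System.Collections.Generic;
using HtmlAgilityPack; // assumption: HtmlAgilityPack NuGet package is installed

public static class RelativeXPathDemo
{
    // Sample markup copied from the question.
    public const string Html = @"
<div id=""dle-content"">
  <div class=""base shortstory"">
    <h3 class=""btl""><a href=""http://someurl.com/htc-jetstream.html"">HTC Jetstream</a></h3>
  </div>
  <div class=""base shortstory"">
    <h3 class=""btl""><a href=""http://someurl.com/samsung.html"">Samsung S4</a></h3>
  </div>
  <div class=""base shortstory"">
    <h3 class=""btl""><a href=""http://someurl.com/dell.html"">Dell Streak</a></h3>
  </div>
</div>";

    // Hypothetical helper: returns one (title, href) pair per item block.
    public static List<(string Title, string Href)> ExtractItems(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        var results = new List<(string Title, string Href)>();

        // An absolute path is fine here: we want every item block in the document.
        var items = doc.DocumentNode.SelectNodes(
            "//div[@id='dle-content']/div[@class='base shortstory']");
        if (items == null)
            return results; // SelectNodes returns null when nothing matches

        foreach (var item in items)
        {
            // The leading "." anchors the search at the current item,
            // so each iteration sees its own <h3>, not the first one in the document.
            var anchor = item.SelectSingleNode("./h3[@class='btl']/a[@href]");
            if (anchor == null)
                continue;

            results.Add((anchor.InnerText.Trim(), anchor.GetAttributeValue("href", "")));
        }

        return results;
    }

    public static void Main()
    {
        foreach (var (title, href) in ExtractItems(Html))
            Console.WriteLine($"{title} -> {href}");
    }
}
```

For the sample markup this yields three title/link pairs, one per `base shortstory` block. Iterating over the result of `SelectNodes` also avoids the positional `//div[{0}]` lookup and the empty `catch`, which would otherwise hide a `NullReferenceException` whenever a field is missing.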

Answered 2013-04-16T08:09:56.490