c# - Parsing rel canonical with HtmlAgilityPack

Question

How do I parse the rel="canonical" tag, together with its URL, from an HTML document?

I want to extract the URL from this element:

<link rel="canonical" href="http://stackoverflow.com/questions/2593147/html-agility-pack-make-code-look-neat" />

score 4 · Accepted Answer

Assume doc is your HtmlDocument object.

HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//link[@rel]");

This should give you the link elements that have a rel attribute. Now iterate:

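string url = null;
// SelectNodes returns null when nothing matches, so guard before iterating.
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        // Attributes[...] returns an HtmlAttribute; compare its Value, not the object.
        if (link.Attributes["rel"].Value == "canonical") {
            url = link.Attributes["href"]?.Value;
        }
    }
}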

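Alternatively, you can filter in the SelectNodes call itself so that only the links with rel set to "canonical" are returned: doc.DocumentNode.SelectNodes("//link[@rel='canonical']");

Untested code, but you get the idea :)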
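
The XPath filter mentioned above can be combined with SelectSingleNode when only one canonical link is expected. The following is a minimal sketch, not the answerer's code: the class CanonicalDemo and the helper FindCanonical are hypothetical names introduced for illustration, and it assumes the HtmlAgilityPack package is referenced.

```csharp
using System;
using HtmlAgilityPack;

public static class CanonicalDemo
{
    // Hypothetical helper: returns the canonical URL, or null if the page declares none.
    public static string FindCanonical(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when no node matches the XPath expression.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//link[@rel='canonical']");

        // GetAttributeValue falls back to the supplied default when href is missing.
        return node?.GetAttributeValue("href", null);
    }

    public static void Main()
    {
        // Sample markup taken from the question.
        string html = "<html><head><link rel=\"canonical\" "
            + "href=\"http://stackoverflow.com/questions/2593147/html-agility-pack-make-code-look-neat\" />"
            + "</head><body></body></html>";
        Console.WriteLine(FindCanonical(html) ?? "(no canonical link)");
    }
}
```

Returning null rather than throwing lets the caller decide what a missing canonical link means for the page being processed.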

score 3 · Answer

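The accepted answer, as originally posted, no longer works: Attributes[...] returns an HtmlAttribute, so the string has to be read from its Value property. Updated code: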

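var links = htmlDoc.DocumentNode.SelectNodes("//link[@rel]");

// Initialize so the variable is definitely assigned after the loop.
string canonical = null;

// SelectNodes returns null when no node matches, so guard before iterating.
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        if (link.Attributes["rel"].Value == "canonical")
        {
            // href may be absent; ?. leaves canonical as null in that case.
            canonical = link.Attributes["href"]?.Value;
        }
    }
}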

score 0 · Answer

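using System.Linq;
using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(_html); // _html holds the page source as a string

// Take the href of the first <link> whose rel is "canonical"; null if there is none.
string link = (from x in doc.DocumentNode.Descendants()
               where x.Name == "link"
               && x.Attributes["rel"] != null
               && x.Attributes["rel"].Value == "canonical"
               && x.Attributes["href"] != null
               select x.Attributes["href"].Value).FirstOrDefault();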
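
The LINQ query above matches rel only when it is exactly "canonical". In HTML, rel is a space-separated list of case-insensitive tokens, so a value such as "Canonical nofollow" would be missed. The sketch below is one possible way to loosen the match. The class CanonicalLinq and the helper FindCanonical are hypothetical names, and whether this tolerance is needed depends on the pages being parsed.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class CanonicalLinq
{
    private static readonly char[] Whitespace = { ' ', '\t', '\r', '\n' };

    // Hypothetical helper, not part of HtmlAgilityPack.
    public static string FindCanonical(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        return doc.DocumentNode
            .Descendants("link")
            // Split rel into tokens so multi-valued attributes still match.
            .Where(x => x.GetAttributeValue("rel", "")
                .Split(Whitespace, StringSplitOptions.RemoveEmptyEntries)
                .Contains("canonical", StringComparer.OrdinalIgnoreCase))
            // A link without href yields null and is skipped below.
            .Select(x => x.Attributes["href"]?.Value)
            .FirstOrDefault(href => href != null);
    }

    public static void Main()
    {
        string html = "<html><head><link rel=\"Canonical nofollow\" href=\"http://example.com/page\" /></head></html>";
        Console.WriteLine(FindCanonical(html) ?? "(no canonical link)");
    }
}
```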
