c# - How to detect all relative URLs in an HTML web page?

Question

As the title says: is there a way to detect all the URLs in a PHP page that are relative? The URLs in a PHP page can appear in several different forms:

<link rel="stylesheet" href="/lib/css/hanv2/ie.css" />
<img src="/image.jpg">
<div style="background-image: url(/lib/data/emotion-header-v2/int-algemeen08.jpg)"></div>

So I need to get the relative URLs regardless of what kind of link they are: CSS links, JS links, image links or SWF links.
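The `background-image` example is the awkward case: its URL sits inside an inline `style` attribute, so looking at `href` and `src` attributes will never find it. A minimal sketch, assuming the style text has already been read from the element (for example its `style` attribute value), is to pull out every `url(...)` token with a regular expression. `CssUrlExtractor` and `ExtractCssUrls` are hypothetical names, and the regex is an approximation rather than a CSS parser: escaped characters or a `)` inside a quoted URL are not handled.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class CssUrlExtractor
{
    // Matches url(...) with an optional matching pair of single or double quotes.
    // Group 1 captures the opening quote (or nothing); \1 requires the same closing quote.
    private static readonly Regex UrlPattern = new Regex(
        @"url\(\s*(['""]?)(?<url>[^'"")]+)\1\s*\)",
        RegexOptions.IgnoreCase | RegexOptions.Compiled);

    // Hypothetical helper: returns every url(...) value found in a CSS/style string.
    public static List<string> ExtractCssUrls(string style)
    {
        var urls = new List<string>();
        if (string.IsNullOrEmpty(style))
            return urls;

        foreach (Match m in UrlPattern.Matches(style))
        {
            // Trim whitespace left over from unquoted values such as url( a.png ).
            urls.Add(m.Groups["url"].Value.Trim());
        }
        return urls;
    }
}
```

Each value returned this way can then go through the same relative/absolute check as the `href` and `src` values.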

I'm using HtmlAgilityPack for this. Here are the C# snippets I use to collect the links and check whether they are relative:

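    using System;
    using System.Collections.Generic;
    using HtmlAgilityPack;

    // Extracts the href value of every <a> and <link> element.
    private List<string> ExtractAllAHrefTags(HtmlAgilityPack.HtmlDocument htmlSnippet)
    {
        List<string> hrefTags = new List<string>();

        // SelectNodes returns null, not an empty collection, when nothing matches.
        HtmlNodeCollection nodes = htmlSnippet.DocumentNode.SelectNodes("//a[@href] | //link[@href]");
        if (nodes == null)
            return hrefTags;

        foreach (HtmlNode link in nodes)
        {
            HtmlAttribute att = link.Attributes["href"];
            hrefTags.Add(att.Value);
        }

        return hrefTags;
    }

    // Extracts the src value of every <img> element.
    private List<string> ExtractAllImgTags(HtmlAgilityPack.HtmlDocument htmlSnippet)
    {
        List<string> srcValues = new List<string>();

        HtmlNodeCollection nodes = htmlSnippet.DocumentNode.SelectNodes("//img[@src]");
        if (nodes == null)
            return srcValues;

        foreach (HtmlNode img in nodes)
        {
            HtmlAttribute att = img.Attributes["src"];
            srcValues.Add(att.Value);
        }

        return srcValues;
    }

    // To check whether a path is relative: it must start with NEITHER prefix,
    // so the two negated conditions are combined with &&.
    foreach (string s in AllHrefTags)
    {
        if (!s.StartsWith("http://", StringComparison.OrdinalIgnoreCase) &&
            !s.StartsWith("https://", StringComparison.OrdinalIgnoreCase))
        {
            // path is relative
        }
    }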
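Checking only for `http://` and `https://` still lets other absolute values such as `mailto:`, `javascript:`, `data:` or `ftp:` links through as "relative", and it also accepts protocol-relative `//host/...` links, which point at another host. A slightly more general sketch, assuming "relative" means "has no URI scheme and no `//` authority prefix", tests for a scheme as RFC 3986 defines it: a letter followed by letters, digits, `+`, `-` or `.`, then `:`. `UrlClassifier.IsRelative` is a hypothetical helper, and fragment-only values such as `#tips` count as relative here, so filter those separately if they are not wanted.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UrlClassifier
{
    // RFC 3986 scheme: ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ) followed by ":".
    private static readonly Regex SchemePrefix =
        new Regex(@"^[A-Za-z][A-Za-z0-9+.\-]*:", RegexOptions.Compiled);

    // Hypothetical helper. Root-relative ("/image.jpg"), path-relative ("../x.css"),
    // query-only ("?p=2") and fragment-only ("#tips") values return true.
    // Empty or whitespace-only values are treated as "not a URL" and return false.
    public static bool IsRelative(string url)
    {
        if (string.IsNullOrWhiteSpace(url))
            return false;

        string trimmed = url.Trim();

        // Protocol-relative reference: the host comes from the URL itself.
        if (trimmed.StartsWith("//", StringComparison.Ordinal))
            return false;

        // Any scheme (http:, HTTPS:, mailto:, javascript:, data:, ...) means absolute.
        return !SchemePrefix.IsMatch(trimmed);
    }
}
```

With this definition `/image.jpg` and `../css/ie.css` are relative, while `HTTPS://example.com/` (schemes are case-insensitive), `//cdn.example.com/a.js` and `mailto:someone@example.com` are not.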

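I'd like to know whether there is a better or more accurate way to get all relative paths from a given HTML page, using HtmlAgilityPack or some other approach.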

score 2 · Accepted Answer

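You can use this XPath expression to extract the relative URLs from an HTML page, i.e. every href or src value that starts with neither http:// nor https://: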

htmlSnippet.DocumentNode.SelectNodes("(//@src|//@href)[not(starts-with(.,'http://'))][not(starts-with(.,'https://'))]");

You may also want to filter out links that start with #, which jump to a specific location on the current page (e.g. <a href="#tips">):

    htmlSnippet.DocumentNode.SelectNodes("(//@src|//@href)[not(starts-with(.,'http://'))][not(starts-with(.,'https://'))][not(starts-with(.,'#'))]");
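The expression is plain XPath 1.0, so what it selects can be checked without HtmlAgilityPack on a well-formed XHTML snippet using `System.Xml.XPath`, which returns the matched attribute nodes directly. This is only a sketch for trying out the expression: `XPathDocument` rejects markup that is not well-formed XML, so real-world pages still need an HTML parser such as HtmlAgilityPack. `XPathRelativeUrlDemo` and `SelectRelativeUrls` are hypothetical names. Note that XPath `starts-with` is case-sensitive, so a value such as `HTTP://example.com/` would still be reported as relative.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Xml.XPath;

public static class XPathRelativeUrlDemo
{
    // The answer's expression, including the '#' filter.
    public const string RelativeUrlXPath =
        "(//@src|//@href)[not(starts-with(.,'http://'))][not(starts-with(.,'https://'))][not(starts-with(.,'#'))]";

    // Hypothetical helper: evaluates the expression over well-formed XHTML
    // and returns the attribute values in document order.
    public static List<string> SelectRelativeUrls(string xhtml)
    {
        var doc = new XPathDocument(new StringReader(xhtml));
        XPathNavigator nav = doc.CreateNavigator();

        var results = new List<string>();
        XPathNodeIterator it = nav.Select(RelativeUrlXPath);
        while (it.MoveNext())
        {
            // Each match is an attribute node; Value is its text.
            results.Add(it.Current.Value);
        }
        return results;
    }
}
```

For a page containing `<link href="/lib/css/hanv2/ie.css" />` and `<img src="/image.jpg" />` alongside absolute and `#` links, only those two values are returned.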
