string content=" 
        <br /><br /><a href="need to replace this url">Cooking School</a><br /><br /><a href="http://www.sdlm.com">Feed your senses</a><br /><br /><a href="http://www.sdl.com">Take your cooking skills to the next level. Find a cooking school near you!</a><br /><br /><a href="http:google.com"><img src="http://www.sdlm1.com/autd3umrl_u_t.jpg" /></a>
     "

I need to replace the href value of every anchor tag with a different URL. I used the following function, but it throws an error.

        public List<string> GetLinksFromHtml(string content)
        {
            string regex = @"<(?<Tag_Name>(a)|img)\b[^>]*?\b(?<URL_Type>(?(1)href|src))\s*=\s*(?:""(?<URL>(?:\\""|[^""])*)""|'(?<URL>(?:\\'|[^'])*)'))";
            var matches = Regex.Matches(content, regex, RegexOptions.IgnoreCase | RegexOptions.Singleline);
            var links = new List<string>();

            foreach (Match item in matches)
            {
                string link = item.Groups[1].Value;
                links.Add(link);
            }

            return links;
        }

Thanks for your help.


1 Answer


Trying to parse HTML with regular expressions is not a good idea. See this post. Use a real HTML parser such as HtmlAgilityPack.

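// Requires the HtmlAgilityPack package; HttpUtility lives in System.Web.
HtmlAgilityPack.HtmlDocument doc = new HtmlAgilityPack.HtmlDocument();
doc.LoadHtml(content);
foreach (var a in doc.DocumentNode.Descendants("a"))
{
    // Skip anchors without an href; Attributes["href"] is null for them.
    if (a.Attributes["href"] == null) continue;
    a.Attributes["href"].Value = "http://a.com?url=" + HttpUtility.UrlEncode(a.Attributes["href"].Value);
}

// Serialize the modified document back to a string.
var newContent = doc.DocumentNode.OuterHtml;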
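
If you want this as a reusable method, here is a minimal sketch of the same approach. The names `LinkRewriter` and `ReplaceAnchorHrefs` and the `rewrite` callback are my own, not part of HtmlAgilityPack, and I have assumed `System.Net.WebUtility.UrlEncode` in place of `HttpUtility` so the snippet does not need a reference to System.Web. Adjust to taste.

```csharp
using System;
using System.Net;
using HtmlAgilityPack;

public static class LinkRewriter
{
    // Hypothetical helper: applies `rewrite` to the href of every <a> element
    // that has one and returns the modified HTML.
    public static string ReplaceAnchorHrefs(string html, Func<string, string> rewrite)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        foreach (var a in doc.DocumentNode.Descendants("a"))
        {
            var href = a.Attributes["href"];
            if (href == null) continue;   // anchor without href: leave it alone
            href.Value = rewrite(href.Value);
        }

        return doc.DocumentNode.OuterHtml;
    }
}

public static class Program
{
    public static void Main()
    {
        // Shortened version of the markup from the question.
        string content =
            "<a href=\"http://www.sdl.com\">Cooking School</a><br />" +
            "<a href=\"http://google.com\"><img src=\"http://www.sdlm1.com/autd3umrl_u_t.jpg\" /></a>";

        string newContent = LinkRewriter.ReplaceAnchorHrefs(
            content,
            href => "http://a.com?url=" + WebUtility.UrlEncode(href));

        Console.WriteLine(newContent);
    }
}
```

Only `<a>` elements are visited, so the `src` of the nested `<img>` is left untouched. Passing the replacement as a callback keeps the URL scheme out of the parsing code, which should make it easy to swap in a lookup table or any other mapping.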
answered 2012-10-16T10:34:12.890