regex - Regular expression to extract links with a specified attribute

Question

I am trying to build a regular expression that extracts links from text, but only the links that do not have rel="nofollow".

Example:

aiusdiua asudauih <a rel="nofollow" href="http://uashiuadha.asudh/adas">adsaag</a> uhwaida <br> asdgydug <a href="http://asdha.sda/uduih/dufhuis">aguuia</a>

Thanks!

score 2 · Accepted Answer

The following regular expression will do the job:

<a (?![^>]*?rel="nofollow")[^>]*?href="(.*?)"

The URL you want will be in capture group #1. For example, in Ruby:

if input =~ /<a (?![^>]*?rel="nofollow")[^>]*?href="(.*?)"/
    match = $~[1]
end

Because the negative lookahead accepts [^>]*? before rel, href or anything else can come before rel. If href comes after rel, that of course works too.
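To illustrate the point about attribute order, here is a minimal sketch that applies the same pattern with String#scan to collect every matching link rather than only the first. The helper name followed_links and the example.com style URLs are assumptions made for illustration; they are not part of the original answer.

```ruby
# Hypothetical helper: returns the href value of every <a ...> tag
# that does not carry rel="nofollow" anywhere in its attribute list.
def followed_links(html)
  # Same pattern as in the answer. The lookahead runs right after "<a "
  # and may skip any non-">" characters, so rel="nofollow" is rejected
  # whether it appears before or after href.
  pattern = /<a (?![^>]*?rel="nofollow")[^>]*?href="(.*?)"/
  # scan yields one array per match (one entry per capture group).
  html.scan(pattern).flatten
end

sample = '<a rel="nofollow" href="http://a.example/x">a</a> ' \
         '<a href="http://b.example/y" rel="nofollow">b</a> ' \
         '<a class="c" href="http://c.example/z">c</a>'

puts followed_links(sample)
```

Only the third link should be printed, since the first two carry rel="nofollow" on either side of href.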

score 0 · Accepted Answer

Try this:

<(?:A|AREA)\b[^<>]*?(?!rel="nofollow")[^<>]*?href=['"]([^>"]*)[^>]*?>

If you are using .NET regular expressions, then:

<(?:A|AREA)\b[^<>]*?(?!rel="nofollow")[^<>]*?href=['"](?<URL>[^>"]*)[^>]*?>

The data is in the group named URL, or in group 1.
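One caveat when trying this pattern: the negative lookahead sits after a lazy [^<>]*?, so it is tested at a position where rel="nofollow" does not start (for example at the space right after the tag name), and a nofollow link can still match. The sketch below is a variant, not the answerer's exact pattern, written in Ruby for consistency with the first answer. It assumes the lookahead is moved directly after the tag name, adds the /i flag so the uppercase A|AREA also matches lowercase tags, and excludes ' from the URL so single-quoted values end cleanly. The helper name followed_urls is hypothetical.

```ruby
# Hypothetical helper: collects href values from <a> and <area> tags
# that do not contain rel="nofollow".
def followed_urls(html)
  # Lookahead placed right after the tag name so it scans the whole
  # attribute list; the named group URL holds the link target.
  pattern = /<(?:A|AREA)\b(?![^<>]*?rel="nofollow")[^<>]*?href=['"](?<URL>[^>"']*)/i
  urls = []
  # Inside the block, Regexp.last_match refers to the current match.
  html.scan(pattern) { urls << Regexp.last_match[:URL] }
  urls
end

sample = %q(<a rel="nofollow" href="http://a.example/x">a</a> ) +
         %q(<AREA href='http://b.example/y'> ) +
         %q(<a class="k" href="http://d.example/w">d</a>)

puts followed_urls(sample)
```

The named group (?<URL>...) uses the same syntax in Ruby as in .NET, so the capture can be read by name in either engine.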
