regex - Regular expression lookbehind assertion - Match link anchor text

Question

I have links like these:

<a href="#" class="social google">Google</a>
<a href="#" class="social yahoo">Yahoo</a>
<a href="#" class="social facebook">Facebook</a>

Now I want to match only the anchor text using a regular expression.
I mean it should match only the text Google in the first link.

I tried this code:

(?<=<a href="#" class="social .+?">).+?(?=</a>)

But it doesn't work as expected.

Can anyone give me the correct syntax?

score 1 · Accepted Answer

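Rather than using lookbehind and lookahead to exclude the parts you don't want, I suggest using a capturing group to get only the part you do want: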

<a href="#" class="social .+?">(.+?)</a>

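Conceptually, lookarounds are meant for overlapping matches. It looks like you don't need that capability here.

(Of course, the usual caveats apply.)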
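
As a minimal sketch of the capturing-group approach, here is that pattern run over the three links from the question with Python's `re` module. Python is an assumption made for illustration (the question does not say which regex engine is in use), and the variable names are hypothetical.

```python
import re

# The three links from the question.
html = '''<a href="#" class="social google">Google</a>
<a href="#" class="social yahoo">Yahoo</a>
<a href="#" class="social facebook">Facebook</a>'''

# The tags are matched and consumed; only group 1 (the anchor text) is returned.
pattern = re.compile(r'<a href="#" class="social .+?">(.+?)</a>')

anchor_texts = pattern.findall(html)
print(anchor_texts)  # ['Google', 'Yahoo', 'Facebook']
```

With a single capturing group, `findall` returns the group contents rather than the whole match, so no lookaround is needed to trim the tags.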

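Update: This is not only a matter of best practice. The regex that uses look-behind will actually produce incorrect results, because it allows the look-behind part to overlap with other matches. Consider this input: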

<a href="#" class="social google">Google</a>

...

<a class="bad">foo</a>

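Your regex will match not only "Google"; it will also match "foo", because the .+? that was supposed to match only part of the class string can extend all the way into another link in the text (this requires that . be able to match everything in between, including any line breaks).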
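
To check that the capturing-group pattern does not share this problem, the sketch below runs it against that input with Python's `re` module (an assumed engine; the question does not name one). `re.DOTALL` is set on purpose so that `.` may cross line breaks, which is the most permissive case. Python's `re` cannot run the look-behind version at all, since it only accepts fixed-width look-behind, so the second half merely confirms that the pattern is rejected.

```python
import re

html = '''<a href="#" class="social google">Google</a>

...

<a class="bad">foo</a>'''

# Capturing-group version: the opening tag is consumed as part of the match,
# so a later search cannot reuse it to reach the second link.
capture = re.compile(r'<a href="#" class="social .+?">(.+?)</a>', re.DOTALL)
matches = capture.findall(html)
print(matches)  # ['Google']

# The look-behind version is rejected by Python's re module outright,
# because the look-behind contains the variable-width ".+?".
try:
    re.compile(r'(?<=<a href="#" class="social .+?">).+?(?=</a>)')
    lookbehind_compiles = True
except re.error:
    lookbehind_compiles = False
print(lookbehind_compiles)  # False
```

Engines that do allow variable-width look-behind are the ones where the overlap described above can occur.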

score 0 · Accepted Answer

Try this:

  "~<a(>| .*?>)(.*?)</a>~si"

Or (with / as the delimiter, the slash in </a> has to be escaped):

   "/<a(>| .*?>)(.*?)<\/a>/"

PHP example:

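  $notecomments = '<a id="234" class="asf">fdgsd</a> <a>fdgsd</a>';

  // The callback runs once per <a> element; $matches[2] holds the anchor text.
  $output = preg_replace_callback("~<a(>| .*?>)(.*?)</a>~si", function ($matches) {
      print_r($matches[2]);
      return ''; // the matched link is removed from $output
  }, ' '.$notecomments.' ');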

This gives you all the anchor text.
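
For comparison, a rough Python equivalent of the same pattern is sketched below. It collects the anchor text with `finditer` instead of a replace callback; `re.S | re.I` mirror the `si` modifiers, and the variable names are illustrative only.

```python
import re

notecomments = '<a id="234" class="asf">fdgsd</a> <a>fdgsd</a>'

# Group 1 is the rest of the opening tag, group 2 is the anchor text.
pattern = re.compile(r'<a(>| .*?>)(.*?)</a>', re.S | re.I)

anchor_texts = [m.group(2) for m in pattern.finditer(notecomments)]
print(anchor_texts)  # ['fdgsd', 'fdgsd']
```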

And this one returns only the links with class="social":

  "#<a .*?class=\".*?social.*?\".*?>(.*?)</a>#"

Sample:

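  $notecomments = '<a id="234" class="fas social ads">fdgsd</a> <a>fdgsd</a>';

  // Only the first link carries the "social" class, so the callback runs once.
  $output = preg_replace_callback("#<a .*?class=\".*?social.*?\".*?>(.*?)</a>#", function ($matches) {
      print_r($matches); // [0] is the whole link, [1] is the anchor text
      return '';
  }, ' '.$notecomments.' ');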

score 0 · Accepted Answer

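You may be getting the right result, but because you have other matching groups (?...), your match also contains data you don't want.

You could try using non-capturing groups (?:...) and put what you want to show up in the match in a group of its own (.+?).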
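
A small sketch of that suggestion, assuming Python's `re` module (the names are illustrative): the tags sit in non-capturing groups, so the anchor text is the only numbered group.

```python
import re

link = '<a href="#" class="social google">Google</a>'

# (?:...) groups without capturing; (.+?) is the only numbered group.
pattern = re.compile(r'(?:<a href="#" class="social .+?">)(.+?)(?:</a>)')

m = pattern.search(link)
whole_match = m.group(0)  # the entire link, tags included
anchor_text = m.group(1)  # just the text between the tags
print(whole_match)
print(anchor_text)  # Google
```

The whole match still includes the tags; it is the group that isolates the text.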

score 0 · Accepted Answer

Try this regex:

\<a .*?\>(.*?)\<\/a\>

Edit 1 - This regex matches anchors that have the CSS class "social":

\<a .*?class=".*?\bsocial\b.*?\>(.*?)\<\/a\>
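
The word-boundary idea can be tried out with the sketch below, written for Python's `re` module. It is a variation rather than the exact pattern above: `[^"]*` and `[^>]*` replace `.*?` (my own assumption, so that a match cannot run past the attribute value or the tag), and the sample links are made up for illustration.

```python
import re

# Hypothetical sample links, not taken from the question.
html = (
    '<a id="1" class="fas social ads">kept</a> '
    '<a class="antisocial">skipped</a> '
    '<a class="social-media">also kept</a>'
)

# \bsocial\b requires a non-word character (or the string edge) on both sides.
pattern = re.compile(r'<a\s[^>]*class="[^"]*\bsocial\b[^"]*"[^>]*>(.*?)</a>')

result = pattern.findall(html)
print(result)  # ['kept', 'also kept']
```

Note that `\b` rejects "antisocial" but still accepts "social-media", because a hyphen is not a word character.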
