regex - Extract the href from an a tag with a regular expression only when rel=

Question

Please help me use a regular expression to extract the href from the tag, but only when rel="external nofollow":

<a href="text.html" rel="external nofollow">text1:text2:text3/</a>

As the result, I only need:

text1:text2:text3

I tried:

$regexp = '<a (?![^>]*?rel="external nofollow")[^>]*?href="(.*?)"';

and I get this error:

Warning: preg_match() [function.preg-match]: Unknown modifier ']' in /

score 3 · Accepted Answer

I suggest using the DOM to parse the HTML and get the result you want. Here is an example.

<?php
$str = <<<STR
<a href="text.html" rel="external nofollow">foo bar</a>
<a href="text.html" rel="nofollow">text1:text2:text3/</a>
<a href="text.html" rel="nofollow">text1:text2:text3/</a>
<a href="example.html" rel="external nofollow">bar baz</a>
STR;

$dom = new DOMDocument;
$dom->loadHTML($str);

foreach ($dom->getElementsByTagName('a') as $node) {
   if ($node->getAttribute('rel') == 'external nofollow') {
     echo $node->getAttribute('href') . ', ' . $node->nodeValue . "\n"; 
   }
}
?>

Output of the example:

text.html, foo bar
example.html, bar baz
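If you need this in more than one place, the loop above can be wrapped in a small function. This is a sketch, not part of the original answer: the name `extractExternalNofollowLinks` is hypothetical, and the `libxml_use_internal_errors()` calls are an added assumption to keep libxml from printing warnings on imperfect real-world HTML.

```php
<?php
// Hypothetical helper: returns [href, text] pairs for <a> tags whose rel
// attribute is exactly "external nofollow", mirroring the answer above.
function extractExternalNofollowLinks(string $html): array
{
    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $links = [];
    foreach ($dom->getElementsByTagName('a') as $node) {
        if ($node->getAttribute('rel') === 'external nofollow') {
            $links[] = [$node->getAttribute('href'), $node->textContent];
        }
    }
    return $links;
}

$html = '<a href="text.html" rel="external nofollow">text1:text2:text3/</a>'
      . '<a href="other.html" rel="nofollow">skipped</a>';

foreach (extractExternalNofollowLinks($html) as $link) {
    echo $link[0], ', ', $link[1], "\n";
}
```

Returning an array instead of echoing keeps the parsing separate from whatever you do with the results.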

score 3 · Accepted Answer

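I strongly advise against using regular expressions for this kind of HTML parsing task. HTML can vary a lot, and you will get unexpected results.

Consider using a DOM parser in PHP, with code like this: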

$html = '<a href="found.html" rel="external nofollow">text1:text2:text3/</a>
         <a href="notfound.html" rel="external">text11/</a>';
$doc = new DOMDocument();
$doc->loadHTML($html); // loads your html
$xpath = new DOMXPath($doc);
$nodelist = $xpath->query("//a[contains(@rel, 'external nofollow')]");
for($i=0; $i < $nodelist->length; $i++) {
   $node = $nodelist->item($i);
   echo $node->getAttribute('href') . "\n";
}

Output:

found.html
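Note that `contains(@rel, 'external nofollow')` is a plain substring test: it would also match `rel="external nofollowed"` and would miss `rel="nofollow external"`. If `rel` should be treated as a space-separated list of tokens, a common XPath 1.0 idiom pads the normalized value with spaces and tests each token separately. A sketch with made-up sample links:

```php
<?php
// Sketch: treat rel as a space-separated token list, so that
// rel="nofollow external" is found and rel="external nofollowed" is not.
$html = '<a href="a.html" rel="external nofollow">A</a>'
      . '<a href="b.html" rel="nofollow external">B</a>'
      . '<a href="c.html" rel="external nofollowed">C</a>'
      . '<a href="d.html" rel="external">D</a>';

$doc = new DOMDocument();
libxml_use_internal_errors(true); // don't print libxml warnings for sloppy HTML
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
// Pad the normalized attribute with spaces so each token is matched whole.
$hasToken = fn (string $t): string =>
    "contains(concat(' ', normalize-space(@rel), ' '), ' $t ')";
$query = '//a[' . $hasToken('external') . ' and ' . $hasToken('nofollow') . ']';

$hrefs = [];
foreach ($xpath->query($query) as $node) {
    $hrefs[] = $node->getAttribute('href');
}
echo implode("\n", $hrefs), "\n";
```

With this query, only `a.html` and `b.html` are selected.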

score 1 · Accepted Answer

Try:

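// "~" as delimiter: with "/", the "/" in "</a>" would end the pattern early.
// [^>]* (instead of .*) keeps the match inside a single opening tag.
preg_match('~<a[^>]*rel="external nofollow"[^>]*>([^<]*)</a>~i',
           $string_to_search_through, $res);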
echo $res[1];

$res[1] will give you the text you want.
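To collect the text of every matching link rather than only the first one, `preg_match_all()` can be used with a `~`-delimited version of the pattern, so `</a>` needs no escaping. A sketch with made-up sample links:

```php
<?php
$html = '<a href="a.html" rel="nofollow">skip me</a> '
      . '<a href="text.html" rel="external nofollow">text1:text2:text3/</a> '
      . '<a href="b.html" rel="external nofollow">second</a>';

// "~" as delimiter, so "</a>" needs no escaping; [^>]* cannot cross the
// end of an opening tag, so a match never spans two links.
$pattern = '~<a[^>]*rel="external nofollow"[^>]*>([^<]*)</a>~i';

if (preg_match_all($pattern, $html, $matches)) {
    // $matches[1] holds the first capture group of every match.
    print_r($matches[1]);
}
```

The first link is skipped because its opening tag does not contain `rel="external nofollow"`.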

score 0 · Accepted Answer

First, you need proper delimiters around your regular expression; a suitable one is ~:

$regexp = '~<a (?![^>]*?rel="external nofollow")[^>]*?href="(.*?)"~';
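A quick check (a sketch with two made-up links) shows that the delimited pattern now compiles, because `preg_match_all()` returns a count instead of `false`. It also shows which link the pattern actually captures:

```php
<?php
$regexp = '~<a (?![^>]*?rel="external nofollow")[^>]*?href="(.*?)"~';

$html = '<a href="wanted.html" rel="external nofollow">text1:text2:text3/</a>'
      . '<a href="other.html" rel="nofollow">other</a>';

// preg_match_all() returns false if the pattern fails to compile.
$count = preg_match_all($regexp, $html, $m);
var_dump($count);
print_r($m[1]); // hrefs captured by the (.*?) group
```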

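Second, this regex matches an anchor tag and captures its href only when rel="external nofollow" is *not* in the tag, which I think is the opposite of what you are trying to do: the negative lookahead prevents the match. You probably want to use this regex instead:

$regexp = '~<a[^>]*?rel="external nofollow"[^>]*>(.*?)</a>~';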
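If you want the href as well as the link text, one option is to turn the negative lookahead into a positive one. The opening tag must then contain `rel="external nofollow"` in any attribute position, and `href` and the text are captured. This is a sketch, not part of the original answer. Like any regex over HTML it is fragile: for example, `\b` also matches right after `data-`, so `data-rel` or `data-href` attributes could confuse it. That is why the DOM-based answers are the more robust choice.

```php
<?php
// Positive lookahead: require rel="external nofollow" somewhere before the
// tag's closing ">", then capture href (group 1) and the link text (group 2).
$regexp = '~<a\b(?=[^>]*\brel="external nofollow")[^>]*\bhref="([^"]*)"[^>]*>(.*?)</a>~is';

$html = '<a rel="external nofollow" href="first.html">one</a>'
      . '<a href="text.html" rel="external nofollow">text1:text2:text3/</a>'
      . '<a href="skip.html" rel="nofollow">skip</a>';

// PREG_SET_ORDER groups the captures per match: [full, href, text].
preg_match_all($regexp, $html, $m, PREG_SET_ORDER);
foreach ($m as $match) {
    echo $match[1], ' => ', $match[2], "\n";
}
```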

