php - How to get the right values in the matches array with preg_match_all()

Question

Consider the following case:

$text = "This is some <span class='classname'>example</span> text i'm writing to
demonstrate the <span class='classname otherclass'>problem</span> of this.<br />";

preg_match_all("|<[^>/]*(classname)(.+)>(.*)</[^>]+>|U", $text, $matches, PREG_PATTERN_ORDER);

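I need an array ($matches) in which one field is "<span class='classname'>example</span>" and another field is "example". What I get instead is one field with "<span class='classname'>example</span>" and one field with "classname".

Of course, it should also contain the values for the other matches.

How can I get the right values?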

score 0 · Accepted Answer

The safe/simple way:

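$text = 'blah blah blah';

$dom = new DOMDocument();
$dom->loadHTML($text);

$xp = new DOMXPath($dom);

// Select every <span> whose class attribute is exactly "classname".
$nodes = $xp->query("//span[@class='classname']");
foreach ($nodes as $node) {
    $innertext = $node->nodeValue;  // text content only
    // Markup of the node itself; for inner HTML see
    // http://stackoverflow.com/questions/2087103/innerhtml-in-phps-domdocument
    $html = $dom->saveHTML($node);
}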
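
As a rough, runnable sketch of the DOM approach applied to the text from the question: the XPath expression below is an assumption on my part rather than part of the answer above. It treats the class attribute as a space-separated list, so that the second span (class='classname otherclass') is found as well.

```php
<?php
// Sample input from the question.
$text = "This is some <span class='classname'>example</span> text i'm writing to "
      . "demonstrate the <span class='classname otherclass'>problem</span> of this.<br />";

$dom = new DOMDocument();
$dom->loadHTML($text);

$xp = new DOMXPath($dom);

// Match "classname" as one token of a space-separated class attribute.
$query = "//span[contains(concat(' ', normalize-space(@class), ' '), ' classname ')]";

$results = [];
foreach ($xp->query($query) as $node) {
    $results[] = [
        'html' => $dom->saveHTML($node), // markup of the element itself
        'text' => $node->nodeValue,      // text content only
    ];
}

print_r($results);
```

The 'html' entries are re-serialized by the parser, so they may differ cosmetically from the input (for example in attribute quoting).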

score 0 · Accepted Answer

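You would be better off using a DOM parser, but this question is more about how capturing works in regular expressions in general.

The reason you are getting classname as a match is that you are capturing it by putting () around it. Those parentheses are completely unnecessary, so you can just remove them. Likewise, you don't need them around .+ because you don't want to capture that either.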

If you do need to enclose part of the pattern in () for grouping rather than capturing, start the group with ?: and it won't be captured.
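
A minimal sketch of both points, using the text and pattern from the question. The second pattern is only an illustration of ?: and its alternative "someclass" is a made-up name, not something from the question.

```php
<?php
// Sample input from the question.
$text = "This is some <span class='classname'>example</span> text i'm writing to "
      . "demonstrate the <span class='classname otherclass'>problem</span> of this.<br />";

// Same pattern as in the question, minus the parentheses around
// "classname" and ".+", so the only capture group left is the inner text.
preg_match_all("|<[^>/]*classname.+>(.*)</[^>]+>|U", $text, $matches, PREG_PATTERN_ORDER);

// $matches[0] holds the full matches, $matches[1] holds the only capture group.
print_r($matches[0]);
print_r($matches[1]);

// If parentheses are needed purely for grouping, "?:" keeps them from capturing.
// "~" is the delimiter here because the pattern itself contains "|".
// "someclass" is a hypothetical alternative, used only for illustration.
preg_match_all("~<[^>/]*(?:classname|someclass).+>(.*)</[^>]+>~U", $text, $grouped);
print_r($grouped[1]);
```

With PREG_PATTERN_ORDER, $matches[0] should contain the two complete span elements and $matches[1] the strings "example" and "problem"; the non-capturing variant yields the same inner texts in $grouped[1].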
