php - Regex in PHP: find everything inside a div

Question

I'm trying to find everything inside a div using a regular expression. I know there are probably smarter ways to do this, but I chose regex.

So currently my regex pattern looks like this:

$gallery_pattern = '/<div class="gallery">([\s\S]*)<\/div>/';

It does the trick, sort of.

The problem shows up when I have two divs one after the other, like this:

<div class="gallery">text to extract here</div>
<div class="gallery">text to extract from here as well</div>

I want to extract the information from both divs, but when testing I don't get the text of each div as a separate result. Instead I get:

"text to extract here </div>  
<div class="gallery">text to extract from here as well"

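So, to sum up: it skips the first closing </div> and carries on to the next one. The text inside the div can contain <, / and line breaks. Just so you know!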

Does anyone have a simple solution to this problem? I'm still a regex novice.

score 12 · Accepted Answer

You shouldn't use regex to parse HTML when there is a convenient DOM library available:

$str = '
<div class="gallery">text to extract here</div>
<div class="gallery">text to extract from here as well</div>
';

$doc = new DOMDocument();
$doc->loadHTML($str);
$divs = $doc->getElementsByTagName('div');

// DOMNodeList exposes its size as ->length; count() only works on it from PHP 7.2
if ( $divs->length ) {
    foreach ( $divs as $div ) {
        echo $div->nodeValue . '<br>';
    }
}
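The snippet above collects every div on the page, while the question only asks for those with class "gallery" and notes that the content may itself contain markup. As a rough sketch, and not part of the original answer, the lookup could be narrowed with DOMXPath and the inner markup preserved with saveHTML(). The sample input, the extra "other" div and the variable names ($html, $galleries) are assumptions made for illustration.

```php
<?php
// Hypothetical sample input: two gallery divs and one unrelated div.
$html = '
<div class="gallery">text to <b>extract</b> here</div>
<div class="other">not wanted</div>
<div class="gallery">text to extract from here as well</div>
';

$doc = new DOMDocument();
// Collect libxml parse warnings internally instead of emitting them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

// Match divs whose class attribute contains the token "gallery".
$xpath = new DOMXPath($doc);
$nodes = $xpath->query(
    '//div[contains(concat(" ", normalize-space(@class), " "), " gallery ")]'
);

$galleries = [];
foreach ($nodes as $node) {
    // Serialize each child node so that inner tags such as <b> are kept.
    $inner = '';
    foreach ($node->childNodes as $child) {
        $inner .= $doc->saveHTML($child);
    }
    $galleries[] = $inner;
}

print_r($galleries);
```

If only the plain text is needed, $node->textContent can replace the inner loop.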

score 9 · Accepted Answer

How about something like this:

$str = <<<HTML
<div class="gallery">text to extract here</div>
<div class="gallery">text to extract from here as well</div>
HTML;

$matches = array();
preg_match_all('#<div[^>]*>(.*?)</div>#s', $str, $matches);

var_dump($matches[1]);

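Note the '?' in the regex: it makes the quantifier non-greedy, so the match stops at the first </div> instead of the last one.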

This will give you:

array
  0 => string 'text to extract here' (length=20)
  1 => string 'text to extract from here as well' (length=33)

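This should work fine... as long as you don't have nested divs; if you do... well... actually: are you really sure you want to use regular expressions to parse HTML, which isn't all that regular itself?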
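To make the nested-div caveat concrete, here is a small sketch (the input string and variable names are made up for illustration) that runs the same non-greedy pattern against a div containing another div, then reads the same element through DOMDocument for comparison.

```php
<?php
// Hypothetical input: a gallery div that contains a nested div.
$html = '<div class="gallery">outer <div>inner</div> tail</div>';

// The lazy quantifier stops at the FIRST closing </div>,
// which here belongs to the inner div, so the capture is cut short.
preg_match_all('#<div[^>]*>(.*?)</div>#s', $html, $matches);
var_dump($matches[1]); // one entry: 'outer <div>inner'

// A DOM parser tracks nesting, so the outer div is read in full.
$doc = new DOMDocument();
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$outerText = $doc->getElementsByTagName('div')->item(0)->textContent;
var_dump($outerText); // 'outer inner tail'
```

The regex result loses " tail" and leaves an unclosed inner tag in the capture, which is the failure mode described above.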

score 0 · Accepted Answer

A possible answer to this problem can be found at http://simplehtmldom.sourceforge.net/ . That class helped me quickly solve a similar problem.
