php - Extract text from tags using RegExp in PHP

Question

I'm trying to extract some strings from the source code of a web page, which looks like this:

<p class="someclass">
String1<br />
String2<br />
String3<br />
</p>

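I'm pretty sure these strings are the only things that end with a single line break (<br />). Everything else ends with two or more line breaks. I tried this: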

preg_match_all('~(.*?)<br />{1}~', $source, $matches);

But it doesn't work as it should. It returns some other text along with these strings.

score 3 · Accepted Answer

DOMDocument and XPath to the rescue.

$html = <<<EOM
<p class="someclass">
String1<br />
String2<br />
String3<br />
</p>
EOM;

$doc = new DOMDocument;
$doc->loadHTML($html);
$xp = new DOMXPath($doc);

foreach ($xp->query('//p[contains(concat(" ", @class, " "), " someclass ")]') as $node) {
    echo $node->textContent;
}
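
Note that textContent returns the paragraph's text as one string. If each string is needed separately, one option is to walk the paragraph's child nodes and keep only the non-empty text nodes. A minimal sketch, assuming the sample markup from the question (the $strings variable is just an illustrative name):

```php
<?php
// Sample markup copied from the question.
$html = <<<EOM
<p class="someclass">
String1<br />
String2<br />
String3<br />
</p>
EOM;

$doc = new DOMDocument;
$doc->loadHTML($html);
$xp = new DOMXPath($doc);

$strings = []; // illustrative name, not part of the original answer
foreach ($xp->query('//p[contains(concat(" ", @class, " "), " someclass ")]') as $node) {
    foreach ($node->childNodes as $child) {
        // <br /> elements are skipped; only text nodes carry the strings.
        if ($child instanceof DOMText) {
            $value = trim($child->nodeValue);
            if ($value !== '') {
                $strings[] = $value;
            }
        }
    }
}

print_r($strings);
```

This does not depend on the line breaks in the source, only on the strings being direct text children of the paragraph.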

score 2 · Accepted Answer

I wouldn't recommend using a regular expression to get the values. Instead, use PHP's built-in HTML parser, like this:

$dom = new DOMDocument();
$dom->loadHTML($source);
$xpath = new DOMXPath($dom);

$elements = $xpath->query('//p[@class="someclass"]');
$text = array(); // to hold the strings
if (!is_null($elements)) {
    foreach ($elements as $element) {
        $text[] = strip_tags($element->nodeValue);
    }
}
print_r($text); // print out all the strings

This is tested and working. You can read more about PHP's DOMDocument class here: http://www.php.net/manual/en/book.dom.php

Here is a demo: http://phpfiddle.org/lite/code/0nv-hd6 (click "Run")

score -1 · Accepted Answer

Try this:

preg_match_all('~^(.*?)<br />$~m', $source, $matches);
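
For reference, here is how that pattern behaves on the sample from the question. This is only a sketch and assumes each string sits on its own line with Unix (\n) line endings; with \r\n endings, or with two <br /> tags on the same line, the results may differ.

```php
<?php
// Sample input copied from the question; a real page will contain more markup.
$source = "<p class=\"someclass\">\nString1<br />\nString2<br />\nString3<br />\n</p>";

// The m flag makes ^ and $ match at the start and end of each line,
// so only lines that consist of some text followed by one <br /> are captured.
preg_match_all('~^(.*?)<br />$~m', $source, $matches);

print_r($matches[1]);
```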

Answered 2013-06-18T13:25:08.780

score -1 · Accepted Answer

This should work. Please give it a try:

preg_match_all("/([^<>]*?)<br\s*\/?>/", $source, $matches);

Or, if your strings may contain some HTML code, use this instead:

preg_match_all("/(.*?)<br\s*\/?>\\n/", $source, $matches);
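
One thing to watch with the first pattern above: the capture group also picks up the newline that precedes each string, and a run of consecutive <br /> tags yields empty captures. A small sketch of cleaning that up, assuming the sample input from the question ($clean is an illustrative name):

```php
<?php
// Sample input copied from the question.
$source = "<p class=\"someclass\">\nString1<br />\nString2<br />\nString3<br />\n</p>";

preg_match_all("/([^<>]*?)<br\s*\/?>/", $source, $matches);

// Trim surrounding whitespace, then drop captures that end up empty.
$clean = array_values(array_filter(
    array_map('trim', $matches[1]),
    function ($s) {
        return $s !== '';
    }
));

print_r($clean);
```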
