php - A fast way to get specific data from an HTML string using PHP

Question

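I held off for a long time before posting my problem here. I googled a lot and found some solutions, but none of them were confirmed to work. First, let me explain the problem.

My website has a CKEditor that lets users post comments. Suppose a user clicks two posts to multi-quote them; the data in CKEditor will then look like this: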

<div class="quote" user_name="david_sa" post_id="223423">
This is Quoted Text 
</div>

<div class="quote" user_name="richard12" post_id="254555">
This is Quoted Text 
</div>

<div class="original">
This is the Comment Text 
</div>

I want to get all of the elements separately in PHP, like this:

user_name = david_sa
post_id = 223423
quote_text = This is Quoted Text

user_name = richard12
post_id = 254555
quote_text = This is Quoted Text

original_comment = This is the Comment Text

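I want to get the data in the format above in PHP. While googling I found the PHP function preg_match_all(), which comes close to what I need and uses a regular expression to match string patterns. But I am not sure whether that is a sound and efficient solution or whether there is a better one. If you know a better solution, please suggest it.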

score 3 · Accepted Answer

You can use DOMDocument and DOMXPath for this. Parsing HTML and extracting anything from it takes only a few lines of code.

$doc = new DOMDocument();
$doc->loadHTML(
'<html><body>' . '

<div class="quote" user_name="david_sa" post_id="223423">
This is Quoted Text 
</div>

<div class="quote" user_name="richard12" post_id="254555">
This is Quoted Text 
</div>

<div class="original">
This is the Comment Text 
</div>

' . '</body></html>');

$xpath = new DOMXPath($doc);

$quote = $xpath->query("//div[@class='quote']");
echo $quote->length; // 2
echo $quote->item(0)->getAttribute('user_name'); // david_sa
echo $quote->item(1)->getAttribute('post_id');   // 254555

// foreach($quote as $div) works as expected

$original = $xpath->query("//div[@class='original']");
echo $original->length;             // 1
echo $original->item(0)->nodeValue; // This is the Comment Text
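
To get the data into the shape the question asks for, the same queries can be looped over. This is only a minimal sketch: the variable names `$html`, `$quotes` and `$original_comment` are illustrative and not part of the answer above, and `libxml_use_internal_errors()` is added on the assumption that real CKEditor output may trigger parser warnings.

```php
<?php
// Assumption: $html holds the raw editor output shown in the question.
$html = '
<div class="quote" user_name="david_sa" post_id="223423">
This is Quoted Text 
</div>
<div class="quote" user_name="richard12" post_id="254555">
This is Quoted Text 
</div>
<div class="original">
This is the Comment Text 
</div>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML('<html><body>' . $html . '</body></html>');
libxml_clear_errors();

$xpath = new DOMXPath($doc);

// One array entry per quoted post.
$quotes = [];
foreach ($xpath->query("//div[@class='quote']") as $div) {
    $quotes[] = [
        'user_name'  => $div->getAttribute('user_name'),
        'post_id'    => $div->getAttribute('post_id'),
        'quote_text' => trim($div->nodeValue),
    ];
}

// The user's own comment; null if the div is missing.
$originalNode = $xpath->query("//div[@class='original']")->item(0);
$original_comment = $originalNode !== null ? trim($originalNode->nodeValue) : null;

print_r($quotes);
echo $original_comment, PHP_EOL;
```

`trim()` is used because `nodeValue` keeps the line breaks and trailing spaces around the text inside each div.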

If you are not familiar with XPath syntax, here are a few examples to get you started.
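
For a feel of what XPath can express, a few more queries against the same markup might look like the following. This is an illustrative sketch only; the expressions and variable names are chosen for this example and are not taken from the answer.

```php
<?php
$doc = new DOMDocument();
$doc->loadHTML('<html><body>
<div class="quote" user_name="david_sa" post_id="223423">This is Quoted Text</div>
<div class="quote" user_name="richard12" post_id="254555">This is Quoted Text</div>
<div class="original">This is the Comment Text</div>
</body></html>');
$xpath = new DOMXPath($doc);

// Select an element by the value of one of its attributes.
$byPost = $xpath->query("//div[@post_id='254555']");
$user = $byPost->item(0)->getAttribute('user_name');

// Select attribute nodes directly instead of the elements that carry them.
$names = [];
foreach ($xpath->query("//div[@class='quote']/@user_name") as $attr) {
    $names[] = $attr->nodeValue;
}

// evaluate() returns scalars for XPath functions such as count() and string().
$count = (int) $xpath->evaluate("count(//div[@class='quote'])");
$text  = trim($xpath->evaluate("string(//div[@class='original'])"));
```

Note that `[@class='quote']` matches only when the class attribute is exactly `quote`; elements carrying several classes would need a `contains()`-based test instead.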

score 1 · Accepted Answer

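You should not use regular expressions to process HTML/XML. That is what DOMDocument and SimpleXML were built for.

Your problem seems relatively simple, so you should be able to use SimpleXML (aptly named, huh?).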
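
A minimal sketch of the SimpleXML route, assuming the editor output is well-formed XML once wrapped in a single root element. The `<root>` wrapper and the variable names are hypothetical; real CKEditor output containing things like `&nbsp;` or unclosed `<br>` tags would make `simplexml_load_string()` return false, in which case DOMDocument is the safer choice.

```php
<?php
// Assumption: $html is the editor output from the question.
$html = <<<HTML
<div class="quote" user_name="david_sa" post_id="223423">
This is Quoted Text
</div>
<div class="quote" user_name="richard12" post_id="254555">
This is Quoted Text
</div>
<div class="original">
This is the Comment Text
</div>
HTML;

// SimpleXML needs exactly one root element, so wrap the fragment.
$xml = simplexml_load_string('<root>' . $html . '</root>');
if ($xml === false) {
    throw new RuntimeException('Input is not well-formed XML');
}

$quotes = [];
foreach ($xml->xpath("//div[@class='quote']") as $div) {
    $quotes[] = [
        'user_name'  => (string) $div['user_name'], // attributes via array access
        'post_id'    => (string) $div['post_id'],
        'quote_text' => trim((string) $div),        // element text via string cast
    ];
}

$original = $xml->xpath("//div[@class='original']");
$original_comment = isset($original[0]) ? trim((string) $original[0]) : null;

print_r($quotes);
echo $original_comment, PHP_EOL;
```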

score 0 · Accepted Answer

Do not even try to parse HTML with regex. I would recommend Simple HTML DOM. Get it here: php html parser

Answered 2013-03-24T18:24:56.370
