php - PHP -> preg_match_all for the following structure

Question

I'm urgently looking for a way to take this text string

<h6>First pane</h6>
... pane content ...
<h6>Second pane</h6>
Hi, this is a comment.
To delete a comment, just log in and view the post's comments.
There you will have the option to edit
or delete them.
<h6>Last pane</h6>
... last pane content ...

and parse it into a PHP array.

I need it split up like this:

1.
1.0=> First pane
1.1=> ... pane content ... 

2.
2.0=> Second pane
2.1=> Hi, this is a comment.
    To delete a comment, just log in and view the post's comments.
    There you will have the option to edit
    or delete them.

3.
3.0=> Last pane
3.1=> ... last pane content ...

score 1 · Accepted Answer

Your regular expression should look like this:

/<h6>([^<]+)<\/h6>([^<]+)/im

If you run the following script, you will see that the values you are looking for are in $matches[1] (the titles) and $matches[2] (the pane contents).

$s = "<h6>First pane</h6>
... pane content ...
<h6>Second pane</h6>
Hi, this is a comment.
To delete a comment, just log in and view the post's comments.
There you will have the option to edit
or delete them.
<h6>Last pane</h6>
... last pane content ...";
$r = "/<h6>([^<]+)<\/h6>([^<]+)/im";

$matches = array();
preg_match_all($r,$s,$matches);

print_r($matches);
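
To get from the two capture groups to the per-pane layout asked for in the question, the groups can be paired up by index. This is only a minimal sketch, assuming the same input and pattern as the script above; the $panes name and the use of trim() are illustrative choices, not part of the original answer.

```php
<?php
// Same input and pattern as the script above.
$s = "<h6>First pane</h6>
... pane content ...
<h6>Second pane</h6>
Hi, this is a comment.
To delete a comment, just log in and view the post's comments.
There you will have the option to edit
or delete them.
<h6>Last pane</h6>
... last pane content ...";
$r = "/<h6>([^<]+)<\/h6>([^<]+)/im";

$matches = array();
preg_match_all($r, $s, $matches);

// With the default PREG_PATTERN_ORDER, $matches[1] holds every title
// and $matches[2] holds every pane body, in the same order.
// Pair them by index; trim() drops the newlines around each body.
$panes = array();
foreach ($matches[1] as $i => $title) {
    $panes[] = array(trim($title), trim($matches[2][$i]));
}

print_r($panes);
```

Each element of $panes is then a two-item array: index 0 is the title and index 1 is the content, which mirrors the 1.0 / 1.1 numbering in the question.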

score 1 · Accepted Answer

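You should not try to parse HTML with regular expressions. For anything but the simplest HTML it is bound to cause a lot of pain and unhappiness, and it will break as soon as anything in your document structure changes. Use a proper HTML or DOM parser instead, such as PHP's DOMDocument http://php.net/manual/en/class.domdocument.php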

For example, you can use getElementsByTagName http://www.php.net/manual/en/domdocument.getelementsbytagname.php to get all the h6 elements.
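
A rough sketch of the DOM-based approach, under some assumptions: the input is the fragment from the question (shortened here), and each pane's content is taken to be everything between one h6 and the next. The $panes variable and the sibling-walking loop are illustrative, not something the answer itself spells out.

```php
<?php
$html = "<h6>First pane</h6>\n... pane content ...\n"
      . "<h6>Second pane</h6>\nHi, this is a comment.\n"
      . "<h6>Last pane</h6>\n... last pane content ...";

$doc = new DOMDocument();
// The input is a fragment, not a full page, so collect parser
// warnings internally instead of printing them.
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$panes = array();
foreach ($doc->getElementsByTagName('h6') as $heading) {
    // Assumption: the pane content is the run of sibling nodes
    // that follows this heading, up to the next <h6>.
    $content = '';
    for ($node = $heading->nextSibling; $node !== null; $node = $node->nextSibling) {
        if ($node->nodeName === 'h6') {
            break;
        }
        $content .= $node->textContent;
    }
    $panes[] = array(trim($heading->textContent), trim($content));
}

print_r($panes);
```

Unlike the regex, this keeps working if the pane content later gains inline markup such as links, because textContent flattens nested elements.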

score 0 · Accepted Answer

I believe the PREG_SET_ORDER flag is what you are looking for.

$regex = '~<h6>([^<]+)</h6>\s*([^<]+)~i';

preg_match_all($regex, $source, $matches, PREG_SET_ORDER);

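This way, each element of the $matches array is itself an array containing the overall match followed by all the group captures for a single match attempt. The result for the first match looks like this: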

Array
(
    [0] => Array
        (
            [0] => <h6>First pane</h6>
... pane content ...

            [1] => First pane
            [2] => ... pane content ...

        )

See it on ideone.

EDIT: Note the \s* that I also added. Without it, the matched content always starts with a line separator.
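
The effect of \s* can be checked by running both patterns side by side. This is a small sketch with a shortened input; the $without and $with variable names are made up for the comparison.

```php
<?php
$source = "<h6>First pane</h6>\n... pane content ...\n"
        . "<h6>Last pane</h6>\n... last pane content ...";

// Without \s*, group 2 starts right after </h6>, so it begins with the newline.
preg_match_all('~<h6>([^<]+)</h6>([^<]+)~i', $source, $without, PREG_SET_ORDER);

// With \s*, the leading whitespace is consumed before group 2 starts.
preg_match_all('~<h6>([^<]+)</h6>\s*([^<]+)~i', $source, $with, PREG_SET_ORDER);

var_dump($without[0][2][0] === "\n"); // first character of the content is a newline
var_dump($with[0][2][0] === ".");     // first character is the start of the text

// Each element of $with is one match: [0] whole match, [1] title, [2] content.
foreach ($with as $match) {
    echo $match[1], ' => ', rtrim($match[2]), "\n";
}
```

Note that \s* only removes leading whitespace; the content can still end with a newline, which is why the loop applies rtrim().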
