php - Extracting data from HTML tags with PHP [automatically finding start and end tags]

Question

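I want to write a script that automatically grabs the content of HTML tags (from the start tag through its end tag) and stores each piece in an array.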

Example:

Input:

$str = "<p>This is a sample <b>text</b> </p> this is out of tags.<p>This is <p>another text</p>for same aggregate <i>tags</i>.</p>";

Output:

$blocks[0] = <p>This is a sample <b>text</b> </p>
$blocks[1] = <p>This is <p>another text</p>for same aggregate <i>tags</i>.</p>

NB: the first block starts with <p>, so it must stop at the matching </p>. The second block also starts with <p>, but it contains another complete paragraph (<p></p>) inside it, so it must stop only at the outer </p>. In other words, I want everything between a start tag and its matching end tag, inner tags included.

score 0 · Accepted Answer

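I'll try to answer this, although this solution won't give you exactly what you want, because nested <p> tags are not valid HTML. With PHP's DOMDocument you can extract the paragraph tags like this.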

<?php

$test = "<p>This is a sample <b>text</b> </p> this is out of tags.<p>This is <p>another text</p>for same aggregate <i>tags</i>.</p>";

$html = new DOMDocument();
$html->loadHTML($test);
$p_tags = array();

foreach ($html->getElementsByTagName('p') as $p) {
    $p_tags[] = $html->saveHTML($p);
}

print_r($p_tags);

?>

After throwing a few warnings at you because of the invalid tag nesting, the output should look like this:

Array
(
    [0] => <p>This is a sample <b>text</b> </p>
    [1] => <p>This is </p>

    [2] => <p>another text</p>
)
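If the warnings get in the way, one option is to have libxml buffer them instead of printing them. This is only a sketch of the same extraction with that change; the variable names $previous and $errors are mine, and the exact number and wording of the parser messages depends on the libxml version PHP was built against.

```php
<?php

$test = "<p>This is a sample <b>text</b> </p> this is out of tags.<p>This is <p>another text</p>for same aggregate <i>tags</i>.</p>";

// Buffer libxml parser errors internally instead of emitting PHP warnings.
// The return value is the previous setting, so it can be restored afterwards.
$previous = libxml_use_internal_errors(true);

$html = new DOMDocument();
$html->loadHTML($test);

// Collect whatever the parser complained about, then clear the buffer.
$errors = libxml_get_errors(); // array of LibXMLError objects
libxml_clear_errors();
libxml_use_internal_errors($previous);

$p_tags = array();
foreach ($html->getElementsByTagName('p') as $p) {
    $p_tags[] = $html->saveHTML($p);
}

print_r($p_tags);
printf("%d parser message(s) buffered\n", count($errors));
```

The paragraphs come out the same way as before; the parser still repairs the invalid nesting, it just does so quietly.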

score 0 · Accepted Answer

You can use the Simple Html Dom library to do this. Here is an example.

require_once('simple_html_dom.php');
$html = " <p>This is a sample <b>text</b> </p> this is out of tags.<p>This is <p>another text</p>for same aggregate <i>tags</i>.</p>";
$html = str_get_html($html);
$p = $html->find('p');
$contentArray = array();
foreach ($p as $element) {
    // Use $element->outertext instead to keep the tag itself, i.e. <p>content</p>
    $contentArray[] = $element->innertext;
}

print_r($contentArray);

Your output will look like this:

Array
(
    [0] => This is a sample <b>text</b> 
    [1] => This is 
    [2] => another text
)
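Both parsers above split the nested paragraph, because they repair the invalid markup before you see it. If you really need the blocks exactly as written in the question, one possible approach is to skip the HTML parser and count tag depth yourself. The sketch below is not from either answer: extract_blocks() is a hypothetical helper, it assumes PHP 7 or later, and it assumes the tags are reasonably well balanced. It does not understand comments, self-closing tags, or attribute values that contain ">".

```php
<?php

/**
 * Hypothetical helper: collect top-level <tag>...</tag> blocks.
 * Nested openings of the same tag raise a depth counter, so the outer
 * block only ends at its own matching closing tag.
 */
function extract_blocks(string $html, string $tag = 'p'): array
{
    $blocks = array();
    $depth  = 0; // how many <tag> elements are currently open
    $start  = 0; // byte offset where the current top-level block began

    // Matches an opening tag (attributes allowed) or a closing tag.
    $pattern = '~<(/?)' . preg_quote($tag, '~') . '\b[^>]*>~i';
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER | PREG_OFFSET_CAPTURE);

    foreach ($matches as $m) {
        list($text, $offset) = $m[0];
        $isClosing = ($m[1][0] === '/');

        if (!$isClosing) {
            if ($depth === 0) {
                $start = $offset; // a new top-level block begins here
            }
            $depth++;
        } elseif ($depth > 0) {
            $depth--;
            if ($depth === 0) {
                // Closing tag of the outermost element: cut the block out.
                $blocks[] = substr($html, $start, $offset + strlen($text) - $start);
            }
        }
        // A closing tag seen at depth 0 is stray and is ignored.
    }

    return $blocks;
}

$str = "<p>This is a sample <b>text</b> </p> this is out of tags.<p>This is <p>another text</p>for same aggregate <i>tags</i>.</p>";
$blocks = extract_blocks($str);
print_r($blocks);
```

For the input from the question this yields two blocks, the second one still containing its inner <p>another text</p>. Text outside any <p> is dropped, which is what the question asks for.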
