php - How to get all the TEXT outside elements in an HTML document

Question

I am using Symfony DomCrawler to get all the text in a document.

$this->crawler->filter('p')->each(function (Crawler $node, $i) {
    // process text
});

I am trying to collect all the text that sits directly inside <body>, outside of any child element.

<body>
    This is an example
    <p>
        blablabla
    </p>
    another example
    <p>
        <span>Yo!</span>
        again, another piece of text <br/>
        with an annoy BR in the middle
    </p>
</body>

I am using PHP with Symfony and can use XPath (preferred) or RegEx.

score 0 · Accepted Answer

The string value of the whole document can be obtained with this simple XPath:

string(/)

All text nodes in the document would be:

//text()

The immediate text node children of body would be:

/body/text()

Note that XPaths that select text nodes will often be converted to concatenated string values, depending on context.
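As a rough illustration of the expressions above, here is a minimal sketch that uses PHP's bundled DOMDocument and DOMXPath rather than DomCrawler itself. It assumes the sample markup is wrapped in an html element, as it would be in a full HTML document, so the path becomes /html/body/text() instead of /body/text(). The variable names ($html, $texts, $whole) are only illustrative.

```php
<?php
// Sample markup from the question, wrapped in <html> (an assumption:
// in a complete HTML document, body is a child of html, not the root).
$html = <<<'HTML'
<html><body>
    This is an example
    <p>
        blablabla
    </p>
    another example
    <p>
        <span>Yo!</span>
        again, another piece of text <br/>
        with an annoy BR in the middle
    </p>
</body></html>
HTML;

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

// Collect only the text nodes that are direct children of <body>.
$texts = [];
foreach ($xpath->query('/html/body/text()') as $node) {
    $text = trim($node->nodeValue);
    if ($text !== '') {          // skip whitespace-only nodes between elements
        $texts[] = $text;
    }
}

// The string value of the whole document, as in string(/).
$whole = $xpath->evaluate('string(/)');

print_r($texts);
```

Trimming and skipping empty strings is a design choice for this sketch: the parser also yields whitespace-only text nodes between elements, which are usually not wanted.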
