php - Change tags with PHP based on content length

Question

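I am writing an application that gives users a TinyMCE HTML editor. The problem I am facing is that, although I regularly ask my users to format their headings with the "Heading 2" (h2) style, they either use h1 (which I can deal with!) or start a new paragraph and then make that paragraph's content bold.

For example: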

<p><strong>This is a header</strong></p>
<p>Content content blah blah blah.</p>

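What I would like to do is find all instances of <p><strong> that contain fewer than, say, eight words and replace them with an h2.

What is the best way to do this?

Update: Thanks to Jack's code, I have developed a simple module that does everything I described here and more. The code is on GitHub.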

score 2 · Accepted Answer

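You can use DOMDocument for this. Find the <strong> tags that are children of a <p>, count the words, and replace the node and its parent with an <h2>: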

$content = <<<'EOM'
<p><strong>This is a header</strong></p>
<p>Content content blah blah blah.</p>
EOM;

$doc = new DOMDocument;
$doc->loadHTML($content);
$xp = new DOMXPath($doc);


foreach ($xp->query('//p/strong') as $node) {
    $parent = $node->parentNode;
    if ($parent->textContent == $node->textContent &&
            str_word_count($node->textContent) <= 8) {
        $header = $doc->createElement('h2', $node->textContent);
        $parent->parentNode->replaceChild($header, $parent);
    }
}

echo $doc->saveHTML();
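
One detail of the word check above: str_word_count() only treats runs of letters (plus ' and -) as words, so digits are skipped and multibyte text is not handled reliably. A minimal sketch of a whitespace-based counter follows, in case that matters for your content. The function name countWords is hypothetical and not part of the original answer.

```php
<?php
// Hypothetical helper: counts whitespace-separated tokens instead of
// relying on str_word_count(), which ignores digits such as "10".
function countWords(string $text): int
{
    // Split on runs of whitespace; U+00A0 is what &nbsp; becomes after DOM parsing.
    $parts = preg_split('/[\s\x{A0}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);

    // preg_split() returns false on failure (e.g. invalid UTF-8 input).
    return $parts === false ? 0 : count($parts);
}

echo countWords('This is a header'), "\n";
echo countWords('Top 10 tips'), "\n";
```

To use it, swap str_word_count($node->textContent) for countWords($node->textContent) in the condition. Whether "10" should count as a word is a judgment call for your own content.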

score 0 · Accepted Answer

Since you seem to be proficient in PHP, you may find the PHP Simple HTML Dom Parser very intuitive for this task. Here's a snippet from the documentation showcasing a very simple way to change the tag name after locating the elements you're requesting:

$html = str_get_html("<div>foo <b>bar</b></div>");
$e = $html->find("div", 0);

echo $e->tag; // Returns: " div"
echo $e->outertext; // Returns: " <div>foo <b>bar</b></div>"
echo $e->innertext; // Returns: " foo <b>bar</b>"
echo $e->plaintext; // Returns: " foo bar"

Attribute Name   Usage
$e->tag          Read or write the tag name of element.
$e->outertext    Read or write the outer HTML text of element.
$e->innertext    Read or write the inner HTML text of element.
$e->plaintext    Read or write the plain text of element.

score 0 · Accepted Answer

Here is the code I ended up with.

<?php

$content_old = <<<'EOM'
<p>&nbsp; </p>
<p>lol<strong>test</strong></p>
<p><strong>This is a header</strong></p>
<p>Content content blah blah blah.</p>
EOM;

// remove paragraphs that contain only whitespace and/or &nbsp; entities
$content = preg_replace('/<p[^>]*>(?:\s|&nbsp;)*<\/p>/', '', $content_old);

$doc = new DOMDocument;
$doc->loadHTML($content);
$xp = new DOMXPath($doc);

foreach ($xp->query('//p/strong') as $node) {
    $parent = $node->parentNode;
    if ($parent->textContent == $node->textContent && 
            str_word_count($node->textContent) <= 8) {
        $header = $doc->createElement('h2');
        $parent->parentNode->replaceChild($header, $parent);
        $header->appendChild($doc->createTextNode( $node->textContent ));
    }
}

// saving the whole document is not good enough, because loadHTML() wraps
// the fragment in a doctype plus <html> and <body> tags
$xp = new DOMXPath($doc);
$everything = $xp->query("body/*"); // retrieves all elements inside body tag
$output = '';
if ($everything->length > 0) { // check if it retrieved anything in there
    foreach ($everything as $thing) {
        $output .= $doc->saveXML($thing) . "\n";
    }
}

echo "--- ORIGINAL --\n\n";
echo $content_old;
echo "\n\n--- UPDATED ---\n\n";
echo $output;

When I run the script, this is the output I get:

--- ORIGINAL --

<p>&nbsp; </p>
<p>lol<strong>test</strong></p>
<p><strong>This is a header</strong></p>
<p>Content content blah blah blah.</p>

--- UPDATED ---

<p>lol<strong>test</strong></p>
<h2>This is a header</h2>
<p>Content content blah blah blah.</p>
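
The steps in this answer can be bundled into one reusable function, roughly as sketched below. The function name promoteBoldParagraphs and the $maxWords parameter are hypothetical. The sketch also assumes that empty <strong> tags should be left alone, and that only element children of <body> need to be returned (as in the script above), so any bare text sitting directly in <body> is dropped.

```php
<?php
// Hypothetical wrapper around the steps shown in this answer.
function promoteBoldParagraphs(string $html, int $maxWords = 8): string
{
    // Drop paragraphs that contain only whitespace and/or &nbsp; entities.
    $html = preg_replace('/<p[^>]*>(?:\s|&nbsp;)*<\/p>/', '', $html) ?? $html;
    if (trim($html) === '') {
        return ''; // loadHTML() rejects empty input
    }

    $doc = new DOMDocument;
    $doc->loadHTML($html);
    $xp = new DOMXPath($doc);

    foreach ($xp->query('//p/strong') as $node) {
        $parent = $node->parentNode;
        $words  = str_word_count($node->textContent);

        // The <strong> must be the paragraph's only text, and short enough.
        // The parentNode check skips paragraphs that were already replaced.
        if ($parent->parentNode !== null
            && $parent->textContent === $node->textContent
            && $words > 0 && $words <= $maxWords) {
            $header = $doc->createElement('h2');
            $header->appendChild($doc->createTextNode($node->textContent));
            $parent->parentNode->replaceChild($header, $parent);
        }
    }

    // Serialize only the element children of <body>, one per line.
    $parts = [];
    foreach ($xp->query('//body/*') as $element) {
        $parts[] = trim($doc->saveHTML($element));
    }
    return implode("\n", $parts);
}

echo promoteBoldParagraphs(
    "<p><strong>This is a header</strong></p>\n<p>Content content blah blah blah.</p>"
), "\n";
```

This sketch serializes with saveHTML($element) rather than saveXML() so that void elements such as <br> are not written in XML style. That is a design choice, not something the original script requires.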

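Update #1

It is worth noting that if there are other tags inside the <strong> tag (e.g., <a>), the whole <p> still gets replaced, which was not my intention.

This is easily fixed by changing the if to the following: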

        if ($parent->textContent == $node->textContent &&
                str_word_count($node->textContent) <= 8 &&
                $node->childNodes->item(0)->nodeType == XML_TEXT_NODE) {
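
Note that the condition above only inspects the first child, so a <strong> such as "See <a>this</a>" (text first, link second) would still pass, and an empty <strong> has no item(0) at all. A slightly stricter variant is sketched below. The helper name hasOnlyTextChildren is hypothetical.

```php
<?php
// Hypothetical helper: true only if $node has at least one child and
// every child is a text node (no nested <a>, <em>, ...).
function hasOnlyTextChildren(DOMNode $node): bool
{
    if (!$node->hasChildNodes()) {
        return false; // empty element: nothing worth promoting
    }
    foreach ($node->childNodes as $child) {
        if ($child->nodeType !== XML_TEXT_NODE) {
            return false;
        }
    }
    return true;
}

$doc = new DOMDocument;
$doc->loadHTML(
    '<p><strong>Plain header</strong></p>'
    . '<p><strong>See <a href="#">this</a></strong></p>'
);
$strongs = $doc->getElementsByTagName('strong');

var_dump(hasOnlyTextChildren($strongs->item(0))); // text only
var_dump(hasOnlyTextChildren($strongs->item(1))); // text followed by a link
```

In the loop, the third clause of the if would then become hasOnlyTextChildren($node).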

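Update #2

It is also worth noting that if the content contains characters that need to be escaped in HTML (e.g. &), the original createElement call causes problems.

The old code was: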

        $header = $doc->createElement('h2', $node->textContent);
        $parent->parentNode->replaceChild($header, $parent);

The new code (which works correctly) is:

        $header = $doc->createElement('h2');
        $parent->parentNode->replaceChild($header, $parent);
        $header->appendChild($doc->createTextNode( $node->textContent ));
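
As a small standalone check of the createTextNode() approach, the sketch below builds a heading from text containing an ampersand and serializes it. The sample text 'Tom & Jerry' is only an illustration.

```php
<?php
$doc  = new DOMDocument;
$text = 'Tom & Jerry'; // sample heading text containing an ampersand

// Create the element empty, then attach the text as a text node so that
// DOM escapes special characters on output.
$header = $doc->createElement('h2');
$header->appendChild($doc->createTextNode($text));
$doc->appendChild($header);

$html = $doc->saveHTML($header);
echo $html, "\n";
```

The ampersand is written out as &amp; in the markup, while $header->textContent still holds the original, unescaped string.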
