
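I'm writing an application that gives users a TinyMCE HTML editor. The problem I'm facing is that, although I often ask my users to format their headings with the "Heading 2" (h2) style, they either use h1 (which I can deal with!) or start a new paragraph and then make its content bold.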

For example:

<p><strong>This is a header</strong></p>
<p>Content content blah blah blah.</p>

What I want to do is find every instance of <p><strong> that contains fewer than eight words and replace it with an h2.

What is the best way to do this?

Update: Thanks to Jack's code, I've developed a simple module that does everything I described here and more. The code is on GitHub.


3 Answers


You can use DOMDocument for this. Find the <strong> tags that are children of a <p>, count the words, and replace the node and its parent with an <h2>.

$content = <<<'EOM'
<p><strong>This is a header</strong></p>
<p>Content content blah blah blah.</p>
EOM;

$doc = new DOMDocument;
$doc->loadHTML($content);
$xp = new DOMXPath($doc);


foreach ($xp->query('//p/strong') as $node) {
        $parent = $node->parentNode;
        if ($parent->textContent == $node->textContent && 
                str_word_count($node->textContent) <= 8) {
            $header = $doc->createElement('h2', $node->textContent);
            $parent->parentNode->replaceChild($header, $parent);
        }
}

echo $doc->saveHTML();
answered 2013-04-17T03:58:57.887

Since you seem to be proficient in PHP, you may find the PHP Simple HTML Dom Parser very intuitive for this task. Here's a snippet from the documentation showcasing a very simple way to change the tag name after locating the elements you're requesting:

$html = str_get_html("<div>foo <b>bar</b></div>");
$e = $html->find("div", 0);

echo $e->tag; // Returns: " div"
echo $e->outertext; // Returns: " <div>foo <b>bar</b></div>"
echo $e->innertext; // Returns: " foo <b>bar</b>"
echo $e->plaintext; // Returns: " foo bar"

Attribute Name    Usage
$e->tag           Read or write the tag name of element.
$e->outertext     Read or write the outer HTML text of element.
$e->innertext     Read or write the inner HTML text of element.
$e->plaintext     Read or write the plain text of element.
answered 2013-04-17T03:34:56.837

Here is the code I ended up with.

<?php

$content_old = <<<'EOM'
<p>&nbsp; </p>
<p>lol<strong>test</strong></p>
<p><strong>This is a header</strong></p>
<p>Content content blah blah blah.</p>
EOM;

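// Remove paragraphs that contain only whitespace and/or &nbsp; entities.
// A group is needed here: [\s|&nbsp;] would be a character class matching the single characters |, &, n, b, s, p and ;.
$content = preg_replace("/<p[^>]*>(?:\s|&nbsp;)*<\/p>/", '', $content_old);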

$doc = new DOMDocument;
$doc->loadHTML($content);
$xp = new DOMXPath($doc);

foreach ($xp->query('//p/strong') as $node) {
    $parent = $node->parentNode;
    if ($parent->textContent == $node->textContent && 
            str_word_count($node->textContent) <= 8) {
        $header = $doc->createElement('h2');
        $parent->parentNode->replaceChild($header, $parent);
        $header->appendChild($doc->createTextNode( $node->textContent ));
    }
}

// saving the whole document is not good enough, because loadHTML() wraps the content in extra doctype/html/body tags
$xp = new DOMXPath($doc);
$everything = $xp->query("body/*"); // retrieves all elements inside body tag
$output = '';
if ($everything->length > 0) { // check if it retrieved anything in there
    foreach ($everything as $thing) {
        $output .= $doc->saveXML($thing) . "\n";
    }
};

echo "--- ORIGINAL --\n\n";
echo $content_old;
echo "\n\n--- UPDATED ---\n\n";
echo $output;

When I run the script, this is the output I get:

--- ORIGINAL --

<p>&nbsp; </p>
<p>lol<strong>test</strong></p>
<p><strong>This is a header</strong></p>
<p>Content content blah blah blah.</p>

--- UPDATED ---

<p>lol<strong>test</strong></p>
<h2>This is a header</h2>
<p>Content content blah blah blah.</p>
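
The serialization step above can also be sketched with DOMDocument::saveHTML() and a node argument (available since PHP 5.3.6), which writes HTML syntax rather than XML syntax for each child of body. This is a minimal sketch, not the code from the answer: the helper name bodyInnerHtml is hypothetical, and the type declarations assume PHP 7 or later.

```php
<?php
// Hypothetical helper: serialize only the children of <body>,
// skipping the doctype/html/body wrappers that loadHTML() adds.
function bodyInnerHtml(DOMDocument $doc): string
{
    $body = $doc->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return ''; // nothing was parsed into a body
    }
    $output = '';
    foreach ($body->childNodes as $child) {
        // Passing a node makes saveHTML() dump just that subtree.
        $output .= $doc->saveHTML($child);
    }
    return $output;
}

$doc = new DOMDocument;
$doc->loadHTML('<h2>This is a header</h2><p>Line one<br>line two</p>');
echo bodyInnerHtml($doc);
```

With saveXML() the void element would be written as <br/>; saveHTML() keeps it as <br>, which is usually what an HTML editor expects to get back.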

Update #1

It's worth noting that if there are other tags inside the <p><strong> (for example, <p><strong><a>), then the whole <p> will be replaced, which was not my intention.

This is easily fixed by changing the if to the following:

        if ($parent->textContent == $node->textContent &&
                str_word_count($node->textContent) <= 8 &&
                $node->childNodes->item(0)->nodeType == XML_TEXT_NODE) {
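
Note that the condition above inspects only the first child of <strong>, so markup such as <strong>See <a>this</a></strong> still passes, and an empty <strong> has no item(0) at all. A stricter variant could be sketched as follows. The function name isBoldHeading and the $maxWords parameter are hypothetical, and the whitespace-based word count is a swapped-in choice: str_word_count() does not treat digits as words, so "Chapter 2" would count as one word.

```php
<?php
// Hypothetical helper: true when <strong> is the only content of its <p>,
// contains nothing but text nodes, and has between 1 and $maxWords words.
function isBoldHeading(DOMElement $strong, int $maxWords = 8): bool
{
    $parent = $strong->parentNode;
    if ($parent === null || $parent->textContent !== $strong->textContent) {
        return false; // the paragraph has text outside the <strong>
    }
    if (!$strong->hasChildNodes()) {
        return false; // empty <strong></strong>
    }
    foreach ($strong->childNodes as $child) {
        if ($child->nodeType !== XML_TEXT_NODE) {
            return false; // nested markup such as <a> or <em>
        }
    }
    // Split on whitespace so that numbers count as words too.
    $words = preg_split('/\s+/u', trim($strong->textContent), -1, PREG_SPLIT_NO_EMPTY);
    return count($words) > 0 && count($words) <= $maxWords;
}

$doc = new DOMDocument;
$doc->loadHTML('<p><strong>Chapter 2</strong></p><p><strong>See <a href="#">this</a></strong></p>');
foreach ((new DOMXPath($doc))->query('//p/strong') as $node) {
    var_dump(isBoldHeading($node)); // first paragraph: true, second: false
}
```

The default of 8 mirrors the <= 8 comparison in the code above; pass 7 if "fewer than eight words" from the question is the intended limit.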

Update #2

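It's also worth noting that if the <p><strong> content contains characters that need to be escaped in HTML (for example, &), the original createElement call causes problems.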

The old code was:

        $header = $doc->createElement('h2', $node->textContent);
        $parent->parentNode->replaceChild($header, $parent);

The new code (which works correctly) is:

        $header = $doc->createElement('h2');
        $parent->parentNode->replaceChild($header, $parent);
        $header->appendChild($doc->createTextNode( $node->textContent ));
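
The reason is that the value passed as the second argument of createElement() is not escaped, whereas createTextNode() escapes it. A minimal standalone sketch of the working variant, using an illustrative heading text that is not from the original content:

```php
<?php
$doc = new DOMDocument;
$text = 'Questions & Answers'; // illustrative text containing a bare ampersand

// Build the element first, then attach the text as a text node,
// so the ampersand is treated as literal text and escaped on output.
$header = $doc->createElement('h2');
$header->appendChild($doc->createTextNode($text));

echo $doc->saveHTML($header), "\n"; // the ampersand is serialized as &amp;
```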
answered 2013-04-17T06:12:06.860