
I'm trying to write a regex that will remove HTML tags around a placeholder text, so that this:

<p>
    Blah</p>
<p>
    {{{body}}}</p>
<p>
    Blah</p>

Becomes this:

<p>
    Blah</p>
{{{body}}}
<p>
    Blah</p>

My current regex is /<.+>.*\{\{\{body\}\}\}<\/.+>/msU. However, it will also remove the contents of the tag preceding the placeholder, resulting in:

{{{body}}}
<p>
    Blah</p>
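
For reference, here is a minimal script that reproduces the problem. It assumes PHP's preg_replace and assumes the replacement is the bare placeholder; the variable names are only for illustration.

```php
<?php
// Sample input, with the newline and indentation that follow each opening tag.
$html = "<p>\n    Blah</p>\n<p>\n    {{{body}}}</p>\n<p>\n    Blah</p>";

// The pattern from the question:
// s = dot also matches newlines, m = multiline anchors, U = quantifiers are ungreedy.
$pattern = '/<.+>.*\{\{\{body\}\}\}<\/.+>/msU';

// Assumed replacement: the placeholder on its own.
$result = preg_replace($pattern, '{{{body}}}', $html);

echo $result;
```

The match starts at the very first `<` in the input. Because of the s modifier, `.*` is free to run across the first paragraph until it reaches the placeholder, so the first <p>...</p> is swallowed along with the tags I actually want removed.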

I can't assume the users will always place the placeholder inside <p>, so I would like it to remove any pair of tags immediately around the placeholder. I would appreciate some help with correcting my regex.

[EDIT]

I think it's important to note that the input may or may not have been processed by CKEditor. CKEditor adds newlines and tabs after the opening tags, so the regex needs the /sm (dotall + multiline) modifiers.


2 Answers


Try this:

<[^>]+>\s*\{{3}body\}{3}\s*<\/[^>]+>

See it in action here: http://regexr.com?30s4o

Here's the breakdown:

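  • <[^>]+> matches one opening HTML tag and nothing more.
  • \s* matches any whitespace (roughly equivalent to [ \t\r\n]*)
  • \{{3} matches { exactly 3 times
  • body matches the string literally
  • \}{3} matches } exactly 3 times
  • \s* again matches any whitespace
  • <\/[^>]+> matches the closing HTML tag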
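
As a rough sketch of how the pattern could be used, assuming PHP's preg_replace and a replacement of the bare placeholder (the variable names and sample input are only illustrative):

```php
<?php
// Sample input modeled on the question, including the whitespace after each opening tag.
$html = "<p>\n    Blah</p>\n<p>\n    {{{body}}}</p>\n<p>\n    Blah</p>";

// One tag, optional whitespace, the placeholder, optional whitespace, one closing tag.
$pattern = '/<[^>]+>\s*\{{3}body\}{3}\s*<\/[^>]+>/';

// Replace the whole match (tags included) with just the placeholder.
$result = preg_replace($pattern, '{{{body}}}', $html);

echo $result;
```

Since the pattern contains no dot and no ^ or $ anchors, the s and m modifiers would not change what it matches; the newlines and tabs are handled by \s*. And because [^>]+ cannot cross a >, the match cannot spill into the preceding paragraph.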
answered 2012-05-06T16:20:25.307

Wouldn't PHP's strip_tags work in your case?

http://php.net/manual/en/function.strip-tags.php

<?php
$text = '<p>Test paragraph.</p><!-- Comment --> <a href="#fragment">Other text</a>';
echo strip_tags($text);
echo "\n";

// Allow <p> and <a>
echo strip_tags($text, '<p><a>');
?>
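
One thing worth checking before relying on this: strip_tags works on the whole string, not only on the tags around the placeholder. A small sketch with input modeled on the question (the variable names are only illustrative):

```php
<?php
// Sample input modeled on the question.
$html = "<p>\n    Blah</p>\n<p>\n    {{{body}}}</p>\n<p>\n    Blah</p>";

// With no allow-list, every tag in the string is removed.
$stripped = strip_tags($html);

// With <p> on the allow-list, the <p> around the placeholder is kept as well.
$allowed = strip_tags($html, '<p>');

echo $stripped, "\n---\n", $allowed;
```

So this only fits if it is acceptable for the surrounding paragraphs to lose their tags too, or if strip_tags is applied to just the fragment that contains the placeholder.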
answered 2012-05-06T16:35:23.573