html - Regex whitespace reduction + textarea exclusion

Question

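I am using /\s+/ to reduce every run of whitespace to a single character (per group). This is currently used to minify HTML, but textareas need to keep their line breaks, which otherwise get filtered out. How can I modify this regex so that it ignores line breaks inside <textarea></textarea> tags?

Also, a textarea may have attributes such as id or class.

Any help would be greatly appreciated.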

score 0 · Accepted Answer

Use the regex pattern /(?:\s+(?![^<]*<\/textarea>)|[^\S\n\r]+)/ with the case-insensitive modifier.
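As a minimal sketch of how this pattern might be applied, assuming PHP's preg_replace and a single space as the replacement string (the answer names neither, and the helper name collapse_whitespace is hypothetical):

```php
<?php
// Hypothetical helper that applies the pattern from the answer above.
// Assumption: every match is replaced with one space.
function collapse_whitespace(string $html): string
{
    // Branch 1: a whitespace run that is NOT followed by "</textarea>"
    //           before the next "<", i.e. a run outside a textarea.
    // Branch 2: a run of whitespace other than \n and \r, anywhere.
    $pattern = '/(?:\s+(?![^<]*<\/textarea>)|[^\S\n\r]+)/i';
    return preg_replace($pattern, ' ', $html);
}

$raw = "My   line   is   here <textarea id=\"t\">And   \nthere</textarea> there   and everywhere";
echo collapse_whitespace($raw);
```

Note that the lookahead only scans up to the next "<", so this sketch relies on the textarea content itself containing no "<" character. Inside a textarea, runs of spaces and tabs are still collapsed; only the line breaks are left alone.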

score 0 · Accepted Answer

OK, here is a generic solution in PHP; hopefully it will be easy to rewrite in whatever language you use for this task.

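$raw = '
  My   line   is   here <textarea>And 
there</textarea> there   and everywhere';

// Split on textarea blocks (opening tag may carry attributes) and keep them.
// PREG_SPLIT_NO_EMPTY is left out on purpose: dropping empty chunks would
// shift the indexes whenever the text starts with a textarea.
$chunks = preg_split('#(<textarea\b[^>]*>.+?</textarea>)#si',
  $raw, -1,
  PREG_SPLIT_DELIM_CAPTURE); // -- 1

$chunks_length = count($chunks);
for ($index = 0;
     $index < $chunks_length;
     $index += 2) { // -- 2: even indexes only, textarea blocks are skipped
  // Replace each whitespace run with its last character
  $chunks[$index] = preg_replace('#(\s)+#', '$1', $chunks[$index]); // -- 3
}

var_dump(implode('', $chunks));
// Output string (note the single leading space):
//  My line is here <textarea>And 
// there</textarea> there and everywhere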

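Here is what happens. At -- 1 we split your text into an array of fragments. The elements of this array with odd indexes [1, 3, ...] are actually the 'textarea' blocks, because we set preg_split to work in 'delimiter-capturing' mode. The point is that we do not process them (the for loop at -- 2 steps over them) and only compress the whitespace of the 'content' elements (-- 3).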
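To make the odd/even layout concrete, here is a small sketch that prints each chunk with its index. The input string is made up for illustration, and the [^>]* part of the pattern is an assumption added so that opening tags with attributes (as mentioned in the question) are matched too:

```php
<?php
// Illustrative input (not from the answer): two textarea blocks,
// one of them with an attribute.
$raw = "a   b <textarea id=\"x\">1\n2</textarea> c   d <textarea>3\n\n4</textarea> e";

// Delimiter-capturing split: the captured textarea blocks are returned
// in between the surrounding pieces of text.
$chunks = preg_split('#(<textarea\b[^>]*>.+?</textarea>)#si', $raw, -1, PREG_SPLIT_DELIM_CAPTURE);

foreach ($chunks as $index => $chunk) {
    // Odd indexes hold textarea blocks, even indexes hold the text around them.
    $kind = ($index % 2 === 1) ? 'textarea' : 'content';
    echo $index, ' ', $kind, ': ', json_encode($chunk, JSON_UNESCAPED_SLASHES), PHP_EOL;
}
```

This layout only holds as long as empty chunks are kept. If the text starts with a textarea, the first chunk is an empty string, and filtering it out would move the textarea blocks onto even indexes.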

Still, this approach remains quite fragile, as Rob W correctly mentioned: not all whitespace in HTML can be compressed this easily.

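P.S. The s modifier is used in the regex for a reason: without it, the .+? pattern would fail to match the \n line-ending characters (which would prevent multiline blocks from being captured correctly).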
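The effect of the s modifier can be checked with a short sketch; the sample string and variable names below are made up for illustration:

```php
<?php
$html = "<textarea>And\nthere</textarea>";

// Without s, "." does not match "\n", so the multiline block is not matched.
$without_s = preg_match('#<textarea>.+?</textarea>#i', $html);

// With s (PCRE dot-all mode), "." also matches "\n".
$with_s = preg_match('#<textarea>.+?</textarea>#si', $html);

var_dump($without_s, $with_s); // int(0) and int(1)
```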
