javascript - Match all whitespace in an HTML string with JavaScript

Question

Suppose you have an HTML string like this:

<div id="loco" class="hey" >lorem ipsum pendus <em>hey</em>moder <hr /></div>

and you need to place a <br/> element after every whitespace character. I am using:

HTMLtext.replace(/\s{1,}/g, ' <br/>');

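The problem is that this also inserts breaks after the whitespace inside tags (between tag attributes), and of course I only want to do this for the tags' text content. Somehow I have always been terrible with regular expressions. Can anyone help?

So basically: my original whitespace match, but only if it is not between < and >?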

score 4 · Accepted Answer

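Regular expressions are not a good tool for this. You should be working with the DOM, not with a raw HTML string.

For a quick and dirty solution, which assumes that your string contains no < or > characters other than the ones delimiting tags, you can try this, though: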

result = subject.replace(/\s+(?=[^<>]*<)/g, "$&<br/>");

This inserts a <br/> after whitespace only if the next angle bracket is an opening angle bracket.

Explanation:

\s+     # Match one or more whitespace characters (including newlines!)
(?=     # but only if (positive lookahead assertion) it's possible to match...
 [^<>]* #  any number of non-angle brackets
 <      #  followed by an opening angle bracket
)       # ...from this position in the string onwards.

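Replace that with $& (which contains the matched characters) plus <br/>.

This regex does not check whether there is a > further back, because that would require a positive look*behind* assertion, and JavaScript did not support those when this was written (they were added later, in ES2018). So that check is left out here, but if you control the HTML and are sure that the condition I mentioned above is met, that shouldn't be a problem.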
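
To make the behaviour concrete, here is a small sketch that applies the regex above to the sample string from the question. The second pattern is not from the answer: it assumes an engine with ES2018 lookbehind support and adds the "previous angle bracket is >" check that the answer leaves out.

```javascript
// The sample string from the question.
var subject = '<div id="loco" class="hey" >lorem ipsum pendus <em>hey</em>moder <hr /></div>';

// The answer's regex: whitespace whose next angle bracket is "<".
var result = subject.replace(/\s+(?=[^<>]*<)/g, "$&<br/>");
console.log(result);
// <div id="loco" class="hey" >lorem <br/>ipsum <br/>pendus <br/><em>hey</em>moder <br/><hr /></div>

// Assumption: ES2018 lookbehind is available in the engine.
// Additionally require that the previous angle bracket is ">".
var strict = subject.replace(/(?<=>[^<>]*)\s+(?=[^<>]*<)/g, "$&<br/>");
console.log(strict === result); // true for this input
```

Note that whitespace in text that comes after the last tag is left alone by the lookahead version, because no < follows it. For example, "<b>x</b> y z" comes back unchanged.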

score 2 · Accepted Answer

See this answer for iterating over the DOM and replacing whitespace with <br /> elements. The adapted code would be:

(function iterate_node(node) {
    if (node.nodeType === 3) { // Node.TEXT_NODE
        var text = node.data,
            words = text.split(/\s/);
        if (words.length > 1) {
            node.data = words[0];
            var next = node.nextSibling,
                parent = node.parentNode;
            for (var i=1; i<words.length; i++) {
                var tnode = document.createTextNode(words[i]),
                    br = document.createElement("br");
                parent.insertBefore(br, next);
                parent.insertBefore(tnode, next);
            }
        }
    } else if (node.nodeType === 1) { // Node.ELEMENT_NODE
        for (var i=node.childNodes.length-1; i>=0; i--) {
            iterate_node(node.childNodes[i]); // run recursive on DOM
        }
    }
})(content); // any dom node

(Demo at jsfiddle.net)
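
The function above expects a DOM node, while the question starts from an HTML string. The following is a minimal sketch of bridging the two, assuming a browser-like `document`. The name `replaceWhitespaceWithBr` and its optional `doc` parameter are hypothetical and not part of the answer.

```javascript
// Hypothetical helper: parse an HTML string into a detached element,
// rewrite its text nodes as in the answer, and serialize it back.
function replaceWhitespaceWithBr(html, doc) {
    doc = doc || document; // assumption: a browser-like DOM is available
    var container = doc.createElement("div");
    container.innerHTML = html;

    (function iterateNode(node) {
        if (node.nodeType === 3) { // Node.TEXT_NODE
            var words = node.data.split(/\s/);
            if (words.length > 1) {
                node.data = words[0];
                var next = node.nextSibling,
                    parent = node.parentNode;
                for (var i = 1; i < words.length; i++) {
                    // insert <br>, then the next word, before the old sibling
                    parent.insertBefore(doc.createElement("br"), next);
                    parent.insertBefore(doc.createTextNode(words[i]), next);
                }
            }
        } else if (node.nodeType === 1) { // Node.ELEMENT_NODE
            // walk backwards so newly inserted nodes are not visited again
            for (var j = node.childNodes.length - 1; j >= 0; j--) {
                iterateNode(node.childNodes[j]);
            }
        }
    })(container);

    return container.innerHTML;
}
```

In a browser this could be called as `replaceWhitespaceWithBr(HTMLtext)`. Because the result is re-serialized through `innerHTML`, the markup is normalized (for example `<br>` rather than `<br/>`), so it will not be byte-for-byte identical to what the string-based approaches produce.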

score 0 · Accepted Answer

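OK, so you don't want to match whitespace inside HTML tags. A regular expression alone is not sufficient for that. I would use a lexer for this job. You can see the output here.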

var lexer = new Lexer;

var result = "";

lexer.addRule(/</, function (c) { // start of a tag
    this.state = 2; // go to state 2 - exclusive tag state
    result += c; // copy to output
});

lexer.addRule(/>/, function (c) { // end of a tag
    this.state = 0; // go back to state 0 - initial state
    result += c; // copy to output
}, [2]); // only apply this rule when in state 2

lexer.addRule(/.|\n/, function (c) { // match any character
    result += c; // copy to output
}, [2]); // only apply this rule when in state 2

lexer.addRule(/\s+/, function () { // match one or more spaces
    result += "<br/>"; // replace with "<br/>"
});

lexer.addRule(/.|\n/, function (c) { // match any character
    result += c; // copy to output
}); // everything else

lexer.input = '<div id="loco" class="hey" >lorem ipsum pendus <em>hey</em>moder <hr /></div>';

lexer.lex();

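Of course, a lexer is a very powerful tool. You could also skip angle brackets inside attribute values within tags. However, I'll leave that for you to implement. Good luck.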
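
As one possible take on that exercise, here is a minimal hand-written scanner. It is a sketch, not the Lexer library used above: `insertBrOutsideTags` is a hypothetical name, and the code assumes the input contains only plain tags and text (no comments, CDATA, or script content, and no stray < in text).

```javascript
// Hypothetical helper: replaces each run of whitespace outside tags with
// "<br/>", and ignores ">" inside quoted attribute values.
function insertBrOutsideTags(html) {
    var result = "";
    var inTag = false; // true between "<" and the ">" that closes the tag
    var quote = null;  // the open quote character inside a tag, if any
    var i = 0;
    while (i < html.length) {
        var c = html[i];
        if (inTag) {
            if (quote) {
                if (c === quote) quote = null; // closing quote
            } else if (c === '"' || c === "'") {
                quote = c;                     // opening quote
            } else if (c === ">") {
                inTag = false;                 // end of the tag
            }
            result += c; // tag content is copied unchanged
            i++;
        } else if (c === "<") {
            inTag = true; // start of a tag
            result += c;
            i++;
        } else if (/\s/.test(c)) {
            // consume the whole whitespace run, like /\s+/ in the lexer
            while (i < html.length && /\s/.test(html[i])) i++;
            result += "<br/>";
        } else {
            result += c; // ordinary text character
            i++;
        }
    }
    return result;
}

console.log(insertBrOutsideTags('<a title="x > y">a b</a>'));
// <a title="x > y">a<br/>b</a>
```

Like the lexer rules above, this replaces the whitespace rather than keeping it; to keep it, append the consumed run before the "<br/>".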
