javascript - Regex to exclude certain characters with multiline matching

Question

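I want to make sure that user input does not contain characters such as <, > or &#, whether it comes from a text input or a textarea. My pattern: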

var pattern = /^((?!&#|<|>).)*$/m;

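The problem is that it still matches multi-line strings, such as those from a textarea:

this text matches

although it shouldn't, because of this character <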
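The behavior described above can be reproduced with a short sketch. The sample string is the one from the question; the variable names are hypothetical. It shows that the m flag lets ^ and $ match at line breaks, and that . never matches a newline.

```javascript
// Pattern from the question, with the m (multiline) flag.
const pattern = /^((?!&#|<|>).)*$/m;
const text = "this text matches\n\nalthough it shouldn't, because of this character <";

// With m, ^ and $ also match at line breaks, so the clean first line
// on its own satisfies the pattern and test() returns true.
console.log(pattern.test(text)); // true

// Without m, ^/$ anchor to the whole string, but . still does not match
// "\n", so even a clean multi-line string is rejected.
console.log(/^((?!&#|<|>).)*$/.test("line one\nline two")); // false

// [\s\S] matches any character, including newlines.
const whole = /^((?!&#|<|>)[\s\S])*$/;
console.log(whole.test(text));                 // false
console.log(whole.test("line one\nline two")); // true
```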

Edit:

To be clearer: I only need to exclude the combination &#, not & or # on their own.

Please suggest a solution. Thanks a lot.

score 2 · Accepted Answer

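Alternative answer to the specific question:

anubhava's solution works correctly, but it is slow because it must perform a negative lookahead at every character position in the string. A simpler approach is to use reverse logic: instead of verifying that /^((?!&#|<|>)[\s\S])*$/ DOES match, verify that /[<>]|&#/ does NOT match. To illustrate this, let's create a function, hasSpecial(), which tests whether a string contains one of the special characters. Here are two versions; the first uses anubhava's second regex: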

function hasSpecial_1(text) {
    // If regex matches, then string does NOT contain special chars.
    return /^((?!&#|<|>)[\s\S])*$/.test(text) ? false : true;
}
function hasSpecial_2(text) {
    // If regex matches, then string contains (at least) one special char.
    return /[<>]|&#/.test(text) ? true : false;
}

These two functions are functionally equivalent, but the second one is likely to be much faster.
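A quick way to check the equivalence claim is to run both functions over a few inputs, including multi-line ones. This is a sketch with hypothetical sample strings; it checks that the results agree, not the speed difference. The functions return the regex result directly (negated for the first) instead of using a ternary, which does not change their behavior.

```javascript
function hasSpecial_1(text) {
  // Lookahead at every position: matches only if no special char occurs.
  return !/^((?!&#|<|>)[\s\S])*$/.test(text);
}

function hasSpecial_2(text) {
  // Single search: matches if any special char occurs anywhere.
  return /[<>]|&#/.test(text);
}

// Hypothetical sample inputs.
const samples = [
  "plain text",
  "line one\nline two",
  "line one\nline <two>",
  "a & b # c", // lone & and # are allowed
  "AT&#38;T",  // the &# combination is not
  "",
];

for (const s of samples) {
  console.log(JSON.stringify(s), hasSpecial_1(s), hasSpecial_2(s));
}
```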

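Note that when I first read this question, I misinterpreted it as wanting to exclude HTML special characters (including HTML entities). If that is actually what is wanted, the following solution does it.

Testing whether a string contains HTML special characters:

The OP apparently wants to ensure that the string does not contain any special HTML characters, including < and > as well as decimal and hex HTML entities such as &#160; and &#xA0;. If that is the case, the solution should probably also exclude the other (named) type of HTML entity, such as &amp;, &nbsp; and &lt;. The solution below excludes all three forms of HTML entity as well as the < and > tag delimiters.

Here are two approaches. (Note that both approaches allow the sequence &# if it is not part of a valid HTML entity.)

FALSE test using a regex: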

function hasHtmlSpecial_1(text) {
    /* Commented regex:
        # Match string having no special HTML chars.
        ^                  # Anchor to start of string.
        [^<>&]*            # Zero or more non-[<>&] (normal*).
        (?:                # Unroll the loop. ((special normal*)*)
          &                # Allow a & but only if
          (?!              # not an HTML entity (3 valid types).
            (?:            # One from 3 types of HTML entities.
              [a-z\d]+     # either a named entity,
            | \#\d+        # or a decimal entity,
            | \#x[a-f\d]+  # or a hex entity.
            )              # End group of HTML entity types.
            ;              # All entities end with ";".
          )                # End negative lookahead.
          [^<>&]*          # More (normal*).
        )*                 # End unroll the loop.
        $                  # Anchor to end of string.
    */
    var re = /^[^<>&]*(?:&(?!(?:[a-z\d]+|#\d+|#x[a-f\d]+);)[^<>&]*)*$/i;
    // If regex matches, then string does NOT contain HTML special chars.
    return re.test(text) ? false : true;
}

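Note that the regex above uses Jeffrey Friedl's "unrolling the loop" efficiency technique and runs very quickly for both matching and non-matching cases. (See his regex masterpiece, Mastering Regular Expressions, 3rd edition.)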
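To make the unrolling concrete, the sketch below sets the regex from hasHtmlSpecial_1 next to a hypothetical "naive" form of the same rule, which takes one character (or one allowed &) per iteration of a single alternation. The naive form is my own reformulation for comparison, not part of the answer. The unrolled version instead grabs whole runs of ordinary characters with [^<>&]* and only enters the loop at an &. The two should accept exactly the same strings; the sample inputs are hypothetical.

```javascript
// One character per loop iteration (hypothetical comparison regex).
const naive = /^(?:[^<>&]|&(?!(?:[a-z\d]+|#\d+|#x[a-f\d]+);))*$/i;
// normal* (special normal*)* : the regex from hasHtmlSpecial_1.
const unrolled = /^[^<>&]*(?:&(?!(?:[a-z\d]+|#\d+|#x[a-f\d]+);)[^<>&]*)*$/i;

const inputs = [
  "fish & chips",      // & not followed by an entity: allowed
  "AT&#38;T",          // decimal entity
  "caf&eacute;",       // named entity
  "&#xA0;",            // hex entity
  "a &# b",            // &# that is not an entity: allowed
  "1 < 2",             // tag delimiter
  "line one\nline two" // [^<>&] also matches newlines
];

for (const s of inputs) {
  console.log(JSON.stringify(s), naive.test(s), unrolled.test(s));
}
```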

TRUE test using a negative regex:

function hasHtmlSpecial_2(text) {
    /* Commented regex:
        # Match string having one special HTML char.
          [<>]           # Either a tag delimiter
        | &              # or a & if start of
          (?:            # one of 3 types of HTML entities.
            [a-z\d]+     # either a named entity,
          | \#\d+        # or a decimal entity,
          | \#x[a-f\d]+  # or a hex entity.
          )              # End group of HTML entity types.
          ;              # All entities end with ";".
    */
    var re = /[<>]|&(?:[a-z\d]+|#\d+|#x[a-f\d]+);/i;
    // If regex matches, then string contains (at least) one special HTML char.
    return re.test(text) ? true : false;
}

Also note that I have included a commented version of each (non-trivial) regex as a JavaScript comment.
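As a usage example, the two HTML checks can be run side by side on a few hypothetical inputs to confirm they agree. The copies below drop the long comments and return the regex result directly instead of using a ternary; the regexes themselves are unchanged.

```javascript
function hasHtmlSpecial_1(text) {
  // Matches only strings WITHOUT special HTML chars, so negate the result.
  return !/^[^<>&]*(?:&(?!(?:[a-z\d]+|#\d+|#x[a-f\d]+);)[^<>&]*)*$/i.test(text);
}

function hasHtmlSpecial_2(text) {
  // Matches any tag delimiter or complete HTML entity.
  return /[<>]|&(?:[a-z\d]+|#\d+|#x[a-f\d]+);/i.test(text);
}

// Hypothetical inputs mapped to the expected result.
const cases = {
  "<b>bold</b>": true,
  "5 &gt; 3": true,
  "&#160;": true,
  "&#xA0;": true,
  "fish & chips": false,
  "a &# b": false, // &# alone is not a valid entity, so it is allowed
  "#hashtag": false,
};

for (const [input, expected] of Object.entries(cases)) {
  console.log(JSON.stringify(input), hasHtmlSpecial_1(input), hasHtmlSpecial_2(input), expected);
}
```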

score 2 · Accepted Answer

In this case, I don't think you need lookaround assertions. Just use a negated character class:

var pattern = /^[^<>&#]*$/m;
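A small sketch of the negated-class idea on multi-line input (the sample strings are hypothetical). It shows two things worth checking before using this pattern: with the m flag, $ can match before a line break, so a clean first line is enough for a match; and [^<>&#] rejects & and # individually, which is stricter than the question's edit asks for.

```javascript
const withM = /^[^<>&#]*$/m;
const withoutM = /^[^<>&#]*$/;

const multiLine = "first line is fine\nsecond line has <";

// With m, $ matches just before the "\n", so the first line satisfies it.
console.log(withM.test(multiLine));    // true
// Without m, ^ and $ anchor to the whole string. A negated class also
// matches "\n", so multi-line input needs no extra handling.
console.log(withoutM.test(multiLine)); // false
console.log(withoutM.test("a\nb"));    // true

// This class rejects a lone & (or #), not just the "&#" combination.
console.log(withoutM.test("a & b"));   // false
```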

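If you also want to disallow the characters -, [ and ], make sure to escape them or put them in the right order. In JavaScript the ] must be escaped, because [^] on its own is a complete class meaning "any character":

var pattern = /^[^\][<>&#-]*$/m;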
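Why the ] needs escaping in JavaScript: written as /^[^][<>&#-]*$/, the engine reads [^] as "any one character" followed by a separate class [<>&#-]*, which is not what was intended. The sketch below compares that reading with the escaped form; the m flag is left out to keep the comparison on single-line inputs, and the variable names are hypothetical.

```javascript
// Parsed as: ^, any one char ([^]), then [<>&#-]*, then $.
const asWritten = /^[^][<>&#-]*$/;
// \] keeps ] inside one negated class; - is literal at the end.
const escaped = /^[^\][<>&#-]*$/;

console.log(asWritten.test("ab")); // false: "b" is not in [<>&#-]
console.log(escaped.test("ab"));   // true
console.log(asWritten.test("]"));  // true: [^] consumed the "]"
console.log(escaped.test("]"));    // false
console.log(escaped.test("a-b"));  // false
```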

score 2 · Accepted Answer

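You are probably not looking for the m (multiline) flag but for the s (DOTALL) flag. Unfortunately, JavaScript did not have an s flag when this answer was written; it was added in ES2018.

The good news is that DOTALL behavior can be emulated with [\s\S]. Try the following regex: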

/^(?![\s\S]*?(&#|<|>))[\s\S]*$/

Or:

/^((?!&#|<|>)[\s\S])*$/
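The two [\s\S] regexes can be checked against the ES2018 s (dotAll) flag, which makes . match line terminators too. This sketch assumes an engine with ES2018 support for the s flag; the sample strings are hypothetical. All three regexes should give the same result for each input.

```javascript
// One lookahead at the start scans the whole string for a forbidden token.
const lookaheadOnce = /^(?![\s\S]*?(&#|<|>))[\s\S]*$/;
// Lookahead before every character; [\s\S] emulates DOTALL.
const perChar = /^((?!&#|<|>)[\s\S])*$/;
// Same as perChar, using the s flag instead of [\s\S].
const dotAll = /^((?!&#|<|>).)*$/s;

const tests = [
  "line one\nline two",
  "line one\nline <two>",
  "a & b # c",
  "AT&#38;T",
];

for (const s of tests) {
  console.log(JSON.stringify(s), lookaheadOnce.test(s), perChar.test(s), dotAll.test(s));
}
```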
