regex - How to ignore whitespace in a regular expression subject string?

Question

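Is there a simple way to ignore the whitespace in a target string when searching for matches with a regular expression pattern? For example, if my search is for "cats", I would want "c ats" or "ca ts" to match. I can't strip out the whitespace beforehand, because I need to find the begin and end index of the match (including any whitespace) in order to highlight that match, and any whitespace needs to stay there for formatting purposes.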

score 141 · Accepted Answer

You can stick optional whitespace characters, \s*, between every pair of adjacent characters in your regex. Granted, it gets a bit lengthy.

/cats/ -> /c\s*a\s*t\s*s/
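Because the whitespace stays in the subject string, the match offsets can be used directly for highlighting. A minimal Python sketch of that idea, assuming a made-up sample string (the answer itself only gives the pattern):

```python
import re

# Hand-expanded pattern from the answer above: optional whitespace between letters.
pattern = re.compile(r"c\s*a\s*t\s*s")

text = "I like c ats and ca ts."  # hypothetical sample text
for m in pattern.finditer(text):
    # m.start() and m.end() are offsets into the original text, spaces included.
    print(m.span(), repr(m.group()))
```

Each span covers the letters and the whitespace between them, so it can be passed straight to whatever does the highlighting.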

score 13 · Accepted Answer

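While the accepted answer is technically correct, a more practical approach, if possible, is to just strip the whitespace out of both the regular expression and the search string.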

If you want to search for "my cats", instead of:

myString.match(/m\s*y\s*c\s*a\s*t\s*s\s*/g)

Just do:

myString.replace(/\s*/g,"").match(/mycats/g)

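Warning: you can't automate this on the regular expression by simply replacing all spaces with empty strings, because they may occur in a negation or otherwise make your regular expression invalid. Note also that the indices of a match found this way refer to the stripped string, not the original one.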
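To illustrate the warning, here is a small Python sketch, assuming a negated character class as the problem case. The pattern and the `compiles` helper are made up for the demonstration:

```python
import re

pattern = r"[^ ]+"                      # "one or more non-space characters"
stripped = re.sub(r"\s+", "", pattern)  # naive whitespace removal gives "[^]+"

def compiles(p):
    """Return True if p is a valid regular expression for Python's re module."""
    try:
        re.compile(p)
        return True
    except re.error:
        return False

print(compiles(pattern), compiles(stripped))
```

In Python's `re`, the stripped pattern no longer compiles, because the `]` right after `[^` is read as a literal member of the class and the class is never closed. Other regex flavors may fail differently or silently change meaning.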

score 10 · Accepted Answer

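Addressing Steven's comment on Sam Dufel's answer:

Thanks, that sounds like the way to go. But I just realized that I only want the optional whitespace characters if they follow a newline. So for example, "c\n ats" or "ca\n ts" should match. But I wouldn't want "c ats" to match if there is no newline. Any ideas on how that might be done?

This should do the trick: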

/c(?:\n\s*)?a(?:\n\s*)?t(?:\n\s*)?s/

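See this page for all the different variations of "cats" that this matches.

You can also solve this using conditionals, but they are not supported in the JavaScript flavor of regex.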
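The same pattern can be generated rather than typed by hand. A minimal Python sketch, where `newline_tolerant` is a hypothetical helper name and not something from the answer:

```python
import re

def newline_tolerant(word):
    # Hypothetical helper: between letters, allow a newline followed by any
    # amount of whitespace, but do not allow whitespace without a newline.
    return re.compile(r"(?:\n\s*)?".join(re.escape(ch) for ch in word))

cats = newline_tolerant("cats")
print(cats.pattern)
print(bool(cats.search("c\n ats")), bool(cats.search("c ats")))
```

The generated pattern is the one shown above, so "c\n ats" and "ca\n ts" match while "c ats" does not.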

score 7 · Accepted Answer

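You could put \s* between every character in your search string, so if you were looking for cats you would use c\s*a\s*t\s*s

It's long, but you could of course build the string dynamically.

You can see it working here: http://www.rubular.com/r/zzWwvppSpE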
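Building the pattern dynamically might look like the following Python sketch. `spaced_pattern` is a hypothetical name, and escaping each character with `re.escape` is an added precaution so that characters such as `.` in the search term are treated literally:

```python
import re

def spaced_pattern(word):
    # Hypothetical helper: escape each character of the search term,
    # then join the pieces with optional whitespace.
    return re.compile(r"\s*".join(re.escape(ch) for ch in word))

m = spaced_pattern("cats").search("my c ats are here")
print(m.span(), repr(m.group()))
```

The span refers to the original string, whitespace included, which is what the question needs for highlighting.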

score 4 · Accepted Answer

If you only want to allow spaces, then

\bc *a *t *s\b

should do it. To also allow tabs, use

\bc[ \t]*a[ \t]*t[ \t]*s\b

Remove the \b anchors if you also want to find cats within words like bobcats or catsup.
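A short Python sketch of the difference the \b anchors make. The sample strings are made up for illustration:

```python
import re

bounded = re.compile(r"\bc[ \t]*a[ \t]*t[ \t]*s\b")  # whole word only
unbounded = re.compile(r"c[ \t]*a[ \t]*t[ \t]*s")    # also inside other words

print(bool(bounded.search("two c a\tts")))  # spaces and tabs are allowed
print(bool(bounded.search("bobcats")))      # no word boundary before the "c"
print(unbounded.search("bobcats").span())   # found inside the longer word
print(bool(bounded.search("c\nats")))       # newlines are not in [ \t]
```

Since the character class lists only space and tab, a newline between letters prevents a match, unlike the \s* variant.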

score 3 · Accepted Answer

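This approach can be used to automate this (the following example solution is in Python, but it can obviously be ported to any language):

You can strip the whitespace beforehand and save the positions of the non-whitespace characters, so that you can use them later to find the boundaries of the matched string in the original string, like this: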

import re

def regex_search_ignore_space(regex, string):
    no_spaces = ''
    char_positions = []

    for pos, char in enumerate(string):
        if re.match(r'\S', char):  # uppercase \S matches non-whitespace chars
            no_spaces += char
            char_positions.append(pos)

    match = re.search(regex, no_spaces)
    if not match:
        return match

    # match.start() and match.end() are indices into the spaceless string
    # (as we have searched in it); map them back to the original string.
    if match.start() == match.end():
        return ''  # empty match: nothing to map back
    start = char_positions[match.start()]
    # Position of the last matched character, plus one. Indexing with
    # match.end() directly would fail for a match at the end of the string
    # and would otherwise include trailing whitespace.
    end = char_positions[match.end() - 1] + 1
    matched_string = string[start:end]

    # The match WITH spaces is returned.
    return matched_string

with_spaces = 'a li on and a cat'
print(regex_search_ignore_space('lion', with_spaces))
# prints 'li on'

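If you want to go further, you can construct a match object and return it instead, which makes this helper more convenient to use.

The performance of this function can of course be optimized; this example is only meant to show the path to a solution.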
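As one possible step in that direction, the helper can return the start and end indices in the original string instead of the matched text, which is what highlighting needs. This is a sketch under assumptions: the function name is hypothetical, `str.isspace()` stands in for the `\S` test, and empty matches are treated as no match.

```python
import re

def search_span_ignoring_space(regex, string):
    # Indices (in the original string) of every non-whitespace character.
    positions = [i for i, ch in enumerate(string) if not ch.isspace()]
    squeezed = "".join(string[i] for i in positions)

    match = re.search(regex, squeezed)
    if match is None or match.start() == match.end():
        return None  # no match, or an empty match with nothing to highlight

    start = positions[match.start()]
    end = positions[match.end() - 1] + 1  # one past the last matched character
    return start, end

text = "a li on and a cat"
span = search_span_ignoring_space("lion", text)
print(span, repr(text[span[0]:span[1]]))
```

Building the position list once and joining the characters avoids repeated string concatenation, which is one of the simple optimizations the answer alludes to.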
