python - Replace multiple spaces with a single space, unless they appear between quotes?

Question

I have a use case where I want to replace multiple spaces with a single space, unless they appear within quotes. For example:

Before

this is the first    a   b   c
this is the second    "a      b      c"

After

this is the first a b c
this is the second "a      b      c"

I believe a regular expression should be able to do this, but I don't have much experience with them. Here is some code I already have:

import re

str = 'this is the second    "a      b      c"'
# Replace every run of two or more whitespace characters with a single space
print(re.sub(r'\s\s+', ' ', str))

# Doesn't work, but something like this
print(re.sub(r'["]^.*\s\s+.*["]^', ' ', str))

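I understand why my second attempt above doesn't work, so I'd like some alternative approaches. If possible, could you explain the individual parts of your regex solution? Thanks.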

score 1 · Accepted Answer

Assuming there is no " inside the quoted substring:

import re
str = 'a    b    c  "d   e   f"'
str = re.sub(r'("[^"]*")|[ \t]+', lambda m: m.group(1) if m.group(1) else ' ', str)

print(str)
# a b c "d   e   f"

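The regex ("[^"]*")|[ \t]+ matches either a quoted substring or a run of one or more spaces or tabs. Because the quoted-substring alternative is tried first, the whitespace inside a quoted substring is consumed as part of that match and can never be matched by the alternative subpattern [ \t]+, so it is left alone.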

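The pattern that matches the quoted substring is wrapped in () so that the callback can check whether it matched. If it did, m.group(1) is truthy and its value is simply returned. If not, whitespace was matched, so a single space is returned as the replacement.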
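
To make the two alternatives visible, here is a minimal sketch that lists every match the pattern finds and whether group 1 was set. The sample text and the variable names are only illustrative, not part of the original answer.

```python
import re

pattern = re.compile(r'("[^"]*")|[ \t]+')
text = 'a    b  "d   e"'

# Each match is either a quoted substring (group 1 is set)
# or a run of spaces/tabs (group 1 is None).
matches = [(m.group(0), m.group(1)) for m in pattern.finditer(text)]

for whole, quoted in matches:
    kind = 'quoted' if quoted is not None else 'whitespace'
    print(repr(whole), kind)
```

Only the whitespace matches are replaced with a single space; the quoted match is put back unchanged.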

Without a lambda:

def repl(match):
    quoted = match.group(1)
    return quoted if quoted else ' '

str = re.sub(r'("[^"]*")|[ \t]+', repl, str)
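
As a quick check, the same substitution can be applied to the two sample lines from the question. This is a small sketch; the helper name collapse_spaces is made up for the example.

```python
import re

def collapse_spaces(line):
    # Keep quoted substrings as they are; collapse every other run of spaces/tabs.
    return re.sub(r'("[^"]*")|[ \t]+',
                  lambda m: m.group(1) if m.group(1) else ' ',
                  line)

lines = [
    'this is the first    a   b   c',
    'this is the second    "a      b      c"',
]
results = [collapse_spaces(line) for line in lines]

for result in results:
    print(result)
```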

score 0 · Accepted Answer

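If you want a solution that works reliably every time, regardless of the input or of other caveats (such as not allowing embedded quotes), then you want to write a simple parser that doesn't use a regex or split on quotes.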

def parse(s):
    last = ''
    result = ''
    toggle = 0
    for c in s:
        if c == '"' and last != '\\':
            toggle ^= 1
        if c == ' ' and toggle == 0 and last == ' ':
            continue
        result += c
        last = c
    return result

test = r'"  <  >"test   1   2   3 "a \"<   >\"  b  c"'
print(test)
print(parse(test))
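
To see why the embedded-quote caveat matters, this sketch runs the regex from the other answer on a string with escaped quotes. The helper name collapse_regex and the sample string are assumptions made for the illustration.

```python
import re

def collapse_regex(s):
    # Pattern from the regex answer: it treats every " as a delimiter,
    # including one that is preceded by a backslash.
    return re.sub(r'("[^"]*")|[ \t]+', lambda m: m.group(1) or ' ', s)

sample = r'"a \"<   >\" b"'
collapsed = collapse_regex(sample)

print(sample)
print(collapsed)
```

The regex ends the quoted substring at the first escaped quote, so the three spaces between < and > are collapsed even though they sit inside the outer quotes. The parser above leaves them alone because it skips a quote that follows a backslash.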
