In Python, I have a lot of strings that contain spaces. I want to remove all spaces from the text, except those inside quotation marks.
Example input:
This is "an example text" containing spaces.
I want to get:
Thisis"an example text"containingspaces.
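I don't think line.split() is suitable, because it removes all the spaces in the text, including the ones inside the quotes.
Do you have any suggestions?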
For the simple case where only " is used as the quote character:
>>> import re
>>> s = 'This is "an example text" containing spaces.'
>>> re.sub(r' (?=(?:[^"]*"[^"]*")*[^"]*$)', "", s)
'Thisis"an example text"containingspaces.'
Explanation:
[ ]              # Match a space
(?=              # only if an even number of quotes follows --> lookahead
  (?:            # This is true when the following can be matched:
    [^"]*"       # Any number of non-quote characters, then a quote, then
    [^"]*"       # the same thing again to get an even number of quotes.
  )*             # Repeat zero or more times.
  [^"]*          # Match any remaining non-quote characters
  $              # and then the end of the string.
)                # End of lookahead.
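The same pattern can also be written with re.VERBOSE, so the comments live inside the regex itself. This is only a sketch: the name UNQUOTED_SPACE is made up for illustration, and it assumes the quotes in the input are balanced.

```python
import re

# Same pattern as above, written in verbose mode.
# The space is written as [ ] because verbose mode ignores bare whitespace.
UNQUOTED_SPACE = re.compile(r'''
    [ ]              # match a space
    (?=              # only if an even number of quotes follows
      (?:
        [^"]*"       # non-quote characters, then a quote,
        [^"]*"       # then the same again: one balanced pair
      )*             # zero or more balanced pairs
      [^"]*          # any remaining non-quote characters
      $              # up to the end of the string
    )
''', re.VERBOSE)

s = 'This is "an example text" containing spaces.'
result = UNQUOTED_SPACE.sub("", s)
print(result)

# Several quoted sections work the same way.
multi = UNQUOTED_SPACE.sub("", 'a b "c d" e "f g" h')
print(multi)
```

A space inside a quoted section is followed by an odd number of quotes, so the lookahead fails there and the space is kept.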
Using re.findall is probably the more easily understandable / more flexible approach:
>>> s = 'This is "an example text" containing spaces.'
>>> ''.join(re.findall(r'(?:".*?")|(?:\S+)', s))
'Thisis"an example text"containingspaces.'
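To see what the re.findall version is doing, it can help to look at the tokens before they are joined. A small sketch (the variable names are just for illustration):

```python
import re

pattern = r'(?:".*?")|(?:\S+)'
s = 'This is "an example text" containing spaces.'

# Each token is either a whole quoted section or a run of non-whitespace.
tokens = re.findall(pattern, s)
print(tokens)
# ['This', 'is', '"an example text"', 'containing', 'spaces.']

# Caveat: a quote glued to the preceding word is swallowed by \S+,
# so the quoted section is no longer recognised as a unit.
glued = ''.join(re.findall(pattern, 'foo"a b" c'))
print(glued)
```

So this variant seems to rely on each quoted section starting at a token boundary; if your data can contain something like foo"a b", the lookahead regex may be the safer choice.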
You can (ab)use csv.reader:
>>> import csv
>>> ''.join(next(csv.reader([s.replace('"', '"""')], delimiter=' ')))
'Thisis"an example text"containingspaces.'
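The tripled quotes are the non-obvious part of the csv trick. A quick comparison, as a sketch, of what csv.reader returns with and without the replacement:

```python
import csv

s = 'This is "an example text" containing spaces.'

# Without tripling, csv treats the quotes as field delimiters and drops them.
plain = next(csv.reader([s], delimiter=' '))

# With tripling, the outer quote still delimits the field, and each inner ""
# pair is read back as one literal quote character.
tripled = next(csv.reader([s.replace('"', '"""')], delimiter=' '))

print(plain)
print(tripled)
```

Joining the second list gives the desired string, because the quoted field keeps both its inner spaces and its quotes.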
Or use re.split:
>>> ''.join(filter(None, re.split(r'(?:\s*(".*?")\s*)|[ ]', s)))
'Thisis"an example text"containingspaces.'
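The filter(None, ...) call is needed because re.split returns None for the capturing group whenever the plain-space alternative matched. A sketch that shows the intermediate list:

```python
import re

s = 'This is "an example text" containing spaces.'

# The capturing group keeps each quoted section in the result;
# when the [ ] alternative matches instead, the group shows up as None.
pieces = re.split(r'(?:\s*(".*?")\s*)|[ ]', s)
print(pieces)

# filter(None, ...) drops the None entries (and any empty strings).
cleaned = ''.join(filter(None, pieces))
print(cleaned)
```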
There are probably more elegant solutions than this, but:
>>> test = "This is \"an example text\" containing spaces."
>>> '"'.join([x if i % 2 else "".join(x.split())
...           for i, x in enumerate(test.split('"'))])
'Thisis"an example text"containingspaces.'
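We split the text on quotes and then iterate over the pieces in a list comprehension. If the index is even (outside quotes), we remove the whitespace by splitting and rejoining; if it is odd (inside quotes), we leave the piece alone. Then we rejoin the whole thing with quotes.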
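Wrapped in a function, the split-on-quotes idea might look like the sketch below. The function name is made up for illustration, and it assumes the quotes are balanced, so that pieces at odd indexes are the ones inside quotes.

```python
def remove_unquoted_whitespace(text):
    """Remove whitespace everywhere except inside double-quoted sections."""
    parts = text.split('"')
    # parts[0] is before the first quote, parts[1] is inside the first
    # quoted section, parts[2] is after it, and so on.
    return '"'.join(
        part if i % 2 else "".join(part.split())
        for i, part in enumerate(parts)
    )


test = 'This is "an example text" containing spaces.'
parts = test.split('"')
print(parts)
# ['This is ', 'an example text', ' containing spaces.']
print(remove_unquoted_whitespace(test))
```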
You can also do this with csv:
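import csv

out = []
# Iterating over a string feeds csv.reader one character per "line";
# a quoted section comes back as a single field with its quotes stripped.
for e in csv.reader('This is "an example text" containing spaces. '):
    e = ''.join(e)
    if e == ' ':
        continue  # drop spaces outside quotes
    if ' ' in e:
        out.extend('"' + e + '"')  # quoted text: put the quotes back
    else:
        out.extend(e)
print(''.join(out))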
This prints Thisis"an example text"containingspaces.
Use regular expressions!
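import io
import re

def remove_unquoted_spaces(text):
    result = io.StringIO()
    regex = re.compile('("[^"]*")')
    # The capturing group keeps the quoted parts in the split result.
    for part in regex.split(text):
        if part and part[0] == '"':
            result.write(part)  # quoted: keep as-is
        else:
            result.write(part.replace(" ", ""))
    return result.getvalue()

text = 'This is "an example text" containing spaces.'
print(remove_unquoted_spaces(text))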
quotation_mark = '"'
space = " "
example = 'foo choo boo "blaee blahhh" didneid ei did '
formatted_example = ''
inside_quotes = False
for character in example:
    if character == quotation_mark:
        # Toggle the state on every quote and always keep the quote itself.
        inside_quotes = not inside_quotes
        formatted_example += character
    elif inside_quotes or character != space:
        # Keep everything inside quotes; outside, keep only non-space characters.
        formatted_example += character
print(formatted_example)
'"'.join(v if i%2 else v.replace(' ', '') for i, v in enumerate(line.split('"')))
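One thing to be aware of with this one-liner: replace(' ', '') only removes the space character, while the ''.join(x.split()) variant removes every kind of whitespace. A small sketch of the difference, using a made-up input that contains a tab:

```python
line = 'a\tb c "d e" f'

# Removes only ' ' outside quotes; the tab survives.
spaces_only = '"'.join(
    v if i % 2 else v.replace(' ', '')
    for i, v in enumerate(line.split('"'))
)

# Removes all whitespace outside quotes, including the tab.
all_whitespace = '"'.join(
    v if i % 2 else ''.join(v.split())
    for i, v in enumerate(line.split('"'))
)

print(repr(spaces_only))
print(repr(all_whitespace))
```

Which of the two you want depends on whether tabs and newlines outside quotes should count as "spaces" for your data.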