4

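In Python, I have many strings that contain spaces. I want to remove every space from the text unless it is inside quotation marks.

Example input:

This is "an example text" containing spaces.

I want to get:

Thisis"an example text"containingspaces.

I don't think line.split() is a good fit, because it removes all the whitespace in the text.

Do you have any suggestions?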

4

7 Answers

5

For the simple case where only " is used as the quote character:

>>> import re
>>> s = 'This is "an example text" containing spaces.'
>>> re.sub(r' (?=(?:[^"]*"[^"]*")*[^"]*$)', "", s)
'Thisis"an example text"containingspaces.'

Explanation:

[ ]      # Match a space
(?=      # only if an even number of quotes follows --> lookahead
 (?:     # This is true when the following can be matched:
  [^"]*" # Any number of non-quote characters, then a quote, then
  [^"]*" # the same thing again to get an even number of quotes.
 )*      # Repeat zero or more times.
 [^"]*   # Match any remaining non-quote characters
 $       # and then the end of the string.
)        # End of lookahead.
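
The annotated pattern can also be compiled as-is with re.VERBOSE, so the comments live inside the regex. This is only a sketch: the names UNQUOTED_SPACE and drop_unquoted_spaces are made up here, and it assumes the quotes in the input are balanced.

```python
import re

# Same pattern as in the one-liner, spread out with re.VERBOSE.
# UNQUOTED_SPACE and drop_unquoted_spaces are hypothetical names for this sketch.
UNQUOTED_SPACE = re.compile(r'''
    [ ]              # a literal space (a bare space is ignored in verbose mode)
    (?=              # lookahead: only match if what follows is ...
        (?:
            [^"]*"   # text up to a quote,
            [^"]*"   # then text up to the next quote (one complete pair)
        )*           # ... repeated for any number of pairs,
        [^"]*        # then text with no further quotes
        $            # up to the end of the string
    )
''', re.VERBOSE)


def drop_unquoted_spaces(text):
    """Remove spaces that are followed by an even number of quotes."""
    return UNQUOTED_SPACE.sub("", text)


print(drop_unquoted_spaces('This is "an example text" containing spaces.'))
```

Because the lookahead counts the quotes to the right of each space, an unbalanced quote flips the result: 'a b "c d' comes back as 'a b "cd'.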
answered 2013-05-27T15:30:43.220
4

Using re.findall is probably the more readable and flexible approach:

>>> import re
>>> s = 'This is "an example text" containing spaces.'
>>> ''.join(re.findall(r'(?:".*?")|(?:\S+)', s))
'Thisis"an example text"containingspaces.'
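
To see why the findall version works, it may help to look at the tokens before they are joined. A small sketch (TOKEN is a name made up here; the non-capturing groups are dropped because they do not change what is matched):

```python
import re

# Either a whole quoted chunk, or a run of non-whitespace characters.
# TOKEN is a hypothetical name used only in this sketch.
TOKEN = re.compile(r'".*?"|\S+')

tokens = TOKEN.findall('This is "an example text" containing spaces.')
print(tokens)
# ['This', 'is', '"an example text"', 'containing', 'spaces.']

# Caveat: a quote glued to preceding text is swallowed by \S+,
# so the space inside that quoted part is lost.
glued = TOKEN.findall('x="a b" y')
print(glued)
# ['x="a', 'b"', 'y']
```

So this variant assumes every quoted section starts at the beginning of the string or right after whitespace.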

You can (ab)use csv.reader:

>>> import csv
>>> ''.join(next(csv.reader([s.replace('"', '"""')], delimiter=' ')))
'Thisis"an example text"containingspaces.'

Or use re.split:

>>> ''.join(filter(None, re.split(r'(?:\s*(".*?")\s*)|[ ]', s)))
'Thisis"an example text"containingspaces.'
answered 2013-05-27T15:57:12.023
4

There are probably more elegant solutions than this, but:

>>> test = "This is \"an example text\" containing spaces."
>>> '"'.join([x if i % 2 else "".join(x.split())
              for i, x in enumerate(test.split('"'))])
'Thisis"an example text"containingspaces.'

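We split the text on quotation marks, then iterate over the pieces in a list comprehension. If the index is even (outside quotes), we remove the whitespace by splitting and rejoining; if it is odd (inside quotes), we leave the piece alone. Then we rejoin everything with quotation marks.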
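
The same idea wrapped in a function, as a rough sketch (the name strip_outside_quotes and the quote parameter are my own additions, and balanced quotes are assumed):

```python
def strip_outside_quotes(text, quote='"'):
    """Remove whitespace outside quoted sections (hypothetical helper).

    After splitting on the quote character, pieces at even indexes lie
    outside quotes and pieces at odd indexes lie inside them.
    """
    pieces = text.split(quote)
    return quote.join(
        piece if i % 2 else "".join(piece.split())
        for i, piece in enumerate(pieces)
    )


print(strip_outside_quotes('This is "an example text" containing spaces.'))
```

Note that piece.split() with no argument removes all whitespace outside the quotes, tabs and newlines included, not only spaces.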

answered 2013-05-27T15:30:49.463
1

You can also do this with csv:

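import csv

out = []
# Iterating over a string feeds csv.reader one character per "line";
# a quoted field stays open until its closing quote is seen.
for e in csv.reader('This is "an example text" containing spaces. '):
    e = ''.join(e)
    if e == ' ':
        continue  # a space outside quotes: drop it
    if ' ' in e:
        out.extend('"' + e + '"')  # field with a space: put the quotes back
    else:
        out.extend(e)

print(''.join(out))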

This prints Thisis"an example text"containingspaces.

answered 2013-05-27T15:59:50.327
1

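Use regular expressions!

import io, re

regex = re.compile('("[^"]*")')
text = 'This is "an example text" containing spaces.'

def remove_unquoted_spaces(text):
    result = io.StringIO()
    # The capturing group makes split() keep the quoted parts in its output.
    for part in regex.split(text):
        if part and part[0] == '"':
            result.write(part)                   # quoted: keep as is
        else:
            result.write(part.replace(" ", ""))  # unquoted: drop spaces
    return result.getvalue()

print(remove_unquoted_spaces(text))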
answered 2013-05-27T15:32:17.243
0

quotation_mark = '"'
space = " "
example = 'foo choo boo "blaee blahhh" didneid ei did '
formatted_example = ''

# We start outside any quoted section.
inside_quotes = False

for character in example:
    # Keep everything inside quotes; outside quotes, drop only spaces.
    if inside_quotes or character != space:
        formatted_example += character
    # Every quotation mark toggles the state.
    if character == quotation_mark:
        inside_quotes = not inside_quotes

print(formatted_example)
answered 2013-05-27T15:51:58.927
0
'"'.join(v if i%2 else v.replace(' ', '') for i, v in enumerate(line.split('"')))
answered 2013-05-27T15:42:59.907