python - Python regex: use the first whitespace as the separator but keep the rest of the whitespace run

Question

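I have been fighting with this regex for far too long. The split should use whitespace as the separator, but keep any remaining run of whitespace attached to the next token: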

'123 45   678    123.0'
=>
'123', '45', '  678', '   123.0'

My numbers can also be floats, and the number of groups is unknown.

score 2 · Accepted Answer

How about using a lookbehind assertion?

>>> import re
>>> regex = re.compile(r'(?<=[^\s])\s')
>>> regex.split('this  is a   string')
['this', ' is', 'a', '  string']

Regex breakdown:

(?<=...)  # positive lookbehind: match only if `...` matches immediately before this position
[^\s]     # any character that is not whitespace
\s        # a single whitespace character

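In plain English, this translates to "match a single whitespace character only if the character immediately before it is not whitespace."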
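
To see which whitespace characters the pattern actually consumes, here is a minimal sketch run against the input from the question. The names `first_space`, `separator_positions`, and `tokens` are illustrative and not part of the original answer.

```python
import re

# Same pattern as in the answer: one whitespace character that directly
# follows a non-whitespace character.
first_space = re.compile(r'(?<=[^\s])\s')

text = '123 45   678    123.0'

# Positions of the whitespace characters that act as separators.
separator_positions = [m.start() for m in first_space.finditer(text)]
print(separator_positions)

# Only the first space of each run is consumed; the rest stays on the next token.
tokens = first_space.split(text)
print(tokens)
```

Only the first space of each run is matched, so the remaining spaces stay at the front of the following token. A float such as `123.0` passes through unchanged because the pattern only ever consumes whitespace.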

Or you could use a negative lookbehind assertion:

regex = re.compile(r'(?<!\s)\s')

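This may be slightly nicer (as suggested in the comments), and it should be fairly easy to see how it works, since it is very similar to the pattern above.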
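
The two patterns agree whenever the string starts with a non-whitespace character, but they can differ on leading whitespace: at the very start of the string there is no preceding character, so the positive lookbehind fails while the negative one succeeds. A small sketch of that comparison, with hypothetical variable names and a made-up `padded` sample:

```python
import re

positive = re.compile(r'(?<=[^\s])\s')  # requires a non-whitespace character before
negative = re.compile(r'(?<!\s)\s')     # only forbids a whitespace character before

# No leading whitespace: both patterns split identically.
sample = 'this  is a   string'
same = positive.split(sample) == negative.split(sample)

# Leading whitespace: the negative lookbehind also matches at position 0,
# which produces an empty first element.
padded = ' a  b'
pos_result = positive.split(padded)
neg_result = negative.split(padded)
print(same, pos_result, neg_result)
```

If the input can start with whitespace, which of the two behaviours is preferable depends on whether that leading padding should be kept on the first token.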
