A non-regex approach, different from the others:
>>> import string
>>> from itertools import groupby
>>>
>>> special = set(string.punctuation + string.whitespace)
>>> s = "One two three tab\ttabandspace\t end"
>>>
>>> split_combined = [''.join(g) for k, g in groupby(s, lambda c: c in special)]
>>> split_combined
['One', ' ', 'two', ' ', 'three', ' ', 'tab', '\t', 'tabandspace', '\t ', 'end']
>>> split_separated = [''.join(g) for k, g in groupby(s, lambda c: c if c in special else False)]
>>> split_separated
['One', ' ', 'two', ' ', 'three', ' ', 'tab', '\t', 'tabandspace', '\t', ' ', 'end']
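One thing worth noting about the second variant: groupby groups runs of equal keys, so a run of the same separator character (two spaces in a row, say) still comes out as a single item. A small sketch to illustrate, where the input string is made up for the example:

```python
import string
from itertools import groupby

special = set(string.punctuation + string.whitespace)
s = "a  b,,c"  # made-up input with doubled separators

# Key is the character itself for separators, False for everything else,
# so different separators split apart but identical neighbours stay together.
parts = [''.join(g) for k, g in groupby(s, lambda c: c if c in special else False)]
print(parts)  # ['a', '  ', 'b', ',,', 'c']
```

If every separator character needs to be its own item, this variant is not quite enough on its own.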
I suppose you could use dict.fromkeys and .get instead of the lambda.
[Edit]
Some explanation:
groupby accepts two arguments, an iterable and an (optional) keyfunction. It loops over the iterable and groups the items by the value of the keyfunction:
>>> groupby("sentence", lambda c: c in 'nt')
<itertools.groupby object at 0x9805af4>
>>> [(k, list(g)) for k,g in groupby("sentence", lambda c: c in 'nt')]
[(False, ['s', 'e']), (True, ['n', 't']), (False, ['e']), (True, ['n']), (False, ['c', 'e'])]
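where terms with consecutive equal keyfunction values are grouped together. (This is actually a common source of bugs: people forget that if they want to group terms which might not be adjacent, they have to sort by the keyfunc first.)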
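To make the sorting point concrete, here is a minimal sketch that reuses the "sentence" example; the helper name in_nt is just something picked for the illustration:

```python
from itertools import groupby

def in_nt(c):
    # Key function: True for 'n' or 't', False for anything else.
    return c in 'nt'

word = "sentence"

# Without sorting, only adjacent items with equal keys end up together.
unsorted_groups = [(k, ''.join(g)) for k, g in groupby(word, in_nt)]

# Sorting by the same key first brings equal keys together. sorted() is
# stable, so the original order is kept inside each group.
sorted_groups = [(k, ''.join(g))
                 for k, g in groupby(sorted(word, key=in_nt), in_nt)]

print(unsorted_groups)
print(sorted_groups)  # [(False, 'seece'), (True, 'ntn')]
```

For the splitting problem here, sorting is exactly what you do not want, since the whole point is to keep the runs in their original positions.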
As @JonClements guessed, what I had in mind was
>>> special = dict.fromkeys(string.punctuation + string.whitespace, True)
>>> s = "One two three tab\ttabandspace\t end"
>>> [''.join(g) for k,g in groupby(s, special.get)]
['One', ' ', 'two', ' ', 'three', ' ', 'tab', '\t', 'tabandspace', '\t ', 'end']
for the case where we combine the separators. .get returns None if the value is not in the dict.
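A short sketch showing the keys that groupby actually sees with special.get. The second half is an assumption about how the same idea might extend to the separated case, using a dict that maps each separator to itself (the name special_self is made up here); the input string is shortened for the example:

```python
import string
from itertools import groupby

seps = string.punctuation + string.whitespace
s = "tab\t end"  # shortened example input

# Combined case: every separator maps to True, everything else gives None.
special = dict.fromkeys(seps, True)
combined = [(k, ''.join(g)) for k, g in groupby(s, special.get)]

# Separated case (assumed extension): each separator maps to itself, so
# different separator characters get different keys.
special_self = {c: c for c in seps}
separated = [(k, ''.join(g)) for k, g in groupby(s, special_self.get)]

print(combined)   # [(None, 'tab'), (True, '\t '), (None, 'end')]
print(separated)  # [(None, 'tab'), ('\t', '\t'), (' ', ' '), (None, 'end')]
```

The non-separator runs all share the key None, which is why they stay together in both versions.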