Suppose I have a string, for example
"Let's split this string into many small ones"
and I want to split it on this, into and ones, so that the output looks like this:
["Let's split", "this string", "into many small", "ones"]
What is the most efficient way to do this?
Use a lookahead.
>>> re.split(r'\s(?=(?:this|into|ones)\b)', "Let's split this string into many small ones")
["Let's split", 'this string', 'into many small', 'ones']
By using re.split():
>>> re.split(r'(this|into|ones)', "Let's split this string into many small ones")
["Let's split ", 'this', ' string ', 'into', ' many small ', 'ones', '']
Because the words to split on are placed in a capturing group, the output includes the words we split on.
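To see what the capturing group changes, here is a small comparison sketch. The variable names with_group and without_group are only illustrative and are not part of the answer above.

```python
import re

s = "Let's split this string into many small ones"

# Capturing group: the matched words are kept in the result.
with_group = re.split(r'(this|into|ones)', s)

# Non-capturing group: the matched words are discarded.
without_group = re.split(r'(?:this|into|ones)', s)

print(with_group)
print(without_group)
```

With the non-capturing form the delimiters are lost, which is why the capturing form is the one used here.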
If you need to remove the whitespace, apply map(str.strip, result) to the re.split() output:
>>> list(map(str.strip, re.split(r'(this|into|ones)', "Let's split this string into many small ones")))
["Let's split", 'this', 'string', 'into', 'many small', 'ones', '']
If needed, you can remove any empty strings with filter(None, result):
>>> list(filter(None, map(str.strip, re.split(r'(this|into|ones)', "Let's split this string into many small ones"))))
["Let's split", 'this', 'string', 'into', 'many small', 'ones']
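As an alternative to chaining map and filter, the stripping and the removal of empty strings can probably be done in one list comprehension. This is only a sketch; the names parts and cleaned are made up for the example.

```python
import re

s = "Let's split this string into many small ones"
parts = re.split(r'(this|into|ones)', s)

# Strip each piece and keep it only if something is left after stripping.
cleaned = [p.strip() for p in parts if p.strip()]
print(cleaned)
```

This behaves the same on Python 2 and Python 3, since it does not depend on whether map and filter return lists or iterators.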
To split on the words but keep them attached to the following group, you need to use a lookahead assertion:
>>> re.split(r'\s(?=(?:this|into|ones)\b)', "Let's split this string into many small ones")
["Let's split", 'this string', 'into many small', 'ones']
Now we are really splitting on whitespace, but only on whitespace that is followed by a whole word, one that is in the set of this, into and ones.
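If the list of words is not fixed, the same lookahead pattern could be built from a list. The helper below, split_before_words, is a hypothetical name introduced for this sketch; it assumes every word ends in a word character so that \b behaves as expected.

```python
import re

def split_before_words(text, words):
    # Hypothetical helper, not part of the re module.
    # Escape each word so regex metacharacters are treated literally.
    alternation = '|'.join(re.escape(w) for w in words)
    # Split on one whitespace character that is followed by a whole listed word.
    pattern = r'\s(?=(?:%s)\b)' % alternation
    return re.split(pattern, text)

result = split_before_words("Let's split this string into many small ones",
                            ['this', 'into', 'ones'])
print(result)

# The \b keeps "thistle" from being treated as "this".
partial = split_before_words("Let's split thistle into ones",
                             ['this', 'into', 'ones'])
print(partial)
```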
Here is a fairly lazy approach:
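import re

def resplit(regex, s):
    # Yield the pieces of s, cutting at the start of each regex match.
    current = None
    for x in regex.finditer(s):
        start = x.start()
        yield s[current:start]
        current = start
    # Slice from current (not start) so a string with no matches still works.
    yield s[current:]

s = "Let's split this string into many small ones"
regex = re.compile('(this|into|ones)')
print(list(resplit(regex, s)))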
I'm not sure whether this is the most efficient, but it is clean.
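Basically, we just iterate over the matches, taking one piece at a time. The pieces are determined by the index in the string (s) at which the regex starts to match. We slice the string up to that point and then save that index as the starting point of the next slice.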
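To make the slicing concrete, this sketch lists the match start indices for the example string and the slices they produce. The names starts, bounds and pieces are only for illustration.

```python
import re

s = "Let's split this string into many small ones"
regex = re.compile('(this|into|ones)')

# Index at which each match begins.
starts = [m.start() for m in regex.finditer(s)]

# Slice boundaries: start of string, each match start, end of string.
bounds = [0] + starts + [len(s)]
pieces = [s[a:b] for a, b in zip(bounds, bounds[1:])]

print(starts)
print(pieces)
```

Note that the pieces keep their trailing spaces, because the cut happens at the first character of each matched word rather than at the whitespace before it.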
As for performance, ignacio's answer clearly wins this round (timeit totals in seconds, lower is better):
9.1412050724 -- Me
3.09771895409 -- ignacio
Code:
import re
import timeit

def resplit(regex, s):
    current = None
    for x in regex.finditer(s):
        start = x.start()
        yield s[current:start]
        current = start
    yield s[current:]

def me(regex, s):
    return list(resplit(regex, s))

def ignacio(regex, s):
    # Split the string that was passed in rather than a hard-coded copy.
    return regex.split(s)

s = "Let's split this string into many small ones"
regex = re.compile('(this|into|ones)')
regex2 = re.compile(r'\s(?=(?:this|into|ones)\b)')

print(timeit.timeit("me(regex,s)", "from __main__ import me,regex,s"))
print(timeit.timeit("ignacio(regex2,s)", "from __main__ import ignacio,regex2,s"))