3

Suppose I have a string such as "Let's split this string into many small ones", and I want to split it on the words this, into, and ones,

so that the output looks like this:

["Let's split", "this string", "into many small", "ones"]

What is the most efficient way to do this?

4

3 Answers

11

Use a lookahead.

>>> re.split(r'\s(?=(?:this|into|ones)\b)', "Let's split this string into many small ones")
["Let's split", 'this string', 'into many small', 'ones']
answered 2012-12-18T15:03:07.333
3

By using re.split():

>>> re.split(r'(this|into|ones)', "Let's split this string into many small ones")
["Let's split ", 'this', ' string ', 'into', ' many small ', 'ones', '']

Because the words to split on are placed in a capturing group, the output includes the words we split on.
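To make the effect of the capturing group concrete, here is a minimal sketch comparing a capturing group with a non-capturing one on the same string. The variable names are illustrative only and are not part of the original answer.

```python
import re

s = "Let's split this string into many small ones"

# Capturing group: the delimiters are kept in the result.
with_delims = re.split(r'(this|into|ones)', s)

# Non-capturing group: the delimiters are dropped.
without_delims = re.split(r'(?:this|into|ones)', s)

print(with_delims)
print(without_delims)
```

The only difference between the two patterns is the `?:` prefix, which tells re.split() not to return the matched delimiter.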

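If you need to remove the whitespace, use map(str.strip, result) on the re.split() output (wrapped in list() so that it also works on Python 3, where map returns an iterator):

>>> list(map(str.strip, re.split(r'(this|into|ones)', "Let's split this string into many small ones")))
["Let's split", 'this', 'string', 'into', 'many small', 'ones', '']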

If needed, you can use filter(None, result) to remove any empty strings:

>>> list(filter(None, map(str.strip, re.split(r'(this|into|ones)', "Let's split this string into many small ones"))))
["Let's split", 'this', 'string', 'into', 'many small', 'ones']
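As an alternative spelling of the same strip-and-filter steps, list comprehensions behave the same way on Python 2 and Python 3. This is only a sketch; the names `pieces` and `cleaned` are made up for illustration.

```python
import re

s = "Let's split this string into many small ones"

# Strip surrounding whitespace from every piece.
pieces = [p.strip() for p in re.split(r'(this|into|ones)', s)]

# Keep only the non-empty pieces.
cleaned = [p for p in pieces if p]
print(cleaned)
```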

To split on the words but keep them attached to the group that follows, you need to use a lookahead assertion:

>>> re.split(r'\s(?=(?:this|into|ones)\b)', "Let's split this string into many small ones")
["Let's split", 'this string', 'into many small', 'ones']

Now we are really splitting on whitespace, but only on whitespace that is followed by a whole word, one in the set this, into and ones.
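If the list of words is not fixed, the lookahead pattern can be built from it. The helper below is a hypothetical sketch (the name `split_before_words` is not from the original answer). It assumes a non-empty list of words that each end in a word character, so that the trailing `\b` behaves as intended.

```python
import re

def split_before_words(text, words):
    """Split text on whitespace that is followed by one of the whole words."""
    # re.escape guards against words that contain regex metacharacters.
    alternation = '|'.join(re.escape(w) for w in words)
    pattern = r'\s(?=(?:%s)\b)' % alternation
    return re.split(pattern, text)

s = "Let's split this string into many small ones"
print(split_before_words(s, ['this', 'into', 'ones']))
```

Because of the `\b`, a longer word that merely starts with one of the keywords (for example "thistle") does not trigger a split.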

answered 2012-12-18T14:52:48.403
0

Here is a fairly lazy way to do it:

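import re

def resplit(regex, s):
    # Yield slices of s, each one starting where a match begins.
    current = None
    for x in regex.finditer(s):
        start = x.start()
        yield s[current:start]
        current = start
    # Use current rather than start so this also works when nothing matched.
    yield s[current:]

s = "Let's split this string into many small ones"
regex = re.compile('(this|into|ones)')
print(list(resplit(regex, s)))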

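I'm not sure whether this is the most efficient approach, but it is pretty clean.

Basically, we just iterate over the matches, taking one piece at a time. The pieces are determined by the indices in the string (s) at which the regex starts to match. We slice the string up to that point and then save that index as the starting point for the next slice.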
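The slicing described above can be sketched step by step by first collecting the match start indices and then cutting between consecutive boundaries. The names `starts`, `bounds` and `pieces` are illustrative only.

```python
import re

s = "Let's split this string into many small ones"
regex = re.compile('(this|into|ones)')

# Indices at which each match starts.
starts = [m.start() for m in regex.finditer(s)]
print(starts)

# Slice between consecutive boundaries: 0, each match start, end of string.
bounds = [0] + starts + [len(s)]
pieces = [s[a:b] for a, b in zip(bounds, bounds[1:])]
print(pieces)
```

Note that each piece keeps its trailing space, since the cut happens at the start of the keyword rather than on the whitespace before it.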


As for performance, ignacio clearly wins this round:

9.1412050724  -- Me
3.09771895409  -- ignacio

Code:

import re
import timeit

def resplit(regex, s):
    # Yield slices of s, each one starting where a match begins.
    current = None
    for x in regex.finditer(s):
        start = x.start()
        yield s[current:start]
        current = start
    # Use current rather than start so this also works when nothing matched.
    yield s[current:]


def me(regex, s):
    return list(resplit(regex, s))

def ignacio(regex, s):
    # Split the string that was passed in, not a hard-coded literal.
    return regex.split(s)

s = "Let's split this string into many small ones"
regex = re.compile('(this|into|ones)')
regex2 = re.compile(r'\s(?=(?:this|into|ones)\b)')

print(timeit.timeit("me(regex,s)", "from __main__ import me,regex,s"))
print(timeit.timeit("ignacio(regex2,s)", "from __main__ import ignacio,regex2,s"))
answered 2012-12-18T15:01:23.500