string - Splitting "on the fly" (e.g. with a generator) like split(), but without regular expressions, in Python 3

Question

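Related to an earlier question: Python3 split() with generator.

Is there a way to split a string using a generator or iterator, but more efficiently than by creating a regular expression?

I assume ".split()" is not implemented with regular expressions.

I would like to see an equivalent that does not build the whole split list in memory, but produces the pieces "on the fly" with a generator or iterator.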

score 1 · Accepted Answer

This seems to be a bit faster than the regular expression:

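def itersplit2(s, sep):
    if not sep:
        raise ValueError("empty separator")  # otherwise the loop never advances
    i = 0
    sep_len = len(sep)
    j = s.find(sep, i)
    while j > -1:
        yield s[i:j]        # piece before the separator
        i = j + sep_len     # skip past the separator
        j = s.find(sep, i)
    yield s[i:]             # remainder after the last separator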

But it is 10 times slower than str.split.
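To check that kind of figure on your own machine, here is a rough timing sketch. The test string, the repeat count and the names t_gen and t_split are my own assumptions, and the ratio will vary with input size and Python version. Note that wrapping the generator in list() measures full consumption, which is the worst case for a lazy splitter.

```python
import timeit

def itersplit2(s, sep):
    # Same find()-based generator as in the answer above.
    i = 0
    l = len(sep)
    j = s.find(sep, i)
    while j > -1:
        yield s[i:j]
        i = j + l
        j = s.find(sep, i)
    yield s[i:]

# Hypothetical test input: 10,000 short fields joined by '<>'.
s = "<>".join(str(n) for n in range(10_000))

# Both approaches must agree before timing them means anything.
assert list(itersplit2(s, "<>")) == s.split("<>")

t_gen = timeit.timeit(lambda: list(itersplit2(s, "<>")), number=20)
t_split = timeit.timeit(lambda: s.split("<>"), number=20)
print(f"generator: {t_gen:.4f}s  str.split: {t_split:.4f}s")
```

If you only need the first few pieces of a very long string, the generator can still win, because it stops searching as soon as you stop iterating.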

score 0 · Accepted Answer

Here is a version for a separator other than None:

def iter_split(s, sep):
    start = 0
    L = len(s)
    lsep = len(sep)
    assert lsep > 0             # an empty separator would never advance
    if L == 0:
        yield ''                # ''.split(sep) returns ['']
    while start < L:
        end = s.find(sep, start)
        if end != -1:
            yield s[start:end]
            start = end + lsep
            if start == L:
                yield ''        # sep found but nothing after
        else:
            yield s[start:]     # the last element
            start = L           # to quit the loop

I have not tested it heavily, so it may contain some bugs. Here are the results compared with str.split():

sep = '<>'
s = '1<>2<>3'
print('--------------', repr(s), repr(sep))
print(s.split(sep))
print(list(iter_split(s, sep)))

s = '<>1<>2<>3<>'
print('--------------', repr(s), repr(sep))
print(s.split(sep))
print(list(iter_split(s, sep)))

sep = ' '
s = '1 2 3'
print('--------------', repr(s), repr(sep))
print(s.split(sep))
print(list(iter_split(s, sep)))

s = '1   2   3'
print('--------------', repr(s), repr(sep))
print(s.split(sep))
print(list(iter_split(s, sep)))

It prints:

-------------- '1<>2<>3' '<>'
['1', '2', '3']
['1', '2', '3']
-------------- '<>1<>2<>3<>' '<>'
['', '1', '2', '3', '']
['', '1', '2', '3', '']
-------------- '1 2 3' ' '
['1', '2', '3']
['1', '2', '3']
-------------- '1   2   3' ' '
['1', '', '', '2', '', '', '3']
['1', '', '', '2', '', '', '3']

An implementation for the default None separator would be more complicated, because there are more rules.
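For what it's worth, here is a minimal sketch of what the None case might look like without regular expressions. The name iter_split_whitespace is made up for illustration, and it assumes that str.isspace() agrees with the whitespace definition str.split() uses. The extra rules are that runs of whitespace count as a single separator and that leading and trailing whitespace produce no empty strings.

```python
def iter_split_whitespace(s):
    """Lazily yield the pieces that s.split() (sep=None) would return."""
    start = None                    # index where the current word began, if any
    for i, ch in enumerate(s):
        if ch.isspace():
            if start is not None:   # whitespace ends the current word
                yield s[start:i]
                start = None
        elif start is None:         # first non-whitespace char starts a word
            start = i
    if start is not None:           # word running to the end of the string
        yield s[start:]

print(list(iter_split_whitespace("  1   2\t3\n")))
```

This walks the string one character at a time in Python, so expect it to be much slower than the find()-based versions. It is only meant to show the rules.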

Anyway, precompiled regular expressions are very efficient. They are error-prone to write, but once they are ready, they are fast.
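For comparison, a lazy splitter built on a precompiled pattern could look roughly like this. It is a sketch, not the code from the earlier question: iter_split_re is a hypothetical name, and re.escape() is used so the separator is matched literally, which sidesteps most of the error-prone part.

```python
import re

def iter_split_re(s, sep):
    """Lazily split s on the literal, non-empty separator sep via re.finditer."""
    pattern = re.compile(re.escape(sep))    # compile once, match sep literally
    start = 0
    for m in pattern.finditer(s):
        yield s[start:m.start()]            # piece before this separator
        start = m.end()                     # continue after the separator
    yield s[start:]                         # remainder after the last separator

print(list(iter_split_re("<>1<>2<>3<>", "<>")))
```

In real use you would compile the pattern once outside the function if the same separator is reused many times.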
