python - Splitting a string on multiple delimiters in Python without regular expressions

Question

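I have a string that I need to split on multiple characters without using regular expressions. For example, I need something like the following: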

>>>string="hello there[my]friend"
>>>string.split(' []')
['hello','there','my','friend']

Is there something like this in Python?
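
For reference, the str.split call above shows the desired behavior, not what Python actually does: str.split treats its whole argument as one literal separator, not as a set of characters. A quick check (the variable name s is just for illustration):

```python
s = "hello there[my]friend"

# str.split looks for the exact substring ' []', which never appears in s,
# so the whole string comes back as a single element
result = s.split(' []')
print(result)  # ['hello there[my]friend']
```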

score 8 · Accepted Answer

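If you need multiple delimiters, re.split is the way to go.

Without a regex, it isn't possible unless you write a custom function for it.

Here's one such function. It may or may not do what you want (consecutive delimiters produce empty elements):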

>>> def multisplit(s, delims):
...     pos = 0
...     for i, c in enumerate(s):
...         if c in delims:
...             yield s[pos:i]
...             pos = i + 1
...     yield s[pos:]
...
>>> list(multisplit('hello there[my]friend', ' []'))
['hello', 'there', 'my', 'friend']
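
To illustrate the caveat about consecutive delimiters: with a made-up input where " [" and "] " sit next to each other, each adjacent pair yields an empty string. The list comprehension that filters them out is an addition for illustration, not part of the answer.

```python
def multisplit(s, delims):
    # Yield the text between delimiter characters; adjacent delimiters yield ''
    pos = 0
    for i, c in enumerate(s):
        if c in delims:
            yield s[pos:i]
            pos = i + 1
    yield s[pos:]


parts = list(multisplit("hello [my] friend", " []"))
print(parts)  # ['hello', '', 'my', '', 'friend']

# Drop the empty strings produced by consecutive delimiters
non_empty = [p for p in parts if p]
print(non_empty)  # ['hello', 'my', 'friend']
```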

score 1 · Accepted Answer

A solution without regex:

from itertools import groupby

sep = ' []'
s = 'hello there[my]friend'
# Keep only the runs of characters that are not in sep (Python 3 print)
print([''.join(g) for k, g in groupby(s, sep.__contains__) if not k])

I just posted an explanation here: https://stackoverflow.com/a/19211729/2468006
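
A short sketch of why this works, using the same sep and s: itertools.groupby batches consecutive characters that share a key, and sep.__contains__ returns True for delimiter characters and False otherwise. Keeping only the False runs drops every delimiter, and because a run of several delimiters forms a single group, no empty strings appear. The helper name split_runs is hypothetical.

```python
from itertools import groupby

sep = ' []'
s = 'hello there[my]friend'

# Show each run of characters together with its key
for is_sep, group in groupby(s, sep.__contains__):
    print(is_sep, repr(''.join(group)))


def split_runs(text, delims):
    # Hypothetical helper: keep only the runs of non-delimiter characters
    return [''.join(g) for is_sep, g in groupby(text, delims.__contains__) if not is_sep]


print(split_runs(s, sep))                      # ['hello', 'there', 'my', 'friend']
print(split_runs('hello  [my]] friend', sep))  # ['hello', 'my', 'friend']
```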

score 1 · Accepted Answer

A recursive solution that doesn't use regular expressions. Compared with the other answers, it uses only basic Python.

def split_on_multiple_chars(string_to_split, set_of_chars_as_string):
    # Recursive splitting
    # Returns a list of strings

    s = string_to_split
    chars = set_of_chars_as_string

    # If no more characters to split on, return the input as a one-element list
    if not chars:
        return [s]

    # Split on the first of the delimiter characters
    ss = s.split(chars[0])

    # Recursively split each piece on the remaining delimiter characters
    bb = []
    for e in ss:
        aa = split_on_multiple_chars(e, chars[1:])
        bb.extend(aa)
    return bb

It works much like Python's regular str.split(...), but accepts multiple delimiters.

Example usage:

print(split_on_multiple_chars('my"example_string.with:funny?delimiters', '_.:;'))

Output:

['my"example', 'string', 'with', 'funny?delimiters']
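
The recursion above only goes as deep as the number of delimiter characters, so the same idea can also be written as a loop. The following is an iterative sketch (the name split_on_multiple_chars_iter is hypothetical); like str.split with an explicit separator, it keeps the empty strings that appear between adjacent delimiters.

```python
def split_on_multiple_chars_iter(s, chars):
    # Hypothetical iterative equivalent of split_on_multiple_chars
    pieces = [s]
    # For each delimiter, split every current piece and flatten the results
    for ch in chars:
        pieces = [part for piece in pieces for part in piece.split(ch)]
    return pieces


print(split_on_multiple_chars_iter('my"example_string.with:funny?delimiters', '_.:;'))
# ['my"example', 'string', 'with', 'funny?delimiters']

# Adjacent delimiters leave an empty string, just as 'a,,b'.split(',') does
print(split_on_multiple_chars_iter('a,,b;c', ',;'))  # ['a', '', 'b', 'c']
```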

score -3 · Accepted Answer

re.split is the right tool.

>>> string="hello there[my]friend"
>>> import re
>>> re.split('[] []', string)
['hello', 'there', 'my', 'friend']

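In a regular expression, [...] defines a character class: any single character inside the brackets matches. Because ] comes first in the class, it is treated as a literal instead of closing the class, so neither bracket needs escaping. The escaped pattern [\[\] ] works too.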

>>> re.split(r'[\[\] ]', string)
['hello', 'there', 'my', 'friend']
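
When the delimiters come from a variable instead of being typed into the pattern, re.escape can build the character class. A minimal sketch (split_on_chars and its collapse parameter are hypothetical); it assumes delims is non-empty, because an empty class '[]' is not a valid pattern.

```python
import re


def split_on_chars(s, delims, collapse=False):
    # re.escape backslash-escapes regex metacharacters such as '[' and ']',
    # so every character in delims is matched literally inside the class
    pattern = '[' + ''.join(re.escape(c) for c in delims) + ']'
    if collapse:
        # '+' treats a run of consecutive delimiters as one separator
        pattern += '+'
    return re.split(pattern, s)


print(split_on_chars('hello there[my]friend', ' []'))
# ['hello', 'there', 'my', 'friend']
print(split_on_chars('hello [my] friend', ' []'))
# ['hello', '', 'my', '', 'friend']
print(split_on_chars('hello [my] friend', ' []', collapse=True))
# ['hello', 'my', 'friend']
```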

The re.DEBUG flag for re.compile is also useful, because it prints out what the pattern will match:

>>> re.compile('[] []', re.DEBUG)
in 
  literal 93
  literal 32
  literal 91
<_sre.SRE_Pattern object at 0x16b0850>

(where 32, 91, and 93 are the ASCII values of the space, [, and ] characters)
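
The numbers in the DEBUG output are character codes, which ord returns directly. A quick check (for these characters the Unicode code point equals the ASCII value), listing them in the same order as the DEBUG output:

```python
# ord() gives the code point of a single character
codes = [ord(c) for c in '] [']
print(codes)  # [93, 32, 91]
```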
