python - Splitting a string on parentheses in Python

Question

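A while ago I asked a question (Python: splitting an unknown string on spaces and parentheses), and the answer worked well until I had to change my approach. I haven't mastered regular expressions yet, so I need some help.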

If the user types:

new test (test1 test2 test3) test "test5 test6"

I want the output to be a variable that looks like this:

["new", "test", "test1 test2 test3", "test", "test5 test6"]

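In other words, if it is a single word separated by spaces, split it from the next word; if it is in parentheses, keep the whole group of words inside the parentheses as one item and remove the parentheses. The same goes for quotation marks.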

The code I am currently using (from an answer in the question linked above) does not meet these criteria:

>>> import re
>>> strs = "Hello (Test1 test2) (Hello1 hello2) other_stuff"
>>> [", ".join(x.split()) for x in re.split(r'[()]',strs) if x.strip()]
['Hello', 'Test1, test2', 'Hello1, hello2', 'other_stuff']

This works well, but it breaks down for an input like this:

strs = "Hello Test (Test1 test2) (Hello1 hello2) other_stuff"

It combines Hello and Test into a single item instead of two.

It also does not allow splitting on parentheses and quotation marks at the same time.
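For reference, a minimal sketch that reproduces both problems. The function name `split_on_parens` is only a wrapper introduced here for illustration; the list comprehension inside it is the one quoted above.

```python
import re

def split_on_parens(strs):
    # The list comprehension from the question, unchanged
    return [", ".join(x.split()) for x in re.split(r'[()]', strs) if x.strip()]

# Words outside the parentheses are joined into one item
print(split_on_parens("Hello Test (Test1 test2) (Hello1 hello2) other_stuff"))
# ['Hello, Test', 'Test1, test2', 'Hello1, hello2', 'other_stuff']

# Quotation marks are not treated as grouping characters at all
print(split_on_parens('new test (test1 test2 test3) test "test5 test6"'))
# ['new, test', 'test1, test2, test3', 'test, "test5, test6"']
```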

score 6 · Accepted Answer

The answer is simple:

re.findall(r'\[[^\]]*\]|\([^)]*\)|"[^"]*"|\S+', strs)
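Note that this pattern leaves the parentheses and quotation marks in the matched tokens. A minimal sketch of one way to drop them, assuming square brackets are not needed: each alternative gets its own capture group, and the group that matched is kept. The names `TOKEN_RE` and `split_tokens` are made up for this example.

```python
import re

# Same alternation as above, minus the square-bracket branch; each capture
# group holds the text without its surrounding delimiters.
TOKEN_RE = re.compile(r'\(([^)]*)\)|"([^"]*)"|(\S+)')

def split_tokens(text):
    tokens = []
    for m in TOKEN_RE.finditer(text):
        # Only one of the three groups takes part in any given match;
        # the other two are None.
        tokens.append(next(g for g in m.groups() if g is not None))
    return tokens

print(split_tokens('new test (test1 test2 test3) test "test5 test6"'))
# ['new', 'test', 'test1 test2 test3', 'test', 'test5 test6']
```

Like the original pattern, this does not handle nested parentheses.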

Answered 2013-06-28T20:26:01.117

score 3 · Accepted Answer

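This is pushing the limits of what regular expressions can do. Consider using pyparsing instead. It does recursive descent. For this task, you could use: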

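from pyparsing import Forward, Group, OneOrMore, Word, ZeroOrMore
import string, re

s = 'new test (test1 test2 test3) test "test5 test6"'  # example input

# A raw word: printable characters with parentheses, double quotes and the space removed
RawWord = Word(re.sub('[()" ]', '', string.printable))
Token = Forward()  # declared first so the grammar can refer to itself
Token << ( RawWord | 
           Group('"' + OneOrMore(RawWord) + '"') |
           Group('(' + OneOrMore(Token) + ')') )
Phrase = ZeroOrMore(Token)

Phrase.parseString(s, parseAll=True)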

This is robust to odd whitespace and can handle nested parentheses. It is also more readable than a large regex, and therefore easier to tweak.
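To make the recursive-descent idea concrete without any third-party library, here is a rough hand-written sketch in plain Python. It is not the pyparsing grammar above and not how pyparsing works internally; `parse_nested` is a hypothetical helper, and it assumes quoted text should be kept as a single string while each parenthesised group becomes a nested list.

```python
def parse_nested(text):
    """Hypothetical helper: words become strings, each (...) becomes a nested list."""
    pos = 0

    def parse_items(inside_parens):
        nonlocal pos
        items = []
        while pos < len(text):
            ch = text[pos]
            if ch.isspace():
                pos += 1
            elif ch == '(':
                pos += 1
                items.append(parse_items(True))  # recurse for the nested group
            elif ch == ')':
                if not inside_parens:
                    raise ValueError('unexpected ")" at position %d' % pos)
                pos += 1
                return items
            elif ch == '"':
                end = text.find('"', pos + 1)
                if end == -1:
                    raise ValueError('unterminated quote')
                items.append(text[pos + 1:end])  # quoted text kept as one string
                pos = end + 1
            else:
                # Plain word: read up to the next space, parenthesis or quote
                start = pos
                while pos < len(text) and not text[pos].isspace() and text[pos] not in '()"':
                    pos += 1
                items.append(text[start:pos])
        if inside_parens:
            raise ValueError('missing ")"')
        return items

    return parse_items(False)

print(parse_nested('new test (test1 (test2 test3)) test "test5 test6"'))
# ['new', 'test', ['test1', ['test2', 'test3']], 'test', 'test5 test6']
```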

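I realize you solved your problem long ago, but this is one of the top-ranked Google results for this kind of problem, and pyparsing is a little-known library.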

score 1 · Accepted Answer

Your question is not well defined.

Your description of the rules is

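In other words, if it is a single word separated by spaces, split it from the next word; if it is in parentheses, keep the whole group of words inside the parentheses as one item and remove the parentheses. The same goes for commas.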

I assume that by commas you mean inverted commas, i.e. quotation marks.

Then with this

strs = "Hello (Test1 test2) (Hello1 hello2) other_stuff"

you should get

["Hello (Test1 test2) (Hello1 hello2) other_stuff"]

because everything is surrounded by quotation marks. Most likely, you want the outermost quotation marks to be ignored.

I suggest this, although it is a bit ugly

import re, itertools
strs = input("enter a string list ")

# Split on the parenthesised part first, then split each piece on the quoted part
print([ y for y in list(itertools.chain(*[re.split(r'\"(.*)\"', x) 
        for x in re.split(r'\((.*)\)', strs)])) 
        if y != ''])

which gives

>>> 
enter a string list here there (x y ) thereagain "there there"
['here there ', 'x y ', ' thereagain ', 'there there']

score 1 · Accepted Answer

This does what you expect

import re, itertools
strs = input("enter a string list ")

res1 = [ y for y in list(itertools.chain(*[re.split(r'\"(.*)\"', x) 
        for x in re.split(r'\((.*)\)', strs)])) 
        if y != '']

# Note: re.search returns None when the input has no quoted or no parenthesised part
set1 = re.search(r'\"(.*)\"', strs).groups()
set2 = re.search(r'\((.*)\)', strs).groups()

# Grouped pieces are kept whole; everything else is split on whitespace
print([k for k in res1 if k in list(set1) or k in list(set2) ] 
   + list(itertools.chain(*[k.split() for k in res1 if k 
   not in set1 and k not in set2 ])))

score 0 · Accepted Answer

For Python 3.6 - 3.8

I had a similar problem, but I did not like any of these answers, probably because most of them date from 2013. So I worked out my own solution.

import re

regex = r'\(.+?\)|".+?"|\w+'
test = 'Hello Test (Test1 test2) (Hello1 hello2) other_stuff'
result = re.findall(regex, test)

Here you are looking for three different groups:

Things enclosed in (); the parentheses must be escaped with backslashes
Things enclosed in ""
Plain words
The ? makes the search lazy instead of greedy
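A short sketch of what the lazy quantifier changes, using the same test string. The variable names `greedy`, `lazy` and `tokens` are only for illustration, and the final `strip` call is an extra step, not part of the regex above, for cases where the delimiters should be removed.

```python
import re

test = 'Hello Test (Test1 test2) (Hello1 hello2) other_stuff'

# Greedy: .+ runs to the last ")" in the string, merging both groups
greedy = re.findall(r'\(.+\)', test)
print(greedy)  # ['(Test1 test2) (Hello1 hello2)']

# Lazy: .+? stops at the first ")" it can, so each group is matched separately
lazy = re.findall(r'\(.+?\)', test)
print(lazy)    # ['(Test1 test2)', '(Hello1 hello2)']

# Full pattern, with the surrounding parentheses or quotes stripped afterwards
tokens = [t.strip('()"') for t in re.findall(r'\(.+?\)|".+?"|\w+', test)]
print(tokens)  # ['Hello', 'Test', 'Test1 test2', 'Hello1 hello2', 'other_stuff']
```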
