python - How to split a string in Python without redundant output

Question

I've been trying to split a string using a regular expression as the delimiter, but the output of re.split seems to contain some redundant results.

import re
replaceArray = '((Replace the array)|((in|inside|within) the array)|(with the array))'
stringToSplit = '(Replace the array arr1 in the array arr2 with the array arr3)'
print(re.split(replaceArray, stringToSplit))

I'd like the split string to look like this, without any overlapping results:

['Replace the array', ' arr1 ', 'in the array', ' arr2 ', 'with the array', ' arr3']

However, the resulting list contains some redundant results, which seem to overlap with other matched strings:

['(', 'Replace the array', 'Replace the array', None, None, None, ' arr1 ', 'in the array', None, 'in the array', 'in', None, ' arr2 ', 'with the array', None, None, None, 'with the array', ' arr3)']

Is there any way to keep these redundant and overlapping results out of the output of re.split?

score 2 · Accepted Answer

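If your regular expression contains capturing groups, the result of re.split() will include the text matched by those groups. Add ?: at the start of every group to make it non-capturing. Several of the groups aren't actually needed, so try the following: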

replaceArray = 'Replace the array|(?:in|inside|within) the array|with the array'
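
A minimal sketch of how this plays out on the question's input, assuming the pattern and the input both use "the array" throughout. The snake_case variable names and the final strip-and-filter step are illustrative additions, not part of the answer. With no capturing groups the delimiters disappear from the result. Wrapping the whole alternation in a single capturing group keeps each delimiter exactly once, which is closer to the list the question asks for.

```python
import re

string_to_split = '(Replace the array arr1 in the array arr2 with the array arr3)'

# No capturing groups: the delimiters are dropped from the result.
no_groups = r'Replace the array|(?:in|inside|within) the array|with the array'
without_delimiters = re.split(no_groups, string_to_split)
print(without_delimiters)
# ['(', ' arr1 ', ' arr2 ', ' arr3)']

# One outer capturing group: each delimiter is returned exactly once.
one_group = '(' + no_groups + ')'
with_delimiters = re.split(one_group, string_to_split)
print(with_delimiters)
# ['(', 'Replace the array', ' arr1 ', 'in the array', ' arr2 ', 'with the array', ' arr3)']

# Optional cleanup: strip the surrounding parentheses, then drop empty pieces.
cleaned = [p for p in re.split(one_group, string_to_split.strip('()')) if p]
print(cleaned)
# ['Replace the array', ' arr1 ', 'in the array', ' arr2 ', 'with the array', ' arr3']
```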

score 1 · Accepted Answer

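From the documentation for re.split:

If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.

I think you want to use non-capturing groups in your regular expression. That is, instead of (...), use (?:...).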
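
The documented behavior is easier to see on a toy input. The sketch below uses a made-up string and digit delimiters (both are assumptions for illustration only). It shows that each capturing group adds one entry per match, that nested groups produce duplicates, and that a group which did not take part in the match is reported as None.

```python
import re

text = 'a1b2c'

# One capturing group: the matched delimiter is returned between the pieces.
capturing = re.split(r'(\d)', text)
print(capturing)        # ['a', '1', 'b', '2', 'c']

# Non-capturing group: only the pieces are returned.
non_capturing = re.split(r'(?:\d)', text)
print(non_capturing)    # ['a', 'b', 'c']

# Nested capturing groups: every group contributes its own copy.
nested = re.split(r'((\d))', text)
print(nested)           # ['a', '1', '1', 'b', '2', '2', 'c']

# Alternation of groups: the group that did not match yields None.
alternation = re.split(r'(1)|(2)', text)
print(alternation)      # ['a', '1', None, 'b', None, '2', 'c']
```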

score 1 · Accepted Answer

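Groups that start with ?: are non-capturing groups and do not appear in the output. Also, you probably want re.match here rather than re.split: you aren't really interested in splitting the string, you want to extract these groups from it.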

>>> expr = r'\((Replace the array (.*?)) ((?:in|inside|within) the array (.*?)) (with the array (.*?))\)'
>>> re.match(expr, stringToSplit).groups()
('Replace the array arr1', 'arr1', 'in the array arr2', 'arr2', 'with the array arr3', 'arr3')

Or:

>>> expr = r'\((Replace the array) (.*?) ((?:in|inside|within) the array) (.*?) (with the array) (.*?)\)'
>>> re.match(expr, stringToSplit).groups()
('Replace the array', 'arr1', 'in the array', 'arr2', 'with the array', 'arr3')
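
One caveat with this approach: re.match returns None when the pattern does not match, so calling .groups() directly raises AttributeError on unexpected input. A hedged variant with named groups and an explicit check might look like the following. The group names old, container and new are hypothetical labels chosen for illustration, and the input is assumed to use "the array" throughout.

```python
import re

string_to_split = '(Replace the array arr1 in the array arr2 with the array arr3)'

# Hypothetical group names; only the three array names are captured.
expr = re.compile(
    r'\(Replace the array (?P<old>.*?) '
    r'(?:in|inside|within) the array (?P<container>.*?) '
    r'with the array (?P<new>.*?)\)'
)

match = expr.match(string_to_split)
# re.match returns None when the pattern does not match at the start.
fields = match.groupdict() if match is not None else {}
print(fields)
# {'old': 'arr1', 'container': 'arr2', 'new': 'arr3'}

# Input that does not fit the pattern yields an empty dict instead of an error.
bad_match = expr.match('(Delete the array arr1)')
bad_fields = bad_match.groupdict() if bad_match is not None else {}
print(bad_fields)
# {}
```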
