7

I want to match different parts of a string and store them in separate variables for later use. For example,

string = "bunch(oranges, bananas, apples)"
rxp = "[a-z]*\([var1]\, [var2]\, [var3]\)"

so that I end up with

var1 = "oranges"
var2 = "bananas"
var3 = "apples"

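Something like what re.search() does, but for several different parts of the same match.

Edit: the number of fruits in the list is not known in advance. I should have included that in the question.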

4 Answers

4

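That is exactly what re.search does. Just use capturing groups (parentheses) to access, later on, whatever was matched by particular subpatterns: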

>>> import re
>>> m = re.search(r"[a-z]*\(([a-z]*), ([a-z]*), ([a-z]*)\)", string)
>>> m.group(0)
'bunch(oranges, bananas, apples)'
>>> m.group(1)
'oranges'
>>> m.group(2)
'bananas'
>>> m.group(3)
'apples'

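Also note that I use a raw string to avoid having to double the backslashes.

If the number of "variables" inside bunch can vary, you have a problem. Most regex engines cannot capture a variable number of strings. In this case, however, you can get away with the following: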

>>> m = re.search(r"[a-z]*\(([a-z, ]*)\)", string)
>>> m.group(1)
'oranges, bananas, apples'
>>> m.group(1).split(', ')
['oranges', 'bananas', 'apples']
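
The limitation and the workaround above can be sketched as follows. This is only an illustration: parse_bunch is a hypothetical helper name, and the choice to return an empty list when nothing matches is an assumption, not something stated in the answer.

```python
import re

def parse_bunch(text):
    """Return the items listed inside name(...), or [] if nothing matches."""
    m = re.search(r"[a-z]*\(([a-z, ]*)\)", text)
    if m is None:
        return []
    # Split on commas, strip surrounding spaces, and drop empty pieces.
    return [item.strip() for item in m.group(1).split(",") if item.strip()]

# A repeated capturing group only remembers its last repetition,
# which is why one group cannot return every fruit on its own.
repeated = re.search(r"\((?:([a-z]+)(?:, )?)+\)", "bunch(oranges, bananas, apples)")
last_only = repeated.group(1)

fruits = parse_bunch("bunch(oranges, bananas, apples)")
```

Here last_only holds only the final fruit, while fruits holds all three, whatever their number.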
answered 2012-11-18T21:19:39.503
4

For regular expressions, you can use the match() function to do what you want, and use groups to get your results. Also, avoid naming a variable string, since that is the name of a standard-library module. For your example, if you know there is always the same number of fruits, it looks like this:

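import re
# named text rather than input, which would shadow the built-in input()
text = "bunch(oranges, bananas, apples)"
# raw string, so the backslashes reach the regex engine unchanged
var1, var2, var3 = re.match(r'bunch\((\w+), (\w+), (\w+)\)', text).group(1, 2, 3)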

Here, I used the \w special sequence, which matches any alphanumeric character or underscore, as explained in the documentation of the re module.
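
To make the difference between \w and a plain [a-z] class concrete, here is a small sketch. The sample string with an underscore, a digit and a capital letter is made up for illustration.

```python
import re

text = "bunch(blood_orange, kiwi2, Apple)"

# [a-z]* stops at the underscore, so the three-group pattern finds no match.
strict = re.match(r"bunch\(([a-z]*), ([a-z]*), ([a-z]*)\)", text)

# \w+ accepts letters, digits and underscores, so every item is captured.
loose = re.match(r"bunch\((\w+), (\w+), (\w+)\)", text)
var1, var2, var3 = loose.group(1, 2, 3)
```

With this input strict is None, while loose yields the three items unchanged.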

If you don't know the number of fruits in advance, you can use two regular expression calls: one to extract the minimal part of the string where the fruits are listed, getting rid of "bunch" and the parentheses, then finditer to extract the names of the fruits:

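import re
text = "bunch(oranges, bananas, apples)"
# inner match isolates "oranges, bananas, apples"; finditer then picks out each word
fruits = [m.group(0) for m in re.finditer(r'\w+', re.match(r'bunch\(([^)]*)\)', text).group(1))]
print(fruits)  # ['oranges', 'bananas', 'apples']
answered 2012-11-18T21:21:40.220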
4

If you want, you can use groupdict to store matching items in a dictionary:

import re
# the trailing \) keeps the closing parenthesis out of var3
regex = re.compile(r"[a-z]*\((?P<var1>.*), (?P<var2>.*), (?P<var3>.*)\)")
match = regex.match("bunch(oranges, bananas, apples)")
if match:
    print(match.groupdict())

# {'var1': 'oranges', 'var2': 'bananas', 'var3': 'apples'}
answered 2012-11-18T21:33:05.670
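
Named groups can also be read one at a time by name. The sketch below assumes a pattern that ends with \) and uses \w+ for each item, so that the closing parenthesis is never captured. The fallback values for a failed match are an illustrative choice.

```python
import re

regex = re.compile(r"[a-z]*\((?P<var1>\w+), (?P<var2>\w+), (?P<var3>\w+)\)")
match = regex.match("bunch(oranges, bananas, apples)")

# groupdict() returns every named group; group("name") returns a single one.
fruits = match.groupdict() if match else {}
first = match.group("var1") if match else None
```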
1

Don't. Every time you use var1, var2, etc., you actually want a list. Unfortunately, there is no way to collect an arbitrary number of subgroups in a list using findall, but you can use a hack like this:

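import re
string = "bunch(oranges, bananas, apples)"
lst = []
# re.sub is called only for the callback's side effect; "or ''" makes the callback return a string
re.sub(r'([a-z]+)(?=[^()]*\))', lambda m: lst.append(m.group(1)) or '', string)
print(lst)  # ['oranges', 'bananas', 'apples']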

Note that this works not only for this specific example, but also for any number of substrings.
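
As a rough check of that claim, the same lookahead pattern can be wrapped in a function and run on inputs of different lengths. collect_items is a hypothetical name, and the sample inputs are made up.

```python
import re

def collect_items(text):
    """Collect every lowercase word that sits inside parentheses."""
    items = []
    # The lookahead (?=[^()]*\)) requires that a ")" follows the word with
    # no other parenthesis in between, so "bunch" itself is skipped.
    # The callback returns "" so that re.sub always receives a string.
    re.sub(r"([a-z]+)(?=[^()]*\))", lambda m: items.append(m.group(1)) or "", text)
    return items

three = collect_items("bunch(oranges, bananas, apples)")
five = collect_items("bunch(a, b, c, d, e)")
```

The word before the opening parenthesis never appears in the result, because the first parenthesis that follows it is "(" rather than ")".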

answered 2012-11-18T21:22:31.297