python - Python regex for detecting additions

Question

I have a string that could look like any of these:

50W
800W+25W
30W+50W+2W

I want to check whether the current string matches this format and, if it does, extract the numeric values.

So far I have done this:

re.compile(r"^(\d+W\+)*(\d+W)$")

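The problem is that when I use the star *, the repeated group only keeps one of its matches, so for 50W+20W+30W I get back just two pieces instead of all three values (I'm using re.findall).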

I don't know how to get all the groups, or how to strip the "W" and "+" characters directly in the regex (maybe I should use re.split?).
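For reference, this is what Python's `re` module does with a repeated capturing group: the group keeps only the text of its last repetition, so the earlier values are lost. A quick check using the example string from above:

```python
import re

# The asker's pattern: group 1 is repeated by *, group 2 matches the final term
pattern = re.compile(r"^(\d+W\+)*(\d+W)$")

# findall returns one tuple per match; a repeated group holds only its LAST repetition
result = pattern.findall("50W+20W+30W")
print(result)  # [('20W+', '30W')]
```

The "50W+" repetition is overwritten by "20W+", which is why a single regex with * cannot hand back every term.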

Edit 1 - I don't know in advance that the string has this form: I have to check it first, and only then can I extract the numbers.

score 2 · Accepted Answer

Don't use a regex to extract these values.

In [1]: [int(e[:-1]) for e in "30W+50W+2W".split('+')]
Out[1]: [30, 50, 2]

In [2]: [int(e[:-1]) for e in "800W+25W".split('+')]
Out[2]: [800, 25]

In [3]: [int(e[:-1]) for e in "50W".split('+')]
Out[3]: [50]

You may still want a regex to check whether the string matches this pattern, but you haven't told us enough about your input to say.
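To illustrate why that check matters: split-and-slice validates nothing. A malformed string such as "80W3000W2675W" makes int() raise ValueError, and a wrong suffix like "30X" is silently accepted because [:-1] drops whatever the last character is. A minimal sketch (extract_unchecked is a hypothetical helper name):

```python
def extract_unchecked(s):
    # Split on '+' and drop the last character of each piece (assumed to be 'W')
    return [int(e[:-1]) for e in s.split('+')]

print(extract_unchecked("30W+50W+2W"))  # [30, 50, 2]
print(extract_unchecked("30X+50W"))     # [30, 50]  (wrong suffix goes unnoticed)

try:
    extract_unchecked("80W3000W2675W")
    malformed_rejected = False
except ValueError as exc:
    # '80W3000W2675W'[:-1] == '80W3000W2675', which is not a valid integer
    malformed_rejected = True
    print("rejected:", exc)
```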

You can use a positive lookahead to find the digits that are followed by a W:

In [16]: re.findall(r'\d+(?=W)', '30W+50W+2W')
Out[16]: ['30', '50', '2']

In [17]: re.findall(r'\d+(?=W)', '30W+50W')
Out[17]: ['30', '50']

In [18]: re.findall(r'\d+(?=W)', '30W')
Out[18]: ['30']

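With the re module you cannot both verify that the string strictly matches ^(\d+W\+)*(\d+W)$ and extract all of the numbers in a single regex.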
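A common workaround is to do it in two steps: validate the whole string first, then extract. A sketch assuming re.fullmatch (Python 3.4+) for the check and the lookahead above for extraction; parse_terms is a hypothetical helper name:

```python
import re

# Whole-string check: one or more "<digits>W" terms joined by '+'
VALID = re.compile(r"\d+W(?:\+\d+W)*")

def parse_terms(s):
    """Return the numbers as ints, or None if s is not in the expected format."""
    if VALID.fullmatch(s) is None:
        return None
    # In a valid string every digit run is followed by 'W', so the lookahead finds them all
    return [int(n) for n in re.findall(r"\d+(?=W)", s)]

print(parse_terms("30W+50W+2W"))                 # [30, 50, 2]
print(re.findall(r"\d+(?=W)", "80W3000W2675W"))  # ['80', '3000', '2675'] (lookahead alone accepts it)
print(parse_terms("80W3000W2675W"))              # None
```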

score 2 · Accepted Answer

In my opinion, split() is a better approach.

In [1]: '50W'.split('+')
Out[1]: ['50W']

In [2]: '800W+25W'.split('+')
Out[2]: ['800W', '25W']

In [3]: '30W+50W+2W'.split('+')
Out[3]: ['30W', '50W', '2W']

If you want to remove the W from each list entry, just slice it off and convert the resulting string to an integer:

In [4]: int('30W'[:-1])
Out[4]: 30
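Slicing with [:-1] drops the last character whatever it is. If you would rather strip only a trailing W, str.removesuffix (Python 3.9+) is one alternative; a minimal sketch:

```python
# removesuffix strips 'W' only if the string actually ends with it (Python 3.9+)
parts = '30W+50W+2W'.split('+')
numbers = [int(p.removesuffix('W')) for p in parts]
print(numbers)  # [30, 50, 2]

# A piece without the suffix is returned unchanged, so int() would fail loudly instead of guessing
print('30X'.removesuffix('W'))  # 30X
```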

To check whether a string has this format, you can use this simple regex:

In [5]: pattern = re.compile(r'^\d+W(?:\+\d+W)*$')
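One subtlety with match() plus $: in Python's re, $ also matches just before a trailing newline, so '50W\n' passes the check above. If your input may carry one (for example, lines read from a file without stripping), re.fullmatch requires the entire string to match. A small comparison:

```python
import re

pattern = re.compile(r'^\d+W(?:\+\d+W)*$')
strict = re.compile(r'\d+W(?:\+\d+W)*')  # no anchors needed with fullmatch

line = '50W\n'  # e.g. a line read from a file without stripping
print(bool(pattern.match(line)))     # True: '$' matches before the final '\n'
print(bool(strict.fullmatch(line)))  # False: the '\n' is not part of the pattern
```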

Putting it all together, I would do this:

In [6]: vals = ['50W', '800W+25W', '30W+50W+2W', '80W3000W2675W']

In [7]: for val in vals:
  ....:     if pattern.match(val):
  ....:         numbers = val.split('+')
  ....:         print([int(num[:-1]) for num in numbers])
[50]
[800, 25]
[30, 50, 2]

score 0 · Accepted Answer

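If you like functional programming in Python, you can use the following (in Python 3, map returns an iterator, so wrap it in list() to see the result):

>>> import re
>>> newlist = ['50W', '800W+25W', '30W+50W+2W', '80W3000W2675W']
>>> list(map(lambda x: re.findall(r"(\d+)W", x),
...          filter(lambda x: re.match(r'^\d+W(\+\d+W)*$', x), newlist)))
[['50'], ['800', '25'], ['30', '50', '2']]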
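In Python 3 a list comprehension is a common alternative to map/filter. An equivalent sketch that also converts the captured digits to int (the int conversion is an addition, not part of the original answer):

```python
import re

newlist = ['50W', '800W+25W', '30W+50W+2W', '80W3000W2675W']

# Keep only well-formed strings, then pull out the digits before each 'W'
result = [
    [int(n) for n in re.findall(r"(\d+)W", s)]
    for s in newlist
    if re.fullmatch(r"\d+W(?:\+\d+W)*", s)
]
print(result)  # [[50], [800, 25], [30, 50, 2]]
```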
