python - Three Python regular expressions around underscores

Question

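I'm helping someone at work rename some files, using an application that supports Python regular expression syntax. I tried a few expressions I found on forums, such as ^[^_]+(?=_) for a) below, but it doesn't work properly, and neither do several others. So I figured I should ask someone who really knows what they're doing. Thanks for your help.

a) In the first expression, I have to find all the characters before the first underscore, in names like these: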

cannon_mac_23567_prsln_333
jones_james_343342_prsln_333
smith_john_223462_prsln_333

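So I need to get cannon, jones and smith.

b) In a separate expression, I have to find all the characters between the first and second underscores. So, in the examples above, I need to find mac, james and john.

c) In the last expression, I have to find the first underscore.

The way the renaming application works, I have to run these regular expressions in three separate parts, just as listed above. Thanks.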

score 3 · Accepted Answer

Well, you can do this without regular expressions at all, since you know the delimiter is an underscore.

Use the str.split and str.index methods.

'smith_john_223462_prsln_333'.split('_')[0]  # extracts 'smith'
'smith_john_223462_prsln_333'.split('_')[1]  # extracts 'john'
'smith_john_223462_prsln_333'.index('_')     # position of the first underscore (5)
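The same idea can be applied to all three sample names from the question at once. This is only a minimal sketch: the helper name split_name and the variables names and results are made up for illustration, and it assumes every name contains at least two underscores.

```python
names = [
    "cannon_mac_23567_prsln_333",
    "jones_james_343342_prsln_333",
    "smith_john_223462_prsln_333",
]

def split_name(name):
    """Return (first field, second field, index of the first underscore)."""
    # maxsplit=2 stops splitting after the second underscore,
    # so the rest of the name stays in one piece.
    first, second, _rest = name.split("_", 2)
    return first, second, name.index("_")

results = [split_name(n) for n in names]
for row in results:
    print(row)
# ('cannon', 'mac', 6)
# ('jones', 'james', 5)
# ('smith', 'john', 5)
```

A name with fewer than two underscores would raise ValueError at the unpacking step, so real file names may need a check first.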

score 1 · Accepted Answer

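I would use:

1.  ^([^_]+)_
2.  _([^_]+)_
3.  ^[^_]+_

Use re.match for 1 and 3, since it only matches at the beginning of the string. Pattern 2 is not anchored to the start, so it needs re.search.

[Edit: as Cthulhu pointed out, you're better off not using regular expressions for this, since string methods are faster and easier.]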
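To see how patterns of this shape behave in plain Python, here is a rough sketch run against the sample names from the question. The constant names are invented for the example, and the third pattern is assumed to be written as ^[^_]+_ so that it consumes everything up to and including the first underscore. Whether the renaming application applies patterns the same way is not known.

```python
import re

names = [
    "cannon_mac_23567_prsln_333",
    "jones_james_343342_prsln_333",
    "smith_john_223462_prsln_333",
]

FIRST_FIELD = re.compile(r"^([^_]+)_")     # a) text before the first underscore
SECOND_FIELD = re.compile(r"_([^_]+)_")    # b) text between the first two underscores
UP_TO_UNDERSCORE = re.compile(r"^[^_]+_")  # c) everything up to and including the first underscore

checked = []
for name in names:
    first = FIRST_FIELD.match(name).group(1)
    # search(), not match(): this pattern does not start at position 0.
    second = SECOND_FIELD.search(name).group(1)
    # The match ends just past the underscore, so step back by one.
    underscore_at = UP_TO_UNDERSCORE.match(name).end() - 1
    checked.append((first, second, underscore_at))

print(checked)
# [('cannon', 'mac', 6), ('jones', 'james', 5), ('smith', 'john', 5)]
```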

score 1 · Accepted Answer

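Right, I misread your question at first. While str.split is certainly a more elegant way to solve this, here are three regular expressions that do what you need. I don't know whether your application will work with them, so take this with a grain of salt.

See the re library and MatchObject.span() for more information.

As a single regular expression: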

import re
line = "cannon_mac_23567_prsln_333"
In [1812]: match = re.match(r"(.+?)(\_)(.+?)\_", line)

In [1813]: match.groups()
Out[1813]: ('cannon', '_', 'mac')

In [1814]: match.span(2)[0]  # start of the second group, i.e. the first occurrence of _
Out[1814]: 6

In [1815]: line[6]
Out[1815]: '_'

Separated into a, b and c:

a:

import re
line = "cannon_mac_23567_prsln_333"
In [1707]: match = re.match(r"(.+?)\_", line)

In [1708]: match.groups()
Out[1708]: ('cannon',)

b:

In [1712]: match = re.match(r".+?\_(.+?)\_", line)

In [1713]: match.groups()
Out[1713]: ('mac',)

c: For simplicity, the last one uses re.search. MatchObject.span() returns a tuple of positions (start, end).

In [1763]: match = re.search(r"_", line)

In [1764]: match.span()[0]
Out[1764]: 6

In [1765]: line[6]
Out[1765]: '_'
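If the renaming application uses the whole match rather than a capture group (the question does not say which), lookarounds can keep the underscores out of the match. The sketch below is only an assumption about what such an application might accept. It reuses the lookahead pattern quoted in the question, which does return the first field in plain Python, so why it failed inside the application is not clear from the thread.

```python
import re

line = "cannon_mac_23567_prsln_333"

# a) Everything before the first underscore; the lookahead does not consume it.
first = re.search(r"^[^_]+(?=_)", line).group(0)

# b) First run of non-underscore characters that sits between two underscores.
second = re.search(r"(?<=_)[^_]+(?=_)", line).group(0)

# c) The first underscore itself; start() gives its position.
underscore = re.search(r"_", line)

print(first, second, underscore.start())
# cannon mac 6
```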
