python - Python regular expression replacement

Question

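I need to find the value of "taxid" in a large number of strings similar to the one given below. For this particular string, the "taxid" value is "9606". I need to discard everything else. "taxid" may appear anywhere in the text, but it is always followed by a ":" and then the number.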

score:0.86|taxid:9606(Human)|intact:EBI-999900

How do I write a regular expression for this in Python?

score 4 · Accepted Answer

>>> import re
>>> s = 'score:0.86|taxid:9606(Human)|intact:EBI-999900'
>>> re.search(r'taxid:(\d+)', s).group(1)
'9606'

If there are multiple taxids, use re.findall, which returns a list of all matches:

>>> re.findall(r'taxid:(\d+)', s)
['9606']
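Since the question mentions a large number of strings, here is a minimal sketch of how the same pattern could be applied in bulk. The helper name extract_taxid and the second and third sample records are assumptions for illustration, not part of the original data. Note that re.search returns None when a string has no taxid, so calling .group(1) directly would raise AttributeError; the sketch guards against that.

```python
import re

# Compile once, since the same pattern is reused across many strings.
TAXID_RE = re.compile(r'taxid:(\d+)')

def extract_taxid(record):
    """Return the taxid digits as a string, or None if the record has no taxid."""
    match = TAXID_RE.search(record)
    return match.group(1) if match else None

# Hypothetical sample records; only the first one comes from the question.
records = [
    'score:0.86|taxid:9606(Human)|intact:EBI-999900',
    'taxid:10090(Mouse)|score:0.40',
    'score:0.12|intact:EBI-000000',
]

taxids = [extract_taxid(r) for r in records]
print(taxids)  # ['9606', '10090', None]
```

Returning None for records without a taxid lets the caller decide whether to skip them or treat them as errors.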

score 0 · Accepted Answer

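import re
for line in lines:  # lines: any iterable of strings like the one in the question
    match = re.search(r"(?:^|\|)taxid:(\d+)", line)  # taxid may be the first or last field
    if match:
        print(match.group(1))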
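If the records are always pipe-delimited key:value fields, as the sample string suggests, an alternative is to split each record into a dict first and then take the leading digits of the taxid field. This is only a sketch under that assumption; parse_record and taxid_from_record are hypothetical helper names, not anything from the original answer.

```python
import re

def parse_record(record):
    """Split a 'key:value|key:value' record into a dict (hypothetical helper)."""
    fields = {}
    for part in record.split('|'):
        key, sep, value = part.partition(':')
        if sep:  # skip fragments that do not have a 'key:value' shape
            fields[key] = value
    return fields

def taxid_from_record(record):
    """Return the leading digits of the taxid field, or None if absent."""
    value = parse_record(record).get('taxid', '')
    match = re.match(r'\d+', value)
    return match.group() if match else None

s = 'score:0.86|taxid:9606(Human)|intact:EBI-999900'
print(parse_record(s)['taxid'])  # 9606(Human)
print(taxid_from_record(s))      # 9606
```

This costs more per record than a single regex search, but it also gives access to the other fields (score, intact) if they turn out to be needed later.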

answered 2012-09-17T20:20:00.630
