
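I'm using a Python script to run through the lines of a text file. I want to search the text document for img tags and return each tag as text.

When I run the regex with re.match(), it returns an _sre.SRE_Match object. How do I get it to return a string?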

import sys
import string
import re

f = open("sample.txt", 'r' )
l = open('writetest.txt', 'w')

count = 1

for line in f:
    line = line.rstrip()
    imgtag  = re.match(r'<img.*?>',line)
    print("yo it's a {}".format(imgtag))

When run, it prints:

yo it's a None
yo it's a None
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a None
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e578>
yo it's a <_sre.SRE_Match object at 0x7fd4ea90e5e0>
yo it's a None
yo it's a None

4 Answers


You should use re.MatchObject.group(0), like:

imgtag = re.match(r'<img.*?>', line).group(0)

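Edit:

You might also be better off doing something like

imgtag = re.match(r'<img.*?>', line)
if imgtag:
    print("yo it's a {}".format(imgtag.group(0)))

to eliminate all the Nones (and to avoid calling .group(0) on None, which raises AttributeError).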
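A minimal runnable sketch of that pattern, assuming a few hypothetical sample lines in place of sample.txt and a hypothetical helper named first_img_tag. group(0) turns the match object into a plain string, and the None check skips lines without a tag at the start.

```python
import re

# Hypothetical sample lines standing in for the contents of sample.txt
lines = [
    "plain text, no tag here",
    '<img src="cat.png" alt="cat">',
    '<img src="dog.png"> followed by text',
]

def first_img_tag(line):
    """Hypothetical helper: return the <img ...> tag at the start of line as a str, or None."""
    match = re.match(r'<img.*?>', line)
    # group(0) is the whole matched text; only call it when a match exists
    return match.group(0) if match else None

for line in lines:
    tag = first_img_tag(line)
    if tag is not None:
        print("yo it's a {}".format(tag))
```

Only the second and third lines produce output, because the first contains no tag.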

answered 2013-08-28T16:44:26.787

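Use imgtag.group(0) or, equivalently, imgtag.group(). Either one returns the entire match as a string. Your pattern has no capturing groups, so there is nothing else to extract anyway.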

http://docs.python.org/release/2.5.2/lib/match-objects.html
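A short sketch of the difference between the whole match and a capturing group. The sample line and the src-capturing pattern are hypothetical illustrations, not part of the question.

```python
import re

line = '<img src="cat.png" alt="a cat">'
m = re.match(r'<img.*?>', line)

# group() with no argument is equivalent to group(0): the entire match as a str
whole = m.group()
print(whole == m.group(0))  # True

# Hypothetical variant with one capturing group around the src value:
# group(1) returns only the text matched inside the parentheses
m2 = re.match(r'<img[^>]*src="([^"]*)"[^>]*>', line)
src = m2.group(1)
print(src)  # cat.png
```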

answered 2013-08-28T16:45:20.613

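Note that re.match(pattern, string, flags=0) only finds a match at the beginning of the string. If you want to find a match anywhere in the string, use re.search(pattern, string, flags=0) instead (https://docs.python.org/3/library/re.html). It scans through the string and returns the first match object; you can then extract the matched string with match_object.group(0), as others have suggested.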
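This may explain at least some of the Nones in the question's output: any line where the tag does not start at position 0 fails with re.match. A minimal comparison on a hypothetical line:

```python
import re

# Hypothetical line where the tag is not at the very start
line = 'Some text before <img src="cat.png"> and after'

# re.match only tries to match at position 0, so this returns None
anchored = re.match(r'<img.*?>', line)
print(anchored)  # None

# re.search scans the whole string and returns the first match object
found = re.search(r'<img.*?>', line)
print(found.group(0))  # <img src="cat.png">
```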

answered 2017-04-24T08:09:27.137

Considering there might be several img tags on a line, I would recommend re.findall:

import re

with open("sample.txt", 'r') as f_in, open('writetest.txt', 'w') as f_out:
    for line in f_in:
        # findall returns every non-overlapping <img ...> tag on the line as a str
        for img in re.findall(r'<img[^>]+>', line):
            print("yo it's a {}".format(img), file=f_out)

answered 2013-08-28T17:01:01.420
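One detail worth knowing when switching to re.findall: if the pattern contains a capturing group, findall returns the group's text rather than the whole match. re.finditer yields match objects instead, so group(0) and positions stay available. A sketch on a hypothetical line (the src-capturing pattern is illustrative only):

```python
import re

line = '<img src="a.png"> text <img src="b.png" alt="b">'

# No capturing group: findall returns the whole matched tags as strings
tags = re.findall(r'<img[^>]+>', line)
print(tags)  # ['<img src="a.png">', '<img src="b.png" alt="b">']

# One capturing group: findall returns only the captured text
srcs = re.findall(r'<img[^>]*src="([^"]*)"', line)
print(srcs)  # ['a.png', 'b.png']

# finditer yields match objects, so group(0) and start offsets are available
for m in re.finditer(r'<img[^>]+>', line):
    print(m.start(), m.group(0))
```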