python - How do I get an image's src using a Python regular expression?

Question

How can I use a regular expression in Python to get the image's src from the following HTML string?

<td width="80" align="center" valign="top"><a href="http://news.google.com/news/url?sa=t&fd=R&usg=AFQjCNFqz8ZCIf6NjgPPiTd2LIrByKYLWA&url=http://www.news.com.au/business/spain-victory-faces-market-test/story-fn7mjon9-1226390697278"><img src="//nt3.ggpht.com/news/tbn/380jt5xHH6l_FM/6.jpg" alt="" border="1" width="80" height="80" /> NEWS.com.au</a></td>

I tried using

matches = re.search('@src="([^"]+)"',text)
print(matches[0])

but got nothing.

score 9 · Accepted Answer

You might consider using BeautifulSoup instead of a regular expression:

>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(junk)  # junk is the HTML string from the question
>>> soup.findAll('img')
[<img src="//nt3.ggpht.com/news/tbn/380jt5xHH6l_FM/6.jpg" alt="" border="1" width="80" height="80" />]
>>> soup.findAll('img')[0]['src']
u'//nt3.ggpht.com/news/tbn/380jt5xHH6l_FM/6.jpg'

score 6 · Accepted Answer

Just drop the @ from your regular expression.
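A minimal sketch of that fix, assuming the HTML is held in a variable named `text` as in the question (the string is shortened here for illustration). Note that `re.search` returns either a match object or `None`, and that the captured URL is in group 1, while group 0 is the whole matched `src="..."` text:

```python
import re

# Shortened version of the HTML from the question (assumed to be in `text`).
text = ('<td width="80" align="center" valign="top">'
        '<a href="http://news.google.com/news/url?sa=t">'
        '<img src="//nt3.ggpht.com/news/tbn/380jt5xHH6l_FM/6.jpg" '
        'alt="" border="1" width="80" height="80" /> NEWS.com.au</a></td>')

# Same pattern as in the question, with the leading "@" removed.
match = re.search(r'src="([^"]+)"', text)

# re.search returns None when nothing matches, so check before using it.
src = match.group(1) if match else None
print(src)
```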

answered 2012-06-10T20:26:00.383

score -1 · Accepted Answer

You can simplify the regex a bit:

match = re.search(r'src="(.*?)"', text)
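As a hedged sketch of how that pattern behaves: the lazy `.*?` stops at the first closing quote, and `group(1)` holds the captured URL. The pattern matches the `src` attribute of any tag, so the variant below, which is an assumption and not part of the original answer, restricts the match to `<img>` tags and collects every hit with `re.findall`. The sample HTML is made up for illustration.

```python
import re

# Hypothetical sample: a script tag plus two images.
html = ('<script src="app.js"></script>'
        '<img alt="" src="//example.com/a.jpg" />'
        '<img src="//example.com/b.jpg" width="80" />')

# The answer's pattern finds the first src attribute of any tag.
first = re.search(r'src="(.*?)"', html).group(1)

# Assumed variant: only src attributes inside <img ...> tags.
img_srcs = re.findall(r'<img\b[^>]*?\bsrc="(.*?)"', html)
print(first, img_srcs)
```

For anything beyond simple, well-formed snippets, an HTML parser is likely to be more robust than either pattern.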

answered 2012-06-10T20:30:07.217
