Here's a quick and dirty way to do it, without any libraries:
"""
>>> get_src(data)
['http://www.askgamblers.com/cache/97299a130feb2e59a08a08817daf2c0e6825991f_begado-casino-logo-review1.jpg', 'http://feeds.feedburner.com/~r/AskgamblesCasinoNews/~4/SXhvCskjiYo']
"""
data = """<img src="http://www.askgamblers.com/cache/97299a130feb2e59a08a08817daf2c0e6825991f_begado-casino-logo-review1.jpg" /><br/>
Begado is the newest online casino in our listings. As the newest
member of the Affactive group, Begado features NuWorks slots and games
for both US and international players.
<img src="http://feeds.feedburner.com/~r/AskgamblesCasinoNews/~4/SXhvCskjiYo" height="1" width="1"/>"""
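def get_src(text):
    """Return the src attribute values in text (at most one per line)."""
    srcs = []
    for line in text.splitlines():
        start = line.find('src="')
        if start == -1:
            continue  # no src attribute on this line
        start += len('src="')  # skip past the attribute name and opening quote
        end = line.find('"', start)  # closing quote of the attribute value
        if end != -1:
            srcs.append(line[start:end])
    return srcs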
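That said, I'd recommend Beautiful Soup instead. It's a very nice library built to cope with the real web (broken HTML and all). Or, if your data is valid XML, you can use ElementTree from the Python standard library.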
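For the ElementTree route, something along these lines should work. This is only a sketch: it assumes the snippet is well-formed XML, and the names `get_src_xml` and `sample`, the dummy `<root>` wrapper, and the example.com URLs are my own placeholders, not anything from the question.

```python
import xml.etree.ElementTree as ET


def get_src_xml(fragment):
    """Return the src of every <img> element in a well-formed XML fragment."""
    # Assumption: the fragment is well-formed XML. It may contain several
    # top-level nodes, so wrap it in a dummy root element before parsing.
    root = ET.fromstring("<root>" + fragment + "</root>")
    # iter("img") walks the whole tree, so nested images are found too.
    return [img.get("src") for img in root.iter("img") if img.get("src")]


# Hypothetical sample input, shaped like the feed snippet in the question.
sample = (
    '<img src="http://example.com/a.jpg" /><br/>'
    'Some text <img src="http://example.com/b.gif" height="1" width="1"/>'
)
print(get_src_xml(sample))
# prints ['http://example.com/a.jpg', 'http://example.com/b.gif']
```

Unlike the string-search version, this picks up several images on one line. The catch is that `ET.fromstring` raises `ET.ParseError` on markup that isn't well-formed (an unclosed `<img>`, a bare `&` in a URL), which is exactly the case where Beautiful Soup is the better choice.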