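I want to download the images from this page: http://wordpandit.com/learning-bin/visual-vocabulary/page/2/. I fetch it with urllib and parse it with BeautifulSoup. The page contains many URLs, but I only want the ones that end in .jpg and whose link also has the rel="prettyPhoto[gallery]" attribute. How can I do this with BeautifulSoup? An example of such a link: http://wordpandit.com/wp-content/uploads/2013/02/Obliterate.jpg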
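```python
# http://wordpandit.com/learning-bin/visual-vocabulary/page/2/
import os
import urllib.request
from urllib.parse import urljoin

from bs4 import BeautifulSoup

baseurl = 'http://wordpandit.com/learning-bin/visual-vocabulary/page/'

for count in range(2, 3):  # page 2 only; widen the range to fetch more pages
    url = baseurl + str(count) + '/'  # count is an int, so convert it before concatenating
    # BeautifulSoup accepts the file-like response directly, so .read() is not needed
    soup = BeautifulSoup(urllib.request.urlopen(url), 'html.parser')
    # find all <a> tags with rel="prettyPhoto[gallery]" that have an href
    for tag in soup.find_all('a', rel='prettyPhoto[gallery]', href=True):
        imgurl = urljoin(url, tag['href'])  # resolve relative links against the page URL
        if imgurl.lower().endswith('.jpg'):
            # save the image under the file name at the end of its URL
            urllib.request.urlretrieve(imgurl, os.path.basename(imgurl))
```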
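To check only the filtering step without hitting the site, here is a minimal offline sketch, assuming Python 3 and BeautifulSoup 4 (`bs4`) with the built-in `html.parser`. The function name `gallery_jpg_links` and the sample HTML are hypothetical, made up for illustration; the real page's markup may differ.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def gallery_jpg_links(html, page_url):
    """Hypothetical helper: return absolute URLs of <a> tags that have
    rel="prettyPhoto[gallery]" and an href ending in .jpg (case-insensitive)."""
    soup = BeautifulSoup(html, "html.parser")
    links = []
    # bs4 stores rel as a list, so this string matches if it is one of the values
    for a in soup.find_all("a", rel="prettyPhoto[gallery]", href=True):
        href = urljoin(page_url, a["href"])  # make relative hrefs absolute
        if href.lower().endswith(".jpg"):
            links.append(href)
    return links


# Made-up sample markup; only the first two links satisfy both conditions
sample = """
<a rel="prettyPhoto[gallery]" href="http://wordpandit.com/wp-content/uploads/2013/02/Obliterate.jpg">a</a>
<a rel="prettyPhoto[gallery]" href="/wp-content/uploads/2013/02/Relative.jpg">b</a>
<a rel="prettyPhoto[gallery]" href="/wp-content/uploads/2013/02/Other.png">c</a>
<a href="http://wordpandit.com/wp-content/uploads/2013/02/NoRel.jpg">d</a>
<a rel="nofollow" href="http://example.com/a.jpg">e</a>
"""

page = "http://wordpandit.com/learning-bin/visual-vocabulary/page/2/"
print(gallery_jpg_links(sample, page))
```

Because bs4 treats `rel` on `<a>` as a multi-valued attribute, passing the string `"prettyPhoto[gallery]"` matches any tag whose `rel` list contains that value, and `urljoin` resolves relative `href`s before the `.jpg` check.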