python - Why isn't the image being printed?

Question

Here is my code:

import urllib2
from BeautifulSoup import BeautifulSoup


soup = BeautifulSoup(urllib2.urlopen('http://www.cbssports.com/nba/draft/mock-draft').read())

rows = soup.findAll("table", attrs = {'class': 'data borderTop'})[0].tbody.findAll("tr")[2:]

for row in rows:
    fields = row.findAll("td")
    if len(fields) >= 3:
        anchor = row.findAll("td")[1].find("a")
        if anchor:
            print anchor

It doesn't print the image; instead it gives me the image's location in the page source. Is there a reason for that?

score 0 · Accepted Answer

It looks like you want the team logo thumbnails?

import urllib2
import BeautifulSoup

url = 'http://www.cbssports.com/nba/draft/mock-draft'
txt = urllib2.urlopen(url).read()
bs = BeautifulSoup.BeautifulSoup(txt)

# get the main table
t = bs.findAll('table', attrs={'class': 'data borderTop'})[0]

# get the thumbnail urls
imgs = [im["src"] for im in t.findAll('img') if "logos" in im["src"]]

imgs now looks like

[u'http://sports.cbsimg.net/images/nba/logos/30x30/NO.png',
 u'http://sports.cbsimg.net/images/nba/logos/30x30/CHA.png',
 u'http://sports.cbsimg.net/images/nba/logos/30x30/WAS.png',
 u'http://sports.cbsimg.net/images/nba/logos/30x30/CLE.png',

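and so on. These are the file locations of each logo, which is all the HTML actually contains; if you want the actual pictures, you have to fetch each one separately.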
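To make that concrete, here is a minimal sketch in Python 3 that uses only the standard library's html.parser, not the BeautifulSoup 3 / urllib2 setup used in this answer. The sample markup and the class name ImgSrcCollector are made up for illustration. The point is only that an img tag carries a URL in its src attribute, not the picture itself.

```python
from html.parser import HTMLParser


class ImgSrcCollector(HTMLParser):
    """Collects the src attribute of every <img> tag (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.srcs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag being opened
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.srcs.append(src)


# Made-up sample row, loosely modelled on the logo URLs shown in this answer
sample_html = (
    '<table><tr><td><a href="/nba/teams/page/NO">'
    '<img src="http://sports.cbsimg.net/images/nba/logos/30x30/NO.png">'
    '</a></td><td><img src="http://example.com/spacer.gif"></td></tr></table>'
)

collector = ImgSrcCollector()
collector.feed(sample_html)

# Keep only the logo URLs; what remains is still just text, not image data
logo_urls = [src for src in collector.srcs if "logos" in src]
print(logo_urls)
```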

The list contains duplicate references to each logo; the quickest way to remove the duplicates is

imgs = list(set(imgs))
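Note that set() does not keep the original order of the list. If order matters, one possible alternative is dict.fromkeys, sketched here assuming Python 3.7 or later, where dict preserves insertion order. The sample list is made up from the URLs shown in this answer.

```python
# Sample data: a short list with repeated entries, based on the URLs in this answer
imgs = [
    "http://sports.cbsimg.net/images/nba/logos/30x30/NO.png",
    "http://sports.cbsimg.net/images/nba/logos/30x30/CHA.png",
    "http://sports.cbsimg.net/images/nba/logos/30x30/NO.png",
    "http://sports.cbsimg.net/images/nba/logos/30x30/WAS.png",
    "http://sports.cbsimg.net/images/nba/logos/30x30/CHA.png",
]

# dict keys are unique and keep first-seen order, so this removes
# duplicates without shuffling the list
unique_imgs = list(dict.fromkeys(imgs))
print(unique_imgs)
```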

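Alternatively, since the list does not include every team: if you have a complete list of team-name abbreviations, you can build the logo-url list directly.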
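As a rough sketch of that idea: assuming every logo follows the same URL pattern as the ones listed in this answer (an inference from the sample output, not something the site documents), the URLs could be built from the abbreviations. Only the four abbreviations seen in the output are used here; a full list would have to come from elsewhere. The names LOGO_URL and build_logo_urls are hypothetical.

```python
# URL pattern inferred from the scraped examples; treat it as an assumption
LOGO_URL = "http://sports.cbsimg.net/images/nba/logos/{size}/{team}.png"

# Only the abbreviations that appear in the sample output; not a complete list
teams = ["NO", "CHA", "WAS", "CLE"]


def build_logo_urls(abbreviations, size="30x30"):
    """Return one logo URL per team abbreviation (hypothetical helper)."""
    return [LOGO_URL.format(size=size, team=abbr) for abbr in abbreviations]


imgs = build_logo_urls(teams)
print(imgs[0])
```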

Also, looking at the site, each 30x30 logo has a corresponding 90x90 logo, which you may prefer since it is bigger and clearer. If so,

imgs = [im.replace('30x30', '90x90') for im in imgs]

imgs now looks like

[u'http://sports.cbsimg.net/images/nba/logos/90x90/BOS.png',
 u'http://sports.cbsimg.net/images/nba/logos/90x90/CHA.png',
 u'http://sports.cbsimg.net/images/nba/logos/90x90/CLE.png',
 u'http://sports.cbsimg.net/images/nba/logos/90x90/DAL.png',

and so on.

Now, for each url, we download the image and save it:

import os

savedir = 'c:\\my documents\\logos'  # assumes this dir actually exists!
for im in imgs:
    fname = im.rsplit('/', 1)[1]
    fname = os.path.join(savedir, fname)
    with open(fname, 'wb') as outf:
        outf.write(urllib2.urlopen(im).read())

And you have your logos.
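For readers on Python 3, where urllib2 no longer exists, the same download loop could look roughly like the sketch below. It uses urllib.request from the standard library and creates the target directory if it is missing. The names fetch_bytes and save_images, the fetch parameter, and the "logos" directory are illustrative choices, not part of the original answer.

```python
import os
from urllib.request import urlopen


def fetch_bytes(url):
    """Download a URL and return its raw bytes."""
    with urlopen(url) as response:
        return response.read()


def save_images(urls, savedir, fetch=fetch_bytes):
    """Save each URL under savedir, named after the last path segment.

    `fetch` is injectable so the function can be tried without network access.
    Returns the list of file paths written.
    """
    os.makedirs(savedir, exist_ok=True)  # create the directory if it is missing
    saved = []
    for url in urls:
        fname = os.path.join(savedir, url.rsplit("/", 1)[1])
        with open(fname, "wb") as outf:
            outf.write(fetch(url))
        saved.append(fname)
    return saved


# Example call (performs real HTTP requests, so it is left commented out):
# save_images(imgs, "logos")
```

Passing a different fetch function makes it possible to check the file handling without touching the network.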

score 0 · Accepted Answer

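According to the BeautifulSoup documentation, soup.findAll returns a list of Tags or NavigableStrings. Printing one of those objects shows its markup, so to get at the data you have to access the specific part you need, for example through the tag's contents attribute.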

See http://www.crummy.com/software/BeautifulSoup/bs3/documentation.html and look under the "Navigating the Parse Tree" heading to find what you need in this case.
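To illustrate the difference between an element and the data inside it, here is a small sketch. It uses xml.etree.ElementTree from the Python 3 standard library as a stand-in; it is not BeautifulSoup, and it only handles well-formed markup. The snippet being parsed is made up. Serialising the element gives markup back, while reading the src attribute gives the image location.

```python
import xml.etree.ElementTree as ET

# Made-up, well-formed cell similar in shape to what the question's loop visits
cell = ET.fromstring(
    '<td><a href="/nba/teams/page/NO">'
    '<img src="http://sports.cbsimg.net/images/nba/logos/30x30/NO.png" />'
    '</a></td>'
)

anchor = cell.find("a")

# Serialising the element yields its markup, much like printing a tag does
markup = ET.tostring(anchor, encoding="unicode")
print(markup)

# Reaching into the element yields the data itself
src = anchor.find("img").get("src")
print(src)
```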
