python - How do I download images from a scraped list of URLs?

Question

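Possible duplicate:
How to download image using requests

I have this Python script that scrapes image URLs from a tumblr blog, and I would like to download them to a local folder on my desktop. How would I go about implementing this?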

import requests 
from bs4 import BeautifulSoup 

def make_soup(url):
    # Downloads a page with requests and creates a BeautifulSoup object.

    raw_page = requests.get(url).text
    # Name the parser explicitly so the result does not depend on what is installed.
    soup = BeautifulSoup(raw_page, 'html.parser')

    return soup


def get_images(soup):
    # Pulls image URLs from the current page.

    images = []

    foundimages = soup.find_all('img')

    for image in foundimages:
        # Use .get() so <img> tags without a src attribute do not raise KeyError.
        url = image.get('src', '')

        if 'media.tumblr.com' in url:
            images.append(url)


    return images


def scrape_blog(url):
    # Scrapes the entire blog by following the "next page" link until it runs out.

    soup = make_soup(url)

    # Collect the images on the first page before following any links.
    images = get_images(soup)

    next_page = soup.find('a', id='nextpage')

    while next_page is not None:

        soup = make_soup(url + next_page['href'])
        images.extend(get_images(soup))

        next_page = soup.find('a', id='nextpage')

    return images


url = 'http://x.tumblr.com'
images = scrape_blog(url)

score 1 · Accepted Answer

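Python's urllib2 is probably what you are looking for. If you need to do anything complicated (such as using cookies or authentication), it may be worth looking into a wrapper library such as Requests, which provides a nice wrapper around many of the standard library's more tedious features.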
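
As a rough illustration of the standard-library route, here is a minimal sketch. It assumes Python 3, where the functionality of urllib2 lives in urllib.request. The names filename_from_url and download_images, the opener parameter, and the target folder are hypothetical and chosen only for this example; they are not part of the original script.

```python
import os
import shutil
import urllib.request
from urllib.parse import urlparse


def filename_from_url(url):
    """Return the last path segment of a URL, ignoring any query string."""
    return os.path.basename(urlparse(url).path)


def download_images(urls, dest_dir, opener=urllib.request.urlopen):
    """Save each URL into dest_dir and return the list of file paths written.

    `opener` is a hypothetical hook: any callable that takes a URL and
    returns a file-like context manager (urlopen by default).
    """
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        name = filename_from_url(url)
        if not name:
            continue  # no usable file name in this URL, so skip it
        path = os.path.join(dest_dir, name)
        # Stream the response body straight to disk instead of loading it whole.
        with opener(url) as response, open(path, 'wb') as out_file:
            shutil.copyfileobj(response, out_file)
        saved.append(path)
    return saved
```

With the list returned by scrape_blog, a call along the lines of download_images(images, os.path.expanduser('~/Desktop/tumblr_images')) should write one file per URL into that folder. Two images that share a file name would overwrite each other in this sketch. The same loop could be written with Requests by fetching each URL with requests.get(url, stream=True) and writing the chunks from response.iter_content() to the file.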
