
Possible duplicate:
How to download image using requests

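I have this Python script that scrapes the image URLs of a tumblr blog, and I would like to download them to a local folder on my desktop. How would I go about implementing this?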

import requests 
from bs4 import BeautifulSoup 

def make_soup(url):
    # Downloads a page with requests and creates a BeautifulSoup object.

    raw_page = requests.get(url).text
    # Name the parser explicitly so bs4 does not have to guess one.
    soup = BeautifulSoup(raw_page, 'html.parser')

    return soup


def get_images(soup):
    # Pulls Tumblr-hosted image URLs from the current page.

    images = []

    foundimages = soup.find_all('img')

    for image in foundimages:
        # .get() avoids a KeyError on <img> tags that have no src attribute.
        url = image.get('src', '')

        if 'media.tumblr.com' in url:
            images.append(url)


    return images


def scrape_blog(url):
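    # Scrapes every page of the blog and returns all image URLs found.

    soup = make_soup(url)
    # Collect the images on the first page before following pagination.
    images = get_images(soup)

    next_page = soup.find('a', id='nextpage')

    while next_page is not None:

        soup = make_soup(url + next_page['href'])
        next_page = soup.find('a', id='nextpage')

        more_images = get_images(soup)
        images.extend(more_images)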

    return images


url = 'http://x.tumblr.com'
images = scrape_blog(url)

1 Answer


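Python's "urllib2" may be what you are looking for. If you need to do anything complicated (such as using cookies or authentication), it may be worth looking into a wrapper library such as Requests, which provides a nicer interface over many of the more tedious parts of the standard library.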
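As a rough illustration of the download step, here is a minimal sketch using only the standard library. It assumes Python 3, where the functionality of urllib2 lives in urllib.request. The helper names filename_from_url and download_images are made up for this example and are not part of the question's script; the sketch also assumes that the last path segment of each URL is a usable file name and does no retrying or duplicate-name handling.

```python
import os
import shutil
from urllib.parse import urlparse
from urllib.request import urlopen


def filename_from_url(url):
    # Hypothetical helper: use the last path segment of the URL as the file name.
    return os.path.basename(urlparse(url).path)


def download_images(urls, dest_dir):
    # Hypothetical helper: save every URL into dest_dir and return the paths written.
    os.makedirs(dest_dir, exist_ok=True)
    saved = []
    for url in urls:
        path = os.path.join(dest_dir, filename_from_url(url))
        # Copy the response body to disk in chunks instead of loading it all at once.
        with urlopen(url, timeout=30) as response, open(path, 'wb') as handle:
            shutil.copyfileobj(response, handle)
        saved.append(path)
    return saved
```

With the list returned by scrape_blog in the question, this could be called as download_images(images, os.path.expanduser('~/Desktop/tumblr_images')), where the folder name is only an example.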

Answered 2013-01-01T00:35:23.990