
I am trying to retrieve 10 images through an API that returns JSON data, by first making a request to the API and then storing the 10 image URLs from the returned JSON in a list. In my initial iteration I made individual requests to each of those URLs and saved the response content to a file. My code is given below, minus my API key for obvious reasons:

import os
import requests

# dir_path is assumed to be defined elsewhere in the script

def get_image(search_term):

    number_images = 10
    # Request the search results from the Pixabay API as JSON
    images = requests.get("https://pixabay.com/api/?key=insertkey&q={}&per_page={}".format(search_term, number_images))
    images_json_dict = images.json()

    # Collect the image URLs from the "hits" array
    hits = images_json_dict["hits"]
    urls = []
    for i in range(len(hits)):
        urls.append(hits[i]["webformatURL"])

    # Download each image and save it to the images folder
    count = 0
    for url in urls:
        picture_request = requests.get(url)
        if picture_request.status_code == 200:
            try:
                with open(dir_path + r'\\images\\{}.jpg'.format(count), 'wb') as f:
                    f.write(picture_request.content)
            except:
                # The folder does not exist yet - create it and retry the write
                os.mkdir(dir_path + r'\\images\\')
                with open(dir_path + r'\\images\\{}.jpg'.format(count), 'wb') as f:
                    f.write(picture_request.content)
        count += 1

This all works fine, except that it is slow: it can take around 7 seconds to fetch these 10 images and save them to a folder. I read here that you can use Session() in the requests library to improve performance, and I want to get these images as fast as possible. I have modified the code as shown below, but the problem I am running into is that the get request on the session object returns a requests.sessions.Session object rather than a response, and it also has no .content from which to retrieve the content (I have added comments on the relevant lines below). I am relatively new to programming, so I am not sure if this is the best approach. My question is: now that I am using Session(), how do I use the session to retrieve the image content, or is there a smarter way to do this?

def get_image(search_term):

    number_images = 10
    images = requests.get("https://pixabay.com/api/?key=insertkey&q={}&per_page={}".format(search_term,number_images))
    images_json_dict = images.json()

    hits = images_json_dict["hits"]
    urls = []
    for i in range(len(hits)):
        urls.append(hits[i]["webformatURL"])

    count = 0
    # Now using Session()
    picture_request = requests.Session()
    for url in urls:
        picture_request.get(url)
        # This will no longer work as picture_request is an object
        if picture_request == 200:
            try:
                with open(dir_path + r'\\images\\{}.jpg'.format(count), 'wb') as f:
                    # This will no longer work as there is no .content
                    f.write(picture_request.content)
            except:
                os.mkdir(dir_path + r'\\images\\')
                with open(dir_path + r'\\images\\{}.jpg'.format(count), 'wb') as f:
                    # This will no longer work as there is no .content
                    f.write(picture_request.content)
        count += 1
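
For reference, a minimal sketch of what I am trying to end up with (assuming session.get() can be used in place of requests.get() here):

session = requests.Session()
for url in urls:
    picture_request = session.get(url)  # keep the return value of session.get()
    if picture_request.status_code == 200:
        # picture_request.content would hold the image bytes, as before
        ...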

1 Answer


Assuming you want to stick with the requests library, you will need threading to run multiple requests in parallel.
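
As a bare-bones illustration (just a sketch of the idea, not the code used below; the URLs are placeholders), spawning one threading.Thread per download could look like this:

import threading
import requests

def download(url, results, index):
    # Each thread fetches one URL and stores the raw bytes in a shared list
    r = requests.get(url, timeout=60)
    r.raise_for_status()
    results[index] = r.content

urls = ["https://example.com/a.jpg", "https://example.com/b.jpg"]  # placeholder URLs
results = [None] * len(urls)
threads = [threading.Thread(target=download, args=(url, results, i))
           for i, url in enumerate(urls)]
for t in threads:
    t.start()
for t in threads:
    t.join()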

concurrent.futures provides a convenient way to create a pool of threads with concurrent.futures.ThreadPoolExecutor.

fetch() downloads a single image. fetch_all() creates the thread pool; you can choose how many threads to run by passing the threads argument. get_urls() is the function that retrieves the list of URLs; you should pass it your token (key) and search_term.

Note: if your Python version is earlier than 3.6, replace the f-strings (f"{args}") with regular formatting calls ("{}".format(args)).
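
For example, the request in get_urls() below would then be written as:

r = session.get("https://pixabay.com/api/?key={}&q={}&per_page={}".format(token, search_term, number_images))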

import os
import requests
from concurrent import futures


def fetch(url, session=None):
    # Download a single image, reusing the session's connection pool if one is given
    if session:
        r = session.get(url, timeout=60.)
    else:
        r = requests.get(url, timeout=60.)
    r.raise_for_status()

    return r.content


def fetch_all(urls, session=requests.session(), threads=8):
    # Submit one fetch() per URL to a thread pool and yield results as they complete
    with futures.ThreadPoolExecutor(max_workers=threads) as executor:
        future_to_url = {executor.submit(fetch, url, session=session): url for url in urls}
        for future in futures.as_completed(future_to_url):
            url = future_to_url[future]
            if future.exception() is None:
                yield url, future.result()
            else:
                print(f"{url} generated an exception: {future.exception()}")
                yield url, None


def get_urls(search_term, number_images=10, token="", session=requests.session()):
    # Query the Pixabay API and return the image URLs from the "hits" array
    r = session.get(f"https://pixabay.com/api/?key={token}&q={search_term}&per_page={number_images}")
    r.raise_for_status()
    urls = [hit["webformatURL"] for hit in r.json().get("hits", [])]

    return urls


if __name__ == "__main__":
    root_dir = os.getcwd()
    session = requests.session()
    urls = get_urls("term", token="token", session=session)

    for url, content in fetch_all(urls, session=session):
        if content is not None:
            f_dir = os.path.join(root_dir, "images")
            if not os.path.isdir(f_dir):
                os.makedirs(f_dir)
            with open(os.path.join(f_dir, os.path.basename(url)), "wb") as f:
                f.write(content)

I also recommend you take a look at aiohttp. I will not provide an example here, but will instead give you a link to an article on a similar task, where you can read more about it.

answered 2019-12-07T17:38:24.683