python - Problem downloading files with Python

Question

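I'm trying to download some JPGs from this site and save them to my hard drive, but when I do, I can't open the files because of a format problem. For some reason, all of the files are also exactly 115 KB.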

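I tried changing the chunk size and tried a few things with request(), but with no success. There are no errors in the shell, and the link to the site is correct.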

import os

import bs4
import requests

url = 'http://<site>'
os.makedirs('photos', exist_ok=True)
res = requests.get(url)
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, features="html.parser")
elem = soup.select('a img')  # every <img> nested inside an <a>
if elem == []:
    print('no images')
else:
    for i in range(len(elem)):
        link = elem[i].get('src')
        if link is not None:
            plik = open(os.path.join('photos', os.path.basename(link)), 'wb')
            # Problem: res is still the response for the HTML page, so every
            # "image" file receives a copy of the page's bytes.
            for chunk in res.iter_content(100000):
                plik.write(chunk)
            plik.close()
            print('downloaded %s' % os.path.basename(link))

Solution (inside the 'for i...' loop):

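import os
from urllib.parse import urljoin

import bs4
import requests

url = 'http://<site>'
os.makedirs('photos', exist_ok=True)
res = requests.get(url)
res.raise_for_status()
soup = bs4.BeautifulSoup(res.text, features="html.parser")
elem = soup.select('a img')
if elem == []:
    print('no images')
else:
    for i in range(len(elem)):
        src = elem[i].get('src')
        if src is None:
            continue  # skip <img> tags without a src attribute
        # Resolve the (possibly relative) src against the page URL;
        # plain concatenation (url + src) breaks for many src forms.
        link = urljoin(url, src)
        # A second request, this time for the image itself.
        res2 = requests.get(link, stream=True)
        res2.raise_for_status()
        with open(os.path.join('photos', os.path.basename(link)), 'wb') as plik:
            # Write the image response (res2), not the HTML page (res).
            for chunk in res2.iter_content(100000):
                plik.write(chunk)
        print('downloaded %s' % os.path.basename(link))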
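Joining the page URL and an image's src with plain string concatenation only yields a valid URL in some cases. The standard library's urllib.parse.urljoin resolves relative, root-relative, and absolute src values against the page URL. A small sketch; the helper name image_url and the example.com URLs are hypothetical:

```python
from urllib.parse import urljoin


def image_url(page_url: str, src: str) -> str:
    """Resolve an <img> src against the URL of the page it came from."""
    return urljoin(page_url, src)


if __name__ == '__main__':
    page = 'http://example.com/gallery/'  # hypothetical page URL
    for src in ['img/a.jpg', '/img/b.jpg', 'https://cdn.example.com/c.jpg']:
        # Left: naive concatenation; right: urljoin's resolved URL.
        print(repr(page + src), '->', image_url(page, src))
```

For the last two src values, concatenation produces a broken URL (a doubled slash, or one URL glued onto another), while urljoin returns http://example.com/img/b.jpg and https://cdn.example.com/c.jpg.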

score 0 · Accepted Answer

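After reading the HTML page's response and extracting each image's src, you have to use that src to make another HTTP(S) request, which streams the image from its URL.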
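One image per request can be wrapped in a helper. This is a minimal sketch, not the answerer's code: download_image is a hypothetical name, and it assumes session is anything with a requests-style get(url, stream=True). Both the requests module itself and a requests.Session() qualify, which also keeps the function testable without network access.

```python
import os
from urllib.parse import urlsplit


def download_image(session, img_url, dest_dir, chunk_size=100_000):
    """Fetch one image with its own GET request and stream it into dest_dir.

    Returns the path of the written file.
    """
    # Build the filename from the URL path only, so query strings such as
    # '?w=200' don't end up in it; 'image' is an arbitrary fallback name.
    name = os.path.basename(urlsplit(img_url).path) or 'image'
    path = os.path.join(dest_dir, name)

    resp = session.get(img_url, stream=True)  # a new request for this image
    try:
        resp.raise_for_status()
        with open(path, 'wb') as f:
            # Read from the image's response, not the HTML page's response.
            for chunk in resp.iter_content(chunk_size):
                if chunk:  # skip empty keep-alive chunks
                    f.write(chunk)
    finally:
        resp.close()  # release the connection kept open by stream=True
    return path
```

Typical use would be one call per resolved image URL inside the loop, for example download_image(requests, link, 'photos'), or with a shared requests.Session() so connections to the same host are reused.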

At the moment, you seem to be trying to keep reading from the original response, i.e. the HTML page.
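One way to check this diagnosis is to look at the first bytes of a saved file. A real JPEG starts with the bytes FF D8 FF, whereas a copy of the page starts with HTML markup. If every file is a copy of the same page, identical sizes (such as the 115 KB in the question) would also be expected. A minimal stdlib-only sketch; the function names and the small set of signatures are illustrative assumptions, and the photos directory is the one the question's code creates:

```python
import os

# Leading "magic" bytes for a few common image formats (not exhaustive).
SIGNATURES = {
    b'\xff\xd8\xff': 'jpeg',
    b'\x89PNG\r\n\x1a\n': 'png',
    b'GIF87a': 'gif',
    b'GIF89a': 'gif',
}


def sniff_format(data: bytes) -> str:
    """Guess a file's format from its first bytes."""
    for magic, name in SIGNATURES.items():
        if data.startswith(magic):
            return name
    # A saved web page usually starts, after optional whitespace, with markup.
    head = data.lstrip()[:20].lower()
    if head.startswith((b'<!doctype html', b'<html')):
        return 'html'
    return 'unknown'


def sniff_file(path: str) -> str:
    """Read only the first few bytes of a file and classify them."""
    with open(path, 'rb') as f:
        return sniff_format(f.read(32))


if __name__ == '__main__':
    if os.path.isdir('photos'):
        for name in sorted(os.listdir('photos')):
            print(name, sniff_file(os.path.join('photos', name)))
```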

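Note: a browser also makes a further HTTP request for every link and anchor; the images' bytes are not part of the page's HTML response.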
