html - Download a file from a web page using Python

Question

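I need to download a file from a web page every 2 weeks. A new file is published each time, so its name changes as well, but only the last 3 characters change; the leading "Vermeldung %%%" part stays the same. After downloading it, I need to send it to someone by email. Can someone help me with this?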

This is the code I have so far:

from bs4 import BeautifulSoup
from bs4.dammit import EncodingDetector
import requests

url = 'https://worbis-kirche.de/downloads?view=document&id=339:vermeldungen-kw-9&catid=61'

parser = 'html.parser'  # or 'lxml' (preferred) or 'html5lib', if installed
resp = requests.get(url)
http_encoding = resp.encoding if 'charset' in resp.headers.get('content-type', '').lower() else None
html_encoding = EncodingDetector.find_declared_encoding(resp.content, is_html=True)
encoding = html_encoding or http_encoding
soup = BeautifulSoup(resp.content, parser, from_encoding=encoding)
for link in soup.find_all('a', href=True):
    print(link['href'])

It gives me all the links I need, but how do I tell the program which link to download? The link that needs to be downloaded is /downloads?view=document&id=339&format=raw

score 0 · Accepted Answer

I think you need to fetch this link:

https://worbis-kirche.de/downloads?view=document&id=339&format=raw

So you can do it like this:

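import os
import shutil
...
myLink = None
for link in soup.find_all('a', href=True):
    # Assuming the page contains a link like /downloads?view=document&id=339&format=raw
    if link['href'].endswith('format=raw'):
        myLink = link['href']
        break

if myLink is None:
    raise SystemExit("No download link found on the page")

myLink = "https://worbis-kirche.de" + myLink
r = requests.get(myLink, stream=True)  # stream=True avoids loading the whole file into memory
r.raise_for_status()  # Stop here if the server returned an error status
r.raw.decode_content = True  # Undo gzip/deflate content encoding while reading

with open(filename, "wb") as f:  # filename is the name to save the PDF under
    shutil.copyfileobj(r.raw, f)

try:
    # directory is your preferred downloads folder
    shutil.move(os.path.join(os.getcwd(), filename), os.path.join(directory, filename))
except Exception as e:
    print(e, ": File couldn't be transferred")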
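Because the document id changes every two weeks, it may be more robust to pick the link by its query parameters than by a fixed id. The following is only a sketch: `pick_download_url` and `BASE_URL` are hypothetical names, and it assumes the page lists the raw link with `view=document` and `format=raw`, as in the URL from the question. It takes the hrefs that the loop in the question already collects.

```python
from urllib.parse import parse_qs, urljoin, urlparse

# Assumption: relative links on the page resolve against the site root.
BASE_URL = "https://worbis-kirche.de"


def pick_download_url(hrefs, base_url=BASE_URL):
    """Return the first href with view=document and format=raw as an absolute URL.

    Returns None when no href matches.
    """
    for href in hrefs:
        query = parse_qs(urlparse(href).query)
        if query.get("view") == ["document"] and query.get("format") == ["raw"]:
            # urljoin turns a relative href such as /downloads?... into a full URL
            return urljoin(base_url, href)
    return None
```

It could be called as `pick_download_url(link['href'] for link in soup.find_all('a', href=True))`, and the result passed to `requests.get`. If the page ever lists several raw links, this returns only the first one, so an extra filter may be needed.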

I hope that answers your question.
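The question also asks about emailing the downloaded file, which the code above does not cover. One possible approach uses only the standard library (`email.message.EmailMessage` and `smtplib`). This is a minimal sketch: the function names, the subject line, and the body text are made up for illustration, the SMTP host, port, and credentials are placeholders you would supply yourself, and the attachment is assumed to be a PDF.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_message(sender, recipient, attachment_path):
    """Build an email with the downloaded file attached (assumed to be a PDF)."""
    path = Path(attachment_path)
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = f"Vermeldungen: {path.name}"  # hypothetical subject line
    msg.set_content("The latest file is attached.")
    msg.add_attachment(
        path.read_bytes(),
        maintype="application",
        subtype="pdf",
        filename=path.name,
    )
    return msg


def send_message(msg, host, port, user, password):
    """Send the message over SMTP with TLS; all connection details are placeholders."""
    with smtplib.SMTP_SSL(host, port) as smtp:
        smtp.login(user, password)
        smtp.send_message(msg)
```

Building and sending are kept separate so the message can be inspected before anything goes out. Whether `SMTP_SSL` or `SMTP` with `starttls()` is the right choice depends on the mail provider.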
