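I need to download a file from a web page every 2 weeks. Each time it is a new file, so its name changes as well, but only the last 3 characters change; the leading part, "Vermeldung %%%", stays the same. After downloading it, I need to send it to someone by email. Can anyone help me with this?

This is the code I have so far: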

from bs4 import BeautifulSoup
from bs4.dammit import EncodingDetector
import requests

url = 'https://worbis-kirche.de/downloads?view=document&id=339:vermeldungen-kw-9&catid=61'

parser = 'html.parser'  # or 'lxml' (preferred) or 'html5lib', if installed
resp = requests.get(url)
http_encoding = resp.encoding if 'charset' in resp.headers.get('content-type', '').lower() else None
html_encoding = EncodingDetector.find_declared_encoding(resp.content, is_html=True)
encoding = html_encoding or http_encoding
soup = BeautifulSoup(resp.content, parser, from_encoding=encoding)
for link in soup.find_all('a', href=True):
    print(link['href'])

This gives me all the links on the page, but how do I tell the program which one to download? The link that needs to be downloaded is /downloads?view=document&id=339&format=raw
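One way to single out that link is to inspect the query string of each href instead of comparing whole strings, since the id is likely to change with every new file. The following is only a sketch: the helper name find_raw_download and the constant BASE_URL are made up for illustration, and it assumes the download link always carries view=document and format=raw, as the one quoted above does.

```python
from urllib.parse import parse_qs, urljoin, urlparse

BASE_URL = "https://worbis-kirche.de"  # site root, taken from the URL in the question


def find_raw_download(hrefs, base_url=BASE_URL):
    """Return the absolute URL of the first href with view=document and format=raw.

    Hypothetical helper: returns None when no href matches.
    """
    for href in hrefs:
        query = parse_qs(urlparse(href).query)
        if query.get("view") == ["document"] and query.get("format") == ["raw"]:
            # urljoin turns the site-relative href into a full URL
            return urljoin(base_url, href)
    return None
```

With the soup object from the code above, it could be called as find_raw_download(link['href'] for link in soup.find_all('a', href=True)).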

1 Answer

I think you need to fetch this link:

https://worbis-kirche.de/downloads?view=document&id=339&format=raw

So you can do something like this:

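import os
import shutil
...
# Keep only the link that points at the raw file; without this check the loop
# would simply end up holding the last link on the page
myLink = None
for link in soup.find_all('a', href=True):
    if 'format=raw' in link['href']:  # e.g. /downloads?view=document&id=339&format=raw
        myLink = link['href']
        break
if myLink is None:
    raise SystemExit("No download link found")

myLink = "https://worbis-kirche.de" + myLink
r = requests.get(myLink, stream=True)  # To download it
r.raise_for_status()  # Stop here if the server returned an error
r.raw.decode_content = True

filename = "Vermeldung.pdf"  # Placeholder: choose the name of the pdf yourself
directory = "downloads"  # Placeholder: your preferred downloads folder
os.makedirs(directory, exist_ok=True)

with open(filename, "wb") as f:
    shutil.copyfileobj(r.raw, f)

try:
    shutil.move(os.path.join(os.getcwd(), filename), os.path.join(directory, filename))
except Exception as e:
    print(e, ": File couldn't be transferred")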
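The question also asks about emailing the file afterwards. A minimal sketch using only the standard library (smtplib and email.message) could look like the following. The function names build_message and send_message are made up for illustration, the file is assumed to be a PDF, and the SMTP host, port, and login all depend on your mail provider; the sketch further assumes the server accepts STARTTLS.

```python
import smtplib
from email.message import EmailMessage
from pathlib import Path


def build_message(pdf_path, sender, recipient):
    """Build an email with the downloaded file attached (assumed to be a PDF)."""
    path = Path(pdf_path)
    msg = EmailMessage()
    msg["Subject"] = f"Vermeldungen: {path.name}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("The current Vermeldungen file is attached.")
    # Attach the file contents under its own file name
    msg.add_attachment(
        path.read_bytes(),
        maintype="application",
        subtype="pdf",
        filename=path.name,
    )
    return msg


def send_message(msg, host, port, user, password):
    """Send the message over SMTP with STARTTLS; all server details are assumptions."""
    with smtplib.SMTP(host, port) as server:
        server.starttls()
        server.login(user, password)
        server.send_message(msg)
```

Building the message and sending it are kept separate so that the message can be checked without contacting a mail server. To run the whole thing every 2 weeks, the script would still need to be started by a scheduler such as cron or the Windows Task Scheduler.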

I hope that answers your question...

Answered 2021-01-19T12:57:46.213