I am trying to download the text of an article that I can open without problems in a web browser (e.g., Safari).
The error is:
newspaper.article.ArticleException: Article `download()` failed with 403 Client Error: Forbidden for url: https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830 on URL https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830
Here is the code:
from newspaper import Article
from newspaper import Config
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/13.0.4 Safari/605.1.15'
config = Config()
config.browser_user_agent = user_agent
url = "https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830".strip()
page = Article(url, config=config)
page.download()
page.parse()
print(page.text)
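As you can see, I tried the solution from this Stack Overflow answer (setting a browser user agent through Config), but it did not work.
Full error log: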
/Users/mona/anaconda3/bin/python /Users/mona/multimodal/newspaper_pg.py
Traceback (most recent call last):
  File "/Users/mona/multimodal/newspaper_pg.py", line 18, in <module>
    page.parse()
  File "/Users/mona/anaconda3/lib/python3.6/site-packages/newspaper/article.py", line 191, in parse
    self.throw_if_not_downloaded_verbose()
  File "/Users/mona/anaconda3/lib/python3.6/site-packages/newspaper/article.py", line 532, in throw_if_not_downloaded_verbose
    (self.download_exception_msg, self.url))
newspaper.article.ArticleException: Article `download()` failed with 403 Client Error: Forbidden for url: https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830 on URL https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830
Process finished with exit code 1
I got my user agent string from this website: https://developers.whatismybrowser.com/useragents/explore/operating_system_name/macos/
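To narrow down whether the 403 comes from the server itself rather than from newspaper, one possible check is to request the same URL with the same user agent using only the standard library. The sketch below is a diagnostic, not a fix. The helper names `build_request` and `fetch_status` are hypothetical and are not part of newspaper. If this also reports 403, the site is probably rejecting the request for reasons other than the user agent string, though that is only an assumption.

```python
import urllib.error
import urllib.request

# Same user agent string as in the question.
USER_AGENT = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_2) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) "
    "Version/13.0.4 Safari/605.1.15"
)
URL = "https://www.newsweek.com/new-mexico-compound-charges-dropped-children-1096830"


def build_request(url, user_agent):
    """Hypothetical helper: build a GET request carrying the given User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch_status(request, timeout=10):
    """Hypothetical helper: return the HTTP status code for the request.

    urlopen raises HTTPError for 4xx/5xx responses, so the code is read
    from the exception in that case.
    """
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code


if __name__ == "__main__":
    # Performs a real network request; the result depends on the server.
    print(fetch_status(build_request(URL, USER_AGENT)))
```

Running the script prints the status code the server returns for a plain request with that user agent, which can then be compared with what newspaper reports.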