python - How to extract newspaper articles from a list of URLs in a text file using newspaper

Question

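I am trying to download/extract articles from multiple URLs listed in a text file, and then I want to save the extracted content to a CSV file.

I am creating a blog that collects news on a specific topic, and I want to use Python to extract the news from a bunch of URLs in a text file.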

from newspaper import Article
with open("untitled.txt") as url_file:
    lines = url_file.readlines()
    url = lines
for line in lines:
    article = Article(url)

# I get the following error

AttributeError                            Traceback (most recent call last)
<ipython-input-47-ac8a2b1aab1a> in <module>
      1 for line in lines:
----> 2     article = Article(url)

~\Anaconda3\lib\site-packages\newspaper\article.py in __init__(self, url, title, source_url, config, **kwargs)
     58 
     59         if source_url == '':
---> 60             scheme = urls.get_scheme(url)
     61             if scheme is None:
     62                 scheme = 'http'

~\Anaconda3\lib\site-packages\newspaper\urls.py in get_scheme(abs_url, **kwargs)
    277     if abs_url is None:
    278         return None
--> 279     return urlparse(abs_url, **kwargs).scheme
    280 
    281 

~\Anaconda3\lib\urllib\parse.py in urlparse(url, scheme, allow_fragments)
    365     Note that we don't break the components up in smaller bits
    366     (e.g. netloc is a single string) and we don't expand % escapes."""
--> 367     url, scheme, _coerce_result = _coerce_args(url, scheme)
    368     splitresult = urlsplit(url, scheme, allow_fragments)
    369     scheme, netloc, url, query, fragment = splitresult

~\Anaconda3\lib\urllib\parse.py in _coerce_args(*args)
    121     if str_input:
    122         return args + (_noop,)
--> 123     return _decode_args(args) + (_encode_result,)
    124 
    125 # Result objects are more helpful than simple tuples

~\Anaconda3\lib\urllib\parse.py in _decode_args(args, encoding, errors)
    105 def _decode_args(args, encoding=_implicit_encoding,
    106                        errors=_implicit_errors):
--> 107     return tuple(x.decode(encoding, errors) if x else '' for x in args)
    108 
    109 def _coerce_args(*args):

~\Anaconda3\lib\urllib\parse.py in <genexpr>(.0)
    105 def _decode_args(args, encoding=_implicit_encoding,
    106                        errors=_implicit_errors):
--> 107     return tuple(x.decode(encoding, errors) if x else '' for x in args)
    108 
    109 def _coerce_args(*args):

AttributeError: 'list' object has no attribute 'decode'

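I want to repeat this process so that I can extract text from hundreds of URLs. Is there a way to set this up so that I can create a text file containing the article URLs and extract the articles from it?

Update 1: I updated my code based on the suggestions, but I still cannot extract all of the articles from the URLs.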
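For context on the traceback above: it ends in `'list' object has no attribute 'decode'` because `Article(url)` is handed the whole `lines` list rather than a single URL string. A minimal sketch of reading the file into clean, individual URL strings might look like the following. The helper name `read_urls` is hypothetical, and the file name `untitled.txt` is taken from the question.

```python
from pathlib import Path


def read_urls(path):
    """Return one cleaned URL string per non-empty line of the file."""
    urls = []
    for raw_line in Path(path).read_text(encoding="utf-8").splitlines():
        url = raw_line.strip()  # drop the trailing newline and stray spaces
        if url:                 # skip blank lines
            urls.append(url)
    return urls


# Intended use (requires the newspaper package, so it is left commented out):
# from newspaper import Article
# for url in read_urls("untitled.txt"):
#     article = Article(url)  # one string per call, never the whole list
```

Each call to `Article` then receives a plain string, which is what `urlparse` expects further down the stack.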

from newspaper import Article
with open("untitled.txt") as url_file:
    lines = url_file.readlines()
for line in lines:
    article = Article(line)
article.download()
article.text

I want to extract all of the articles from the list of URLs.
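In the updated code, `article.download()` and `article.text` sit outside the `for` loop, so only the last URL is ever downloaded, and `parse()` is never called, so as far as I know `article.text` stays empty. A hedged sketch of one way to process every URL and write the results to a CSV file is shown below. The function names `fetch_with_newspaper` and `save_articles`, the column names, and the output file name are assumptions for illustration, not part of the newspaper library.

```python
import csv


def fetch_with_newspaper(url):
    """Download and parse one article; return (title, text)."""
    # Imported lazily so the CSV logic can be exercised without the package.
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()  # title and text are only populated after parse()
    return article.title, article.text


def save_articles(urls, csv_path, fetch=fetch_with_newspaper):
    """Fetch every URL, write one CSV row per article, return failed URLs."""
    failed = []
    with open(csv_path, "w", newline="", encoding="utf-8") as csv_file:
        writer = csv.writer(csv_file)
        writer.writerow(["url", "title", "text"])  # assumed column layout
        for url in urls:
            try:
                title, text = fetch(url)
            except Exception:
                # One bad URL should not stop the other downloads.
                failed.append(url)
                continue
            writer.writerow([url, title, text])
    return failed
```

Calling `save_articles(urls, "articles.csv")` with a list of stripped URL strings (one per line of `untitled.txt`) should then produce one row per article that downloaded successfully. The returned list shows which URLs need a retry.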
