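Using Python 2.7, I am trying to scrape and import articles from the NYT. Until now I had no trouble fetching one article or several at a time, but now I get the error AttributeError: 'module' object has no attribute 'Scraper'.
I am using the newspaper package, which had worked well until this error appeared. It seems to work for some HTML links and not for others, even though the links are accurate. Any ideas about a solution?
Here is my code: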
import pandas as pd
import newspaper
from newspaper import Article

url3 = 'http://www.nytimes.com/2010/08/04/nyregion/04shooting.html'
url4 = 'http://www.nytimes.com/2010/08/04/nyregion/04gunman.html'
url5 = 'http://www.nytimes.com/2010/08/05/nyregion/05shooting.html'
url6 = 'http://www.nytimes.com/2010/08/05/nyregion/05vics.html'
urls = [url3, url4, url5, url6]

Nyt_HBC = pd.DataFrame()
for i in urls:
    a = Article(i, language='en')
    a.download()
    a.parse()
    Nyt_HBC = Nyt_HBC.append([[a.title, a.text]], ignore_index=True)
Nyt_HBC.columns = ['Title', 'Article']
Nyt_HBC
Here is my full error message (quick note: you won't be able to run it without .parse()):
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-47-12545a6e9854> in <module>()
      9     a=Article(i, language='en')
     10     a.download()
---> 11     a.parse()
     12     Nyt_HBC= Nyt_HBC.append([[a.title, a.text]], ignore_index=True)
     13 Nyt_HBC.columns=['Title','Article']

/Users/ThomasPLapinger/anaconda/lib/python2.7/site-packages/newspaper/article.pyc in parse(self)
    226
    227         if self.config.fetch_images:
--> 228             self.fetch_images()
    229
    230         self.is_parsed = True

/Users/ThomasPLapinger/anaconda/lib/python2.7/site-packages/newspaper/article.pyc in fetch_images(self)
    245             first_img = self.extractor.get_first_img_url(
    246                 self.url, self.clean_top_node)
--> 247             self.set_top_img(first_img)
    248
    249         if not self.has_top_image():

/Users/ThomasPLapinger/anaconda/lib/python2.7/site-packages/newspaper/article.pyc in set_top_img(self, src_url)
    399     def set_top_img(self, src_url):
    400         if src_url is not None:
--> 401             s = images.Scraper(self)
    402             if s.satisfies_requirements(src_url):
    403                 self.set_top_img_no_check(src_url)

AttributeError: 'module' object has no attribute 'Scraper'
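
Since the failure only shows up for some links, one way to narrow it down is to try every URL and record which ones raise instead of stopping at the first error. The following is a minimal sketch of that idea, not a fix. `scrape_all` and `fetch_with_newspaper` are hypothetical helper names introduced here for illustration; the newspaper calls inside are the same ones used in the code above.

```python
def scrape_all(urls, fetch):
    """Call fetch(url) for every URL and separate successes from failures.

    `fetch` is any callable that returns a (title, text) tuple.
    Returns (rows, failures), where failures holds
    (url, exception class name, message) tuples.
    """
    rows, failures = [], []
    for url in urls:
        try:
            title, text = fetch(url)
        except Exception as exc:
            # Record the error and keep going so every URL gets tried.
            failures.append((url, type(exc).__name__, str(exc)))
        else:
            rows.append((title, text))
    return rows, failures


def fetch_with_newspaper(url):
    """Hypothetical wrapper around the download/parse steps used above."""
    # Imported here so scrape_all can be exercised without newspaper installed.
    from newspaper import Article
    a = Article(url, language='en')
    a.download()
    a.parse()
    return a.title, a.text
```

With newspaper installed, `rows, failures = scrape_all(urls, fetch_with_newspaper)` would give the parsed articles plus the list of URLs that raised, which can then be compared against the ones that succeed. The traceback also shows that parse() only calls fetch_images() when self.config.fetch_images is true, so turning that setting off might avoid the failing branch; that is an untested assumption.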