I'm using newspaper3k inside a Docker container. I downloaded all the required nltk data, but I get an error when I run article.nlp() followed by article.summary.

The same code works when I use it in a Flask application. I'm now testing it on Django (+ DRF), and I get this error:

web_1  |   File "/usr/local/lib/python3.6/site-packages/newspaper/article.py", line 361, in nlp
web_1  |     summary_sents = nlp.summarize(title=self.title, text=self.text, max_sents=max_sents)
web_1  |   File "/usr/local/lib/python3.6/site-packages/newspaper/nlp.py", line 45, in summarize
web_1  |     sentences = split_sentences(text)
web_1  |   File "/usr/local/lib/python3.6/site-packages/newspaper/nlp.py", line 157, in split_sentences
web_1  |     tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
web_1  |   File "/usr/local/lib/python3.6/site-packages/nltk/data.py", line 752, in load
web_1  |     opened_resource = _open(resource_url)
web_1  |   File "/usr/local/lib/python3.6/site-packages/nltk/data.py", line 877, in _open
web_1  |     return find(path_, path + [""]).open()
web_1  | TypeError: must be str, not list

It looks like find has a problem with tokenizers/punkt/english.pickle, but when I check nltk_data, the file is there.
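One thing that may be worth ruling out, offered as an assumption rather than a confirmed diagnosis: in Python 3.6, adding a list to a str raises exactly `TypeError: must be str, not list`, so the last frame (`path + [""]`) would fail this way if `nltk.data.path` had been reassigned to a plain string somewhere in the project instead of remaining a list. The sketch below checks for that. `check_search_path` is a hypothetical helper name, not part of nltk or newspaper.

```python
def check_search_path(path):
    """Return a list of problems found in an nltk-style search path.

    Hypothetical helper: nltk expects its search path to be a list of
    directory strings, so anything else is reported as a problem.
    """
    if not isinstance(path, list):
        return [f"path is {type(path).__name__}, expected list"]
    problems = []
    for index, entry in enumerate(path):
        if not isinstance(entry, str):
            problems.append(
                f"entry {index} is {type(entry).__name__}, expected str"
            )
    return problems


if __name__ == "__main__":
    try:
        import nltk.data
    except ImportError:
        print("nltk is not installed in this environment")
    else:
        # nltk.data.path is the module-level list of directories nltk searches.
        print(check_search_path(nltk.data.path) or "nltk.data.path looks fine")
```

If this reports that the path is a str, the place to look would be any code that assigns to `nltk.data.path` rather than appending to it.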

Do you have any idea where this might be coming from?

Update

The code is very simple. Here is my Django view:

from newspaper import Article

article = Article(url, language=LANG)
article.download()
article.parse()
article.nlp()  # the problem most probably happens here
article.summary

Since I'm using Django REST framework, I serialize the summary with this field:

summary = serializers.CharField(max_length=5000, required=False)   