I'm using newspaper3k inside a Docker container. I have downloaded all the required nltk data, but I get an error when I run article.nlp() and then access article.summary.
The same code works in a Flask application. I'm now testing it with Django (+ DRF), and I get this error:
web_1 | File "/usr/local/lib/python3.6/site-packages/newspaper/article.py", line 361, in nlp
web_1 | summary_sents = nlp.summarize(title=self.title, text=self.text, max_sents=max_sents)
web_1 | File "/usr/local/lib/python3.6/site-packages/newspaper/nlp.py", line 45, in summarize
web_1 | sentences = split_sentences(text)
web_1 | File "/usr/local/lib/python3.6/site-packages/newspaper/nlp.py", line 157, in split_sentences
web_1 | tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
web_1 | File "/usr/local/lib/python3.6/site-packages/nltk/data.py", line 752, in load
web_1 | opened_resource = _open(resource_url)
web_1 | File "/usr/local/lib/python3.6/site-packages/nltk/data.py", line 877, in _open
web_1 | return find(path_, path + [""]).open()
web_1 | TypeError: must be str, not list
It looks like nltk has trouble finding tokenizers/punkt/english.pickle, but when I check nltk_data, the file is there.
Do you have any idea where this might come from?
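One observation that may help narrow it down: the last frame evaluates `path + [""]`, and in Python 3.6 "must be str, not list" is the message produced by adding a list to a string. That suggests the module-level `nltk.data.path` is a string rather than a list when the view runs, for example because some settings code assigned to it instead of appending. Below is a minimal sketch for checking this from inside the Django process. The helper name `check_search_path` is hypothetical and not part of nltk, and the diagnosis itself is an assumption until confirmed.

```python
def check_search_path(path):
    """Return a list of problems found in an nltk.data.path-like value.

    Hypothetical helper: nltk expects its search path to be a list of
    directory strings, so anything else is reported.
    """
    problems = []
    if not isinstance(path, list):
        problems.append("search path is %s, expected list" % type(path).__name__)
        return problems
    for entry in path:
        if not isinstance(entry, str):
            problems.append("non-string entry: %r" % (entry,))
    return problems


if __name__ == "__main__":
    try:
        import nltk.data
    except ImportError:
        print("nltk is not installed in this environment")
    else:
        # An empty list means the search path has the expected shape.
        print(check_search_path(nltk.data.path))
```

Calling `check_search_path(nltk.data.path)` right before `article.nlp()` in the view should show whether the value differs between the Flask and Django setups.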
Update:
The code is very simple. This is my Django view:
from newspaper import Article
article = Article(url, language=LANG)
article.download()
article.parse()
article.nlp()  # <---- The problem most probably happens here
article.summary
Since I'm using Django REST framework, I serialize the summary with this field:
summary = serializers.CharField(max_length=5000, required=False)