
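I'm new to Python and am trying to import my own corpus so that I can find collocations in its texts. I'm using Python 3.7.5 and following the instructions in the textbook by Bird, Klein, and Loper.

However, when I call collocation_list() on the whole corpus, I get "'ConcatenatedCorpusView' object has no attribute 'collocation_list'", and when I call it on a single text, I get "'StreamBackedCorpusView' object has no attribute 'collocation_list'".

What should I do to find collocations in the texts of my corpus?

I tried "import nltk.collocations", but unsurprisingly that didn't help...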

>>> from nltk.corpus import PlaintextCorpusReader
>>> eng_corpus_root = r'D:\Corpus\EN'
>>> eng_corpus = PlaintextCorpusReader(eng_corpus_root, '.*')
>>> eng = eng_corpus.words()

>>> eng.collocation_list()
Traceback (most recent call last):
  File "<pyshell#39>", line 1, in <module>
    eng.collocation_list()
AttributeError: 'ConcatenatedCorpusView' object has no attribute 'collocation_list'

>>> eng1 = eng_corpus.words('CNN/2019.10.18_EN_CNN 2.txt')

>>> eng1.collocation_list()
Traceback (most recent call last):
  File "<pyshell#68>", line 1, in <module>
    eng1.collocation_list()
AttributeError: 'StreamBackedCorpusView' object has no attribute 'collocation_list'

It would be great if I could get a result like this (an example from the textbook mentioned above):

>>> from nltk.book import *
*** Introductory Examples for the NLTK Book ***
Loading text1, ..., text9 and sent1, ..., sent9
Type the name of the text or sentence to view it.
Type: 'texts()' or 'sents()' to list the materials.
text1: Moby Dick by Herman Melville 1851
text2: Sense and Sensibility by Jane Austen 1811
text3: The Book of Genesis
text4: Inaugural Address Corpus
text5: Chat Corpus
text6: Monty Python and the Holy Grail
text7: Wall Street Journal
text8: Personals Corpus
text9: The Man Who Was Thursday by G . K . Chesterton 1908

>>> text4.collocation_list()
['United States', 'fellow citizens', 'four years', 'years ago', 'Federal Government', 'General Government', 'American people', 'Vice President', 'God bless', 'Chief Justice', 'Old World', 'Almighty God', 'Fellow citizens', 'Chief Magistrate', 'every citizen', 'one another', 'fellow Americans', 'Indian tribes', 'public debt', 'foreign nations']

Any help would be much appreciated...


1 Answer


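Problem solved... I needed to initialize my corpus as an NLTK Text object (see: http://www.nltk.org/api/nltk.html#nltk.text.Text). The words() method returns a corpus view, which is just a sequence of tokens, while collocation_list() is a method of nltk.text.Text: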

>>> from nltk.text import Text
>>> text458 = Text(eng_corpus.words())
>>> text458.collocation_list()
['Hong Kong', 'United States', 'Getty Images', 'European Union', 'Northern Ireland', 'Boris Johnson', 'Prime Minister', 'Islamic State', 'Extinction Rebellion', 'Cape Dorset', 'extradition bill', 'Recep Tayyip', 'HONG KONG', 'Mike Pence', 'New York', 'Tayyip Erdogan', 'Democratic Forces', 'Vice President', 'Anthony Kwan', 'Kurdish fighters']

It's that simple.
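For anyone who would rather use nltk.collocations directly (the module the question tried to import), here is a minimal sketch. It is not taken from the textbook: the token list and the tiny stopword set are made up for illustration, and the choice of filters and of the likelihood-ratio measure is my assumption about a reasonable setup, not a statement of what Text.collocation_list() does internally. With a real corpus, eng_corpus.words() could be passed in place of the sample tokens.

```python
from nltk.collocations import BigramCollocationFinder
from nltk.metrics import BigramAssocMeasures

# Hypothetical sample tokens; with a real corpus, pass eng_corpus.words() instead.
tokens = (
    "the prime minister met the vice president . "
    "the prime minister thanked the vice president ."
).split()

# Hypothetical, deliberately tiny stopword set, used only for this illustration.
stopwords = {"the"}

finder = BigramCollocationFinder.from_words(tokens)

# Keep only bigrams that occur at least twice.
finder.apply_freq_filter(2)

# Drop bigrams containing very short tokens (e.g. punctuation) or stopwords.
finder.apply_word_filter(lambda w: len(w) < 3 or w.lower() in stopwords)

# Rank the remaining bigrams by likelihood ratio and keep at most five.
top_pairs = finder.nbest(BigramAssocMeasures.likelihood_ratio, 5)
print(top_pairs)
```

Each result is a tuple of two tokens, so " ".join(pair) gives a readable phrase. Working with the finder directly makes it possible to change the frequency threshold, the word filter, or the association measure, which the Text wrapper does not expose in the same way.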

answered 2019-10-29T15:59:31.563