My nltk data is in ~/nltk_data/corpora/words/ (en, en-basic, README).

According to __init__.py inside ~/lib/python2.7/site-packages/nltk/corpus, to read the list of words in the Brown corpus, use nltk.corpus.brown.words():

from nltk.corpus import brown
print(brown.words())
['The', 'Fulton', 'County', 'Grand', 'Jury', 'said', ...]

__init__.py

words = LazyCorpusLoader(
    'words', WordListCorpusReader, r'(?!README|\.).*')
  1. So when I write from nltk.corpus import words, am I importing the 'words' function from __init__.py in python2.7/site-packages/nltk/corpus?

  2. Also, why does this happen:

     import nltk.corpus.words
     ImportError: No module named words
     from nltk.corpus import words
     # WORKS FINE
    
  3. The "brown" corpus lives inside ~/nltk_data/corpora (not in nltk/corpus). So why does this command work?

    from nltk.corpus import brown
    

    Shouldn't it be this instead?

    from nltk_data.corpora import brown
    
2 Answers

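Re point 2: you can import a module (import module.submodule), or import an object from a module (from module.submodule import variable). A submodule can be treated like a variable, because in that scope it really is one (from module import submodule), but it does not work the other way around: every dotted component of a plain import statement must be a package or module. Here words is an object defined inside nltk/corpus/__init__.py, not a submodule, which is why import module.submodule.variable fails when you try it.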
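
To see the difference without needing NLTK installed, here is a minimal sketch using the standard library's os.path. The helper try_import is a hypothetical name introduced only for this illustration; join plays the role of words, a name defined inside a module and not a submodule.

```python
import importlib


def try_import(dotted_name):
    """Return True if `import dotted_name` would succeed, else False.

    Hypothetical helper for this illustration only.
    """
    try:
        importlib.import_module(dotted_name)
        return True
    except ImportError:
        return False


# os.path is a module, so the plain import form works.
module_ok = try_import("os.path")

# os.path.join is a function inside that module, so `import os.path.join`
# fails, just like `import nltk.corpus.words` in the question.
attribute_ok = try_import("os.path.join")

# The from-import form can pull any name out of a module.
from os.path import join

print(module_ok, attribute_ok, callable(join))  # prints: True False True
```

The same reasoning applies to nltk.corpus: words is reachable with the from-import form because it is an attribute of the nltk.corpus package.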

Re point 3: it depends on what nltk.corpus does. Presumably it searches for nltk_data and loads from it for you automatically.

Answered 2013-08-27T13:20:52.210

1.] Yes: it goes through LazyCorpusLoader from util, whose docstring gives the following description:

"""
    A proxy object which is used to stand in for a corpus object
    before the corpus is loaded.  This allows NLTK to create an object
    for each corpus, but defer the costs associated with loading those
    corpora until the first time that they're actually accessed.

    The first time this object is accessed in any way, it will load
    the corresponding corpus, and transform itself into that corpus
    (by modifying its own ``__class__`` and ``__dict__`` attributes).

    If the corpus can not be found, then accessing this object will
    raise an exception, displaying installation instructions for the
    NLTK data package.  Once they've properly installed the data
    package (or modified ``nltk.data.path`` to point to its location),
    they can then use the corpus object without restarting python.
    """
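
The mechanism that docstring describes can be imitated in a few lines. This is only a sketch of the idea, not NLTK's actual implementation: WordList, LazyProxy and build_word_list are hypothetical names invented here, and the real loader does considerably more (locating files, error messages, and so on).

```python
class WordList(object):
    """Stand-in for a real corpus reader (hypothetical, not an NLTK class)."""

    def __init__(self, words):
        self._words = list(words)

    def words(self):
        return self._words


class LazyProxy(object):
    """Defers building the real object until an attribute is first used."""

    def __init__(self, factory):
        self._factory = factory

    def _load(self):
        real = self._factory()
        # Become the real object by taking over its state and its class.
        self.__dict__ = real.__dict__
        self.__class__ = real.__class__

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. before loading.
        if name == "_factory":
            raise AttributeError(name)
        self._load()
        return getattr(self, name)


load_count = []


def build_word_list():
    # Records each call so we can check the load happens only once.
    load_count.append(1)
    return WordList(["The", "Fulton", "County"])


proxy = LazyProxy(build_word_list)
type_before = type(proxy).__name__   # nothing has been loaded yet
first = proxy.words()                # first access triggers the load
type_after = type(proxy).__name__    # the proxy has changed class
second = proxy.words()               # no second load

print(type_before, type_after, first, len(load_count))
```

So from nltk.corpus import words imports a placeholder object of this kind; the files under nltk_data are only read once the object is actually used.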

3.] nltk_data is the folder where the data lives; that does not mean the module lives in that folder too (the downloaded data is stored in nltk_data).

NLTK has built-in support for dozens of corpora and trained models. To use these within NLTK, we recommend that you use the NLTK corpus downloader: >>> nltk.download()
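
The split between "where the code is imported from" and "where the data is looked up" can be sketched without NLTK. The function find_corpus and the list search_path below are hypothetical stand-ins (the docstring above mentions nltk.data.path as the real list of locations); the temporary directory merely imitates ~/nltk_data so the example runs anywhere.

```python
import os
import tempfile


def find_corpus(name, search_path):
    """Return the first existing <dir>/corpora/<name> directory, or None.

    Hypothetical helper: it imitates a data lookup, not NLTK's own code.
    """
    for base in search_path:
        candidate = os.path.join(base, "corpora", name)
        if os.path.isdir(candidate):
            return candidate
    return None


# Throwaway stand-in for ~/nltk_data containing a corpora/brown folder.
fake_data_dir = tempfile.mkdtemp()
os.makedirs(os.path.join(fake_data_dir, "corpora", "brown"))

# The data search path is an ordinary list of directories; it has nothing
# to do with sys.path, which Python uses to find importable modules.
search_path = [os.path.join(fake_data_dir, "missing"), fake_data_dir]

found = find_corpus("brown", search_path)
not_found = find_corpus("no_such_corpus", search_path)
print(found)
print(not_found)
```

This is why from nltk_data.corpora import brown is not the right form: nltk_data holds plain data files, and the Python object brown is defined in the nltk.corpus package.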

Answered 2013-08-27T13:00:09.183