
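I'm getting an error I don't understand when running some Python code. I'm learning the Natural Language Toolkit by working through the excellent NLTK book. When I ran the following code (my adaptation of Figure 2.1 to my own data), I got the error below.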

The code I ran:

import os, re, csv, string, operator
import nltk
from nltk.corpus import PlaintextCorpusReader
dir = '/Dropbox/hearings'

corpus_root = dir
text = PlaintextCorpusReader(corpus_root, ".*")

cfd = nltk.ConditionalFreqDist(
    (target, fileid[:3])
     for fileid in text.fileids()
     for w in text.words(fileid)
     for target in ['budget','appropriat']
     if w.lower().startswith(target))

cfd.plot()
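For reference, the generator expression yields one (target, fileid[:3]) pair per matching word, and ConditionalFreqDist counts how often each 3-character fileid prefix appears under each target. The stdlib-only sketch below approximates that counting. It is not NLTK's implementation. The function conditional_counts and the words_by_file sample data are hypothetical stand-ins for text.words(fileid).

```python
from collections import Counter, defaultdict


def conditional_counts(words_by_file, targets):
    """For each target prefix, count matching words per 3-char fileid prefix.

    Mirrors the (target, fileid[:3]) pairs fed to nltk.ConditionalFreqDist,
    using a dict of Counters instead of NLTK classes.
    """
    counts = defaultdict(Counter)
    for fileid, words in words_by_file.items():
        for w in words:
            for target in targets:
                if w.lower().startswith(target):
                    counts[target][fileid[:3]] += 1
    return counts


# Hypothetical data standing in for text.words(fileid)
words_by_file = {
    "108_hearing.txt": ["Budget", "appropriations", "budgetary", "cut"],
    "109_hearing.txt": ["Appropriated", "funds"],
}
counts = conditional_counts(words_by_file, ["budget", "appropriat"])
print(dict(counts["budget"]))      # {'108': 2}
print(dict(counts["appropriat"]))  # {'108': 1, '109': 1}
```

Because the generator is lazy, it is consumed only inside ConditionalFreqDist's constructor. That is why the traceback passes through probability.pyc before reaching the line that calls text.words(fileid).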

The error I got (full traceback):

In [6]: ---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-6-abc9ff8cb2f1> in <module>()
----> 1 execfile(r'/Dropbox/hearings/hearings_ingest.py') # PYTHON-MODE

/Dropbox/hearings/hearings_ingest.py in <module>()
     14 cfd = nltk.ConditionalFreqDist(
     15     (target, fileid[:3])
---> 16      for fileid in text.fileids()
     17      for w in text.words(fileid)
     18      for target in ['budget','appropriat']

/Users/ian/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/nltk/probability.pyc in __init__(self, cond_samples)
   1727         defaultdict.__init__(self, FreqDist)
   1728         if cond_samples:
-> 1729             for (cond, sample) in cond_samples:
   1730                 self[cond].inc(sample)
   1731 

/Dropbox/hearings/hearings_ingest.py in <genexpr>((fileid,))
     15     (target, fileid[:3])
     16      for fileid in text.fileids()
---> 17      for w in text.words(fileid)
     18      for target in ['budget','appropriat']
     19      if w.lower().startswith(target))

/Users/ian/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/nltk/corpus/reader/util.pyc in iterate_from(self, start_tok)
    341 
    342         # If we reach this point, then we should know our length.
--> 343         assert self._len is not None
    344 
    345     # Use concat for these, so we can use a ConcatenatedCorpusView

AssertionError: 

In [7]: 

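I've included the next IPython prompt to show that this is the complete error. (From reading other questions, I gather that "AssertionError:" is usually followed by more information. In my case it's blank.)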

I'd appreciate any help understanding what's wrong with my code. Thanks!


1 Answer


I can reproduce the error by creating an empty file named foo and then calling text.words('foo'):

In [18]: !touch 'foo'

In [19]: text = corpus.PlaintextCorpusReader('.', "foo")

In [20]: text.words('foo')
AssertionError:
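The blank message is expected. The line that fails in nltk/corpus/reader/util.py is a bare assert self._len is not None with no message argument. A bare assert raises an AssertionError whose string form is empty, so the traceback ends with nothing after "AssertionError:". The short standalone illustration below shows this. The helper assertion_message is hypothetical and exists only for the demonstration.

```python
def assertion_message(check, message=None):
    """Return the text of the AssertionError raised by a failing check,
    or None if the check passes (no error raised)."""
    try:
        if message is None:
            assert check  # bare assert, like the one in nltk/corpus/reader/util.py
        else:
            assert check, message  # assert with an explanatory message
    except AssertionError as e:
        return str(e)
    return None


length = None  # stand-in for self._len never having been set
print(repr(assertion_message(length is not None)))                 # ''
print(assertion_message(length is not None, "length is unknown"))  # length is unknown
```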

So to skip empty files, you could do:

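cfd = nltk.ConditionalFreqDist(
    (target, fileid[:3])
    for fileid in text.fileids()
    # skip zero-byte files; fileids are relative to corpus_root,
    # so join them to get a path that works from any working directory
    if os.path.getsize(os.path.join(corpus_root, fileid)) > 0
    for w in text.words(fileid)
    for target in ['budget', 'appropriat']
    if w.lower().startswith(target))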
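The size check can also run once, before the corpus reader is built. This also shows which files are empty. The stdlib-only sketch below assumes the corpus is every file under the root directory, as with the ".*" pattern in the question. The helper name split_by_emptiness is hypothetical.

```python
import os
import shutil
import tempfile


def split_by_emptiness(root):
    """Walk root recursively and return (nonempty, empty): two sorted lists
    of '/'-separated file paths relative to root, split on whether the file
    is zero bytes long."""
    nonempty, empty = [], []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, "/")
            if os.path.getsize(full) == 0:
                empty.append(rel)
            else:
                nonempty.append(rel)
    return sorted(nonempty), sorted(empty)


# Demo on a throwaway directory holding one empty file and one with text.
tmp = tempfile.mkdtemp()
try:
    open(os.path.join(tmp, "foo"), "w").close()  # like `touch foo`
    with open(os.path.join(tmp, "bar.txt"), "w") as f:
        f.write("budget appropriations\n")
    nonempty, empty = split_by_emptiness(tmp)
finally:
    shutil.rmtree(tmp)

print(nonempty)  # ['bar.txt']
print(empty)     # ['foo']
```

On the question's data this would be split_by_emptiness('/Dropbox/hearings'). The second list names the files to delete or fix. The first list could be passed as PlaintextCorpusReader(corpus_root, nonempty), because NLTK corpus readers accept an explicit list of fileids as well as a regular expression. With that reader, the ConditionalFreqDist generator needs no per-file check.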
answered 2013-07-08T17:39:23.770