python - CherryPy web service not returning NLTK collocations to the browser window

Question

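I have a very simple CherryPy web service that I hope will become the foundation of a larger project. First, though, I need to get NLTK working the way I want.

My Python script imports NLTK and uses its collocation (bigram) function to run some analysis on preloaded data.

I have a couple of questions:

1) Why does the program return the collocations only to my console and not to my browser?

2) Why does the program import the entire set of sample books (text1 through text9) when I specify from nltk.book import text4?

Please keep in mind that I'm a newbie, so the answer may be right in front of me without my seeing it.

Main question: how do I pass the collocation results to the browser (web service) instead of the console?

Thanks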

import cherrypy
import nltk
from nltk.book import text4

class BiGrams:
    def index(self):
        return text4.collocations(num=20)
    index.exposed = True

cherrypy.quickstart(BiGrams())

score 3 · Accepted Answer

I've been working with Moby Dick, and the other day I stumbled on the answer to the problem of importing just one specific text:

>>> import nltk.corpus
>>> from nltk.text import Text
>>> moby = Text(nltk.corpus.gutenberg.words('melville-moby_dick.txt'))

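So all you really need is the fileid in order to assign that file's text to your new Text object. Be careful, though, because only the "literary" sources are available through gutenberg.words.

Anyway, to help find the file IDs for gutenberg, after the import nltk.corpus above you can use the following command: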

>>> nltk.corpus.gutenberg.fileids()

['austen-emma.txt', 'austen-persuasion.txt', 'austen-sense.txt', 'bible-kjv.txt', 'blake-poems.txt', 'bryant-stories.txt', 'burgess-busterbrown.txt', 'carroll-alice.txt', 'chesterton-ball.txt', 'chesterton-brown.txt', 'chesterton-thursday.txt', 'edgeworth-parents.txt', 'melville-moby_dick.txt', 'milton-paradise.txt', 'shakespeare-caesar.txt', 'shakespeare-hamlet.txt', 'shakespeare-macbeth.txt', 'whitman-leaves.txt']

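However, this still doesn't answer the question for your specific corpus, the inaugural addresses. For that answer, I found this MIT paper: http://web.mit.edu/6.863/www/fall2012/nltk/ch2-3.pdf

(I recommend it to anyone starting to work with nltk texts, because it covers grabbing all kinds of textual data for analysis.) The answer for getting the inaugural address fileids is on page 6 (edited slightly):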

>>> nltk.corpus.inaugural.fileids()
['1789-Washington.txt', '1793-Washington.txt', '1797-Adams.txt', '1801-Jefferson.txt', '1805-Jefferson.txt', '1809-Madison.txt', '1813-Madison.txt', '1817-Monroe.txt', '1821-Monroe.txt', '1825-Adams.txt', '1829-Jackson.txt', '1833-Jackson.txt', '1837-VanBuren.txt', '1841-Harrison.txt', '1845-Polk.txt', '1849-Taylor.txt', '1853-Pierce.txt', '1857-Buchanan.txt', '1861-Lincoln.txt', '1865-Lincoln.txt', '1869-Grant.txt', '1873-Grant.txt', '1877-Hayes.txt', '1881-Garfield.txt', '1885-Cleveland.txt', '1889-Harrison.txt', '1893-Cleveland.txt', '1897-McKinley.txt', '1901-McKinley.txt', '1905-Roosevelt.txt', '1909-Taft.txt', '1913-Wilson.txt', '1917-Wilson.txt', '1921-Harding.txt', '1925-Coolidge.txt', '1929-Hoover.txt', '1933-Roosevelt.txt', '1937-Roosevelt.txt', '1941-Roosevelt.txt', '1945-Roosevelt.txt', '1949-Truman.txt', '1953-Eisenhower.txt', '1957-Eisenhower.txt', '1961-Kennedy.txt', '1965-Johnson.txt', '1969-Nixon.txt', '1973-Nixon.txt', '1977-Carter.txt', '1981-Reagan.txt', '1985-Reagan.txt', '1989-Bush.txt', '1993-Clinton.txt', '1997-Clinton.txt', '2001-Bush.txt', '2005-Bush.txt', '2009-Obama.txt']

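So you should be able to import a specific inaugural address as a Text (assuming you ran "from nltk.text import Text" above), or you can work with the addresses through the "inaugural" identifier imported above. For example, this works: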

>>> address1 = Text(nltk.corpus.inaugural.words('2009-Obama.txt'))

In fact, you can treat all of the inaugural addresses as a single document by calling inaugural.words without any arguments, as in the following examples:

>>> len(nltk.corpus.inaugural.words())

or

>>> addresses = Text(nltk.corpus.inaugural.words())
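
The inaugural file IDs listed above all begin with a four-digit year, so a subset of addresses can be picked out with plain string handling before any text is loaded. A small sketch of that idea follows; fileids_between is a hypothetical helper written for this post, not part of NLTK, and the sample list is copied from the output above.

```python
def fileids_between(fileids, first_year, last_year):
    """Keep file IDs whose leading four-digit year is within the given range."""
    return [f for f in fileids if first_year <= int(f[:4]) <= last_year]


# A few entries copied from the inaugural.fileids() output.
sample = ['1997-Clinton.txt', '2001-Bush.txt', '2005-Bush.txt', '2009-Obama.txt']

print(fileids_between(sample, 2001, 2005))
```

As far as I know, inaugural.words also accepts a list of file IDs, so the filtered list could be passed to it in place of a single fileid.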

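I remember reading this thread a month ago while trying to answer this question for myself, so even if this information comes late, maybe it will help someone somewhere.

(This is my first contribution to Stack Overflow. I've been reading for months and until now haven't had anything useful to add. I just want to say thanks to everyone for all the help.)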

score 1 · Accepted Answer

My guess is that what you get back from the collocations() call is not a string, and that you need to serialize it. Try this:

import cherrypy
import nltk
from nltk.book import text4
import simplejson

class BiGrams:
    def index(self):
        c = text4.collocations(num=20)
        return simplejson.dumps(c)
    index.exposed = True

cherrypy.quickstart(BiGrams())

score 0 · Accepted Answer

Take a look at the source code (http://code.google.com/p/nltk/source/browse/trunk/nltk/) and you'll learn a lot (I know I did).

1) collocations is printing to your console because that is what it is supposed to do.

help(text4.collocations)

will give you:

Help on method collocations in module nltk.text:

collocations(self, num=20, window_size=2) method of nltk.text.Text instance
    Print collocations derived from the text, ignoring stopwords.

    @seealso: L{find_collocations}
    @param num: The maximum number of collocations to print.
    @type num: C{int}
    @param window_size: The number of tokens spanned by a collocation (default=2)
    @type window_size: C{int}
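
Since the docstring says the method prints its result, the handler in the question presumably returns None, which would explain the empty browser window. One possible workaround is to capture whatever a function writes to standard output and return that string instead. The sketch below is only an illustration: capture_printed and print_pairs are names made up for this post, the sample output is invented, and it assumes the function writes through sys.stdout as a plain print does. contextlib.redirect_stdout needs Python 3.4 or later and swaps sys.stdout for the whole process, so this is not safe as-is in a multi-threaded server.

```python
import contextlib
import io


def capture_printed(func, *args, **kwargs):
    """Call func and return everything it printed to stdout as one string."""
    buffer = io.StringIO()
    # Temporarily point sys.stdout at an in-memory buffer.
    with contextlib.redirect_stdout(buffer):
        func(*args, **kwargs)
    return buffer.getvalue()


def print_pairs():
    """Stand-in for a method that prints instead of returning (invented data)."""
    print("United States; fellow citizens")


if __name__ == "__main__":
    print(repr(capture_printed(print_pairs)))
```

In the question's handler this would read return capture_printed(text4.collocations, num=20), assuming text4 is imported as in the question.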

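Browse the source in text.py and you will find that the collocations method is very simple.

2) Importing nltk.book loads every text. You could take the bits you need from book.py and write a method that loads only the inaugural addresses.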
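
As far as I can tell from text.py, the method builds a BigramCollocationFinder over the tokens, drops rare pairs and short or stop words, and prints the best pairs ranked by likelihood ratio. Here is a rough sketch of the same steps that returns the pairs instead of printing them. The names top_collocations and format_collocations are invented for this post, the ignored_words argument stands in for the stopword list (which needs a separate corpus download), and NLTK is imported inside the function only so that the formatting helper can be tried without NLTK installed.

```python
def top_collocations(words, num=20, window_size=2, ignored_words=()):
    """Return up to num (w1, w2) pairs ranked by likelihood ratio."""
    # Imported lazily; requires NLTK to be installed.
    from nltk.collocations import BigramCollocationFinder
    from nltk.metrics import BigramAssocMeasures

    ignored = {w.lower() for w in ignored_words}
    finder = BigramCollocationFinder.from_words(words, window_size)
    # Drop pairs seen fewer than twice, then short words and ignored words.
    finder.apply_freq_filter(2)
    finder.apply_word_filter(lambda w: len(w) < 3 or w.lower() in ignored)
    return finder.nbest(BigramAssocMeasures.likelihood_ratio, num)


def format_collocations(pairs, separator="; "):
    """Join (w1, w2) pairs into one string suitable for an HTTP response body."""
    return separator.join(w1 + " " + w2 for w1, w2 in pairs)


if __name__ == "__main__":
    print(format_collocations([("United", "States"), ("fellow", "citizens")]))
```

A handler could then return format_collocations(top_collocations(text4.tokens)), which hands CherryPy a string rather than None.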
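
The reason is ordinary Python import behavior: "from module import name" still runs the whole module from top to bottom and only then binds the one name you asked for. The sketch below shows this with a throwaway module written to a temporary directory. fake_book and everything in it is a made-up miniature for illustration, not the real book.py.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Hypothetical miniature of a module that builds every text at top level.
FAKE_BOOK_SOURCE = """
loaded = []

def _load(name):
    loaded.append(name)
    return "<Text: %s>" % name

text1 = _load("text1")
text4 = _load("text4")
text9 = _load("text9")
"""


def import_only_text4():
    """Import one name from the fake module and report what actually got loaded."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "fake_book.py").write_text(FAKE_BOOK_SOURCE)
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            # Has the same effect on loading as "from fake_book import text4".
            module = importlib.import_module("fake_book")
        finally:
            sys.path.remove(tmp)
        return module.text4, list(module.loaded)


if __name__ == "__main__":
    print(import_only_text4())
```

Even though only text4 is requested, all three loaders run. A loader that avoids this would build just the one Text, along the lines of Text(nltk.corpus.inaugural.words()) from the first answer.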
