nlp - Best open-source/free NLP engine

Question

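Suppose I have a pool (list) of well-known phrases, for example: {"I love you", "Your mother is a ...", "I think I am pregnant" ...}. Let's say there are about 1000 of these. Now I want users to enter free text into a text box, and I want to use some kind of NLP engine to digest that text and find the 10 phrases from the pool that are most relevant to it in some way.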

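I think the simplest implementation would work at the word level: take one word at a time and look for similarities somehow. I'm not sure which kind of similarity, though.
What scares me most is the size of the vocabulary I would have to support. I am the sole developer on a demo of sorts, and I don't like the idea of filling a table with words by hand...
I am looking for a free NLP engine. I don't mind which language it is written in, but it must be free, not some online service that charges per API call.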

score 3 · Accepted Answer

It seems that TextBlob and ConceptNet are more than enough to solve this problem!

Answered 2013-09-17T10:17:07.617

score 2 · Accepted Answer

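TextBlob is an easy-to-use NLP library for Python. It is free and open source (licensed under the permissive MIT license), and it provides a nice wrapper around the excellent NLTK and pattern libraries.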

One simple way to approach your problem would be to extract the noun phrases from the given text.

Here is an example from the TextBlob docs.

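# Current TextBlob releases use the "textblob" package name;
# older releases used: from text.blob import TextBlob
from textblob import TextBlob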

text = '''
The titular threat of The Blob has always struck me as the ultimate movie
monster: an insatiably hungry, amoeba-like mass able to penetrate
virtually any safeguard, capable of--as a doomed doctor chillingly
describes it--"assimilating flesh on contact.
Snide comparisons to gelatin be damned, it's a concept with the most
devastating of potential consequences, not unlike the grey goo scenario
proposed by technological theorists fearful of
artificial intelligence run rampant.
'''

blob = TextBlob(text)
print(blob.noun_phrases)
# => ['titular threat', 'blob', 'ultimate movie monster', ...]

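This could be a starting point. From there you could experiment with other approaches, such as the similarity methods mentioned in the comments, or TF-IDF. TextBlob also makes it easy to swap out the model used for noun phrase extraction.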
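To make the TF-IDF suggestion a bit more concrete, here is a minimal sketch that ranks a pool of phrases against free text using TF-IDF weights and cosine similarity. It uses only the Python standard library and does not rely on TextBlob. The function names (`tokenize`, `build_idf`, `top_matches`), the regex tokenizer, and the smoothed IDF formula are illustrative assumptions rather than anything from a particular library, and the three phrases are just the ones from the question.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_idf(tokenized_docs):
    """Smoothed inverse document frequency for every word in the pool."""
    n_docs = len(tokenized_docs)
    doc_freq = Counter()
    for tokens in tokenized_docs:
        doc_freq.update(set(tokens))  # count each word once per phrase
    return {w: math.log((1 + n_docs) / (1 + df)) + 1.0
            for w, df in doc_freq.items()}


def tfidf_vector(tokens, idf):
    """Sparse TF-IDF vector; words that never occur in the pool are ignored."""
    counts = Counter(tokens)
    return {w: c * idf[w] for w, c in counts.items() if w in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(v * b.get(w, 0.0) for w, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_matches(query, phrases, k=10):
    """Return up to k (phrase, score) pairs, best first, dropping zero scores."""
    tokenized = [tokenize(p) for p in phrases]
    idf = build_idf(tokenized)
    query_vec = tfidf_vector(tokenize(query), idf)
    scored = [(cosine(query_vec, tfidf_vector(tokens, idf)), phrase)
              for tokens, phrase in zip(tokenized, phrases)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [(phrase, score) for score, phrase in scored[:k] if score > 0.0]


if __name__ == "__main__":
    pool = ["I love you", "Your mother is a ...", "I think I am pregnant"]
    for phrase, score in top_matches("I think my wife is pregnant", pool):
        print(f"{score:.3f}  {phrase}")
```

This only matches on shared surface words, so it will miss paraphrases; stemming, stop-word removal, or a resource such as ConceptNet would be needed to catch related words that are not literally identical.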

Full disclosure: I am the author of TextBlob.
