python - Finding common words between two text corpora in NLTK

Question

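I am new to NLTK and am trying out a few things.

What is the best way to find the words that two bodies of text have in common? Basically, I have one long text file, say text1, and another, say text2. I want to use NLTK to find the words that appear in both files.

Is there a direct way to do this? What is the best approach?

Thanks!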

score 1 · Accepted Answer

In my opinion, unless you need some specific language processing, you don't need NLTK:

words1 = "This is a simple test of set intersection".lower().split()
words2 = "Intersection of sets is easy using Python".lower().split()

# Words that occur in both lists
intersection = set(words1) & set(words2)

print(intersection)
# {'of', 'is', 'intersection'} (element order may vary)
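For longer texts, punctuation and repeated words usually matter, so here is a minimal sketch that builds on the same set-intersection idea using only the standard library. The helper names `tokenize` and `common_words`, the regular expression, and the sample strings are assumptions made for illustration; they are not part of the original answer.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return its word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def common_words(text1, text2):
    """Return a Counter of the words found in both texts.

    Each count is the smaller of the word's two per-text frequencies.
    """
    return Counter(tokenize(text1)) & Counter(tokenize(text2))


# Illustrative sample strings; real input would be the contents of the two files.
text1 = "This is a simple test. Is it a test of set intersection?"
text2 = "Intersection of sets is easy; a test is not needed."

shared = common_words(text1, text2)
for word, count in shared.most_common():
    print(word, count)
```

To apply this to the two files, their contents could be read with `open(path, encoding="utf-8").read()` and passed to `common_words`. If NLTK-specific tokenization is wanted, `tokenize` could be swapped for a call such as `nltk.word_tokenize`, which requires NLTK's tokenizer data to be installed.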
