python - How do I find text features and print them?

Question

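I have just started using the Natural Language Toolkit (NLTK) as part of a project at my engineering college. Can anyone tell me how to read an input paragraph of text and

1) break it down into its text components, i.e. the number of sentences, words, characters, and polysyllabic (complex) words in the given paragraph

and

2) print the values determined above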

score 0 · Accepted Answer

Where does the input paragraph come from? A file? The console? That part is more of a Python question than an NLTK one.
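A minimal sketch covering both input options, assuming the paragraph either lives in a UTF-8 text file whose path you pass in, or is typed or piped into the console (standard input). The function name `read_paragraph` is hypothetical.

```python
import sys


def read_paragraph(path=None):
    """Return the input text from a file if `path` is given, else from standard input."""
    if path is not None:
        # Read the whole file as one string (UTF-8 encoding assumed).
        with open(path, encoding="utf-8") as f:
            return f.read()
    # No path given: read everything typed or piped into the console until EOF.
    return sys.stdin.read()
```

From a script you might call it as `read_paragraph(sys.argv[1] if len(sys.argv) > 1 else None)`, so that both `python script.py paragraph.txt` and `echo "Some text." | python script.py` work.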

For the rest, take a look at the nltk.tokenize module and nltk.probability.FreqDist.
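As a rough sketch of the counting and printing parts, the code below uses only the standard library: simple regular expressions stand in for `nltk.tokenize.sent_tokenize` and `nltk.tokenize.word_tokenize` (which need the `punkt` data downloaded first), and `collections.Counter` plays the role of `nltk.probability.FreqDist`. The regex rules are a simplifying assumption and will miscount abbreviations such as "Dr."; the name `text_stats` is hypothetical.

```python
import re
from collections import Counter


def text_stats(text):
    """Count sentences, words and characters in `text` (regex-based approximation)."""
    # Assumption: a sentence ends with ., ! or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    # Assumption: a word is a run of ASCII letters, digits or apostrophes.
    words = re.findall(r"[A-Za-z0-9']+", text)
    # Case-insensitive word frequencies, analogous to nltk.probability.FreqDist.
    freq = Counter(w.lower() for w in words)
    return {
        "sentences": len(sentences),
        "words": len(words),
        "characters": len(text),                      # every character, spaces included
        "letters": sum(ch.isalnum() for ch in text),  # letters and digits only
        "most_common": freq.most_common(3),
    }


stats = text_stats("NLTK is fun. Python is fun too! Is it?")
for name, value in stats.items():
    print(f"{name}: {value}")
```

With the real NLTK tokenizers you would swap the two `re` calls for `sent_tokenize(text)` and `word_tokenize(text)`; note that `word_tokenize` also returns punctuation marks as tokens, so you may want to filter those out before counting words.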

score 0 · Accepted Answer

From a discussion on the NLTK Google group:

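import nltk
from nltk.corpus import cmudict

# CMU Pronouncing Dictionary; download it once with nltk.download('cmudict')
d = cmudict.dict()

def nsyl(word):
    # Each pronunciation is a list of phonemes; vowel phonemes end in a stress
    # digit (0, 1 or 2), so counting them gives the number of syllables.
    # Returns one count per pronunciation; raises KeyError if the word is not in CMUdict.
    return [len([ph for ph in pron if ph[-1].isdigit()]) for pron in d[word.lower()]]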

This should give you the number of syllables in each word (one count per pronunciation listed in CMUdict). Hope this helps.
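For words missing from CMUdict (where `d[word.lower()]` raises `KeyError`), or to avoid the corpus download entirely, a rough fallback is to count groups of consecutive vowels. The sketch below uses that heuristic, which is an assumption and not the CMUdict method above, and treats a word with three or more syllables as "complex" (the threshold used by the Gunning fog index). The names `estimate_syllables` and `complex_words` are hypothetical.

```python
import re


def estimate_syllables(word):
    """Rough syllable count: number of vowel groups, minus a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Heuristic: a trailing 'e' is usually silent ("make"), but not in "-le" ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    # Every word has at least one syllable.
    return max(count, 1)


def complex_words(text, threshold=3):
    """Return the words in `text` whose estimated syllable count is >= threshold."""
    words = re.findall(r"[A-Za-z']+", text)
    return [w for w in words if estimate_syllables(w) >= threshold]


found = complex_words("The natural language toolkit makes text processing easy.")
print("complex words:", len(found), found)
```

A vowel-group count is only an approximation (it overcounts words like "makes"), so a practical approach is to try `nsyl` first and fall back to the heuristic only when the word is not in the dictionary.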
