python - How to convert sentences in a dictionary of lists to plain text in order to apply NLTK

Asked 2021-11-09T21:02:14.517

Viewed 23 times

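I'm a noob at Python and at pretty much everything else.

I'm trying to use some NLTK for my applied linguistics thesis, but something keeps preventing the nltk tools from working on my dataset.

I tried some copy + paste + modify style code, without success. How should I prepare my dataset so that I can apply nltk to it (for example, find the percentage of punctuation in each sentence, count or remove stop words, and so on)? I have already applied these features to another dataset, but there the entries were just text and were not wrapped in any of these "['']".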

ds = {0: "['sentences I need to parse.']", 
      1: "['word1', 'word2', 'word3']",
      2: "['sentences and words']",
      3: "['Natural language processing.']",
      4: "['Further tokenization is needed.']",
      5: "['Is it a question?']",
      6: "['You\'re a real noob.']"}

The output I'm trying to get is:

sentences I need to parse
word1, word2, word3
sentences and words
Natural language processing.
Further tokenization is needed.
Is it a question?
You\'re a real noob.
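
One possible way to get there, as a minimal sketch rather than a definitive answer: each value in ds looks like a Python list written out as a string, so it can usually be parsed with ast.literal_eval and the items joined back into plain text. The helper names to_plain_text and punctuation_percentage are made up for illustration, and the fallback branch assumes that every value starts with [' and ends with '] (entry 6 is not a valid list literal because of the unescaped apostrophe, so it takes that branch).

```python
import ast
import string

ds = {0: "['sentences I need to parse.']",
      1: "['word1', 'word2', 'word3']",
      2: "['sentences and words']",
      3: "['Natural language processing.']",
      4: "['Further tokenization is needed.']",
      5: "['Is it a question?']",
      6: "['You\'re a real noob.']"}


def to_plain_text(raw):
    """Turn a string such as "['a', 'b']" into the plain text "a, b"."""
    try:
        items = ast.literal_eval(raw)
    except (ValueError, SyntaxError):
        # Not a valid list literal (e.g. an unescaped apostrophe inside):
        # assume the "['...']" wrapper and strip it by hand.
        text = raw.strip()
        if text.startswith("['") and text.endswith("']"):
            text = text[2:-2]
        return text
    if isinstance(items, (list, tuple)):
        return ", ".join(str(item) for item in items)
    return str(items)


def punctuation_percentage(text):
    """Percentage of characters in text that are ASCII punctuation."""
    if not text:
        return 0.0
    count = sum(ch in string.punctuation for ch in text)
    return 100.0 * count / len(text)


# Same keys as ds, but every value is now an ordinary string.
plain = {key: to_plain_text(value) for key, value in ds.items()}

for key, text in plain.items():
    print(key, text, round(punctuation_percentage(text), 1))
```

Once the values are ordinary strings like this, they can be handed to the nltk tools in the same way as the other, unwrapped dataset, for example to nltk.word_tokenize.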
