I've been trying to work with a dataset that uses | as the field delimiter and \n as the line terminator.
a | b | c
c | e | f
I've been trying to split each record with rec[0].split('|') and then apply nltk.FreqDist(rec) to it.
Here is my source code:
import nltk
import csv
from nltk.util import ngrams

with open('CG_Attribute.csv', 'r') as f:
    for row in f:
        splitSet = row.split('|')
        for rec in splitSet:
            # token = nltk.word_tokenize(rec)
            result = nltk.FreqDist(rec)
            print(result)
The output I get is:
<FreqDist with 14 samples and 22 outcomes>
<FreqDist with 8 samples and 9 outcomes>
<FreqDist with 1 samples and 1 outcomes>
<FreqDist with 26 samples and 44 outcomes>
<FreqDist with 6 samples and 8 outcomes>
What I'm expecting is:
[('a',1),('b',1),('c',2),('e',1),('f',1)]
Can anyone point out where I'm going wrong? Any suggestions would be helpful :)
PS - I even tried the csv module, but had no luck.
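For reference, here is a minimal sketch of the behaviour I'm after. It rests on a few assumptions: the data is inlined through io.StringIO as a stand-in for CG_Attribute.csv, and collections.Counter is swapped in for nltk.FreqDist so the snippet runs without NLTK (as far as I know, FreqDist subclasses Counter in NLTK 3, so it should accept the same list of tokens). My suspicion is that the loop above goes wrong in two places: each rec is a plain string, so FreqDist(rec) counts its characters, and a fresh FreqDist is built for every field instead of one shared across the whole file.

```python
import io
from collections import Counter

# Stand-in for the pipe-delimited file; these are the two sample rows shown above.
sample = "a | b | c\nc | e | f\n"

# One counter shared across all rows, instead of a new one per field.
counts = Counter()

with io.StringIO(sample) as f:
    for row in f:
        # Split on the delimiter, then strip spaces and the trailing newline.
        fields = [field.strip() for field in row.split('|')]
        # Feed a list of whole fields; a bare string would be counted per character.
        counts.update(field for field in fields if field)

# Sort by token so the result has a stable order.
result = sorted(counts.items())
print(result)  # [('a', 1), ('b', 1), ('c', 2), ('e', 1), ('f', 1)]
```

With the real file, the io.StringIO(sample) call would presumably be replaced by open('CG_Attribute.csv', 'r').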