I have a txt file that looks like this:
word, 23
Words, 2
test, 1
tests, 4
I want it to look like this:
word, 23
word, 2
test, 1
test, 4
I want to be able to take a txt file in Python and convert the plural words to singular. Here is my code:
    import nltk

    f = raw_input("Please enter a filename: ")

    def openfile(f):
        with open(f,'r') as a:
            a = a.read()
            a = a.lower()
            return a

    def stem(a):
        p = nltk.PorterStemmer()
        [p.stem(word) for word in a]
        return a

    def returnfile(f, a):
        with open(f,'w') as d:
            d = d.write(a)
            #d.close()

    print openfile(f)
    print stem(openfile(f))
    print returnfile(f, stem(openfile(f)))
I also tried these two definitions in place of the stem definition:
    def singular(a):
        for line in a:
            line = line[0]
            line = str(line)
            stemmer = nltk.PorterStemmer()
            line = stemmer.stem(line)
            return line

    def stem(a):
        for word in a:
            for suffix in ['s']:
                if word.endswith(suffix):
                    return word[:-len(suffix)]
            return word
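For reference, here is a minimal sketch of what the singular step is meant to do, written per line instead of per character (iterating over the string returned by read() yields single characters, not words). The helper names singularize and singularize_lines are hypothetical, and the suffix rule is deliberately naive: it only strips a trailing "s", so a word like "glass" would be mangled. A real stemmer such as nltk.PorterStemmer().stem could probably be swapped in for singularize.

```python
def singularize(word, suffixes=("s",)):
    """Naive singularization (hypothetical helper): strip the first matching suffix."""
    for suffix in suffixes:
        # Require something to be left over, so a bare "s" is not emptied.
        if word.endswith(suffix) and len(word) > len(suffix):
            return word[:-len(suffix)]
    return word


def singularize_lines(text):
    """Lowercase and singularize the word column of 'word, count' lines."""
    result = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Split only on the first comma: left is the word, right is the count.
        word, count = line.split(",", 1)
        result.append("%s, %s" % (singularize(word.strip().lower()), count.strip()))
    return "\n".join(result)


sample = "word, 23\nWords, 2\ntest, 1\ntests, 4"
print(singularize_lines(sample))
```

With the sample data from the top of the post, this should print the four lines of the desired output, with the counts left untouched.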
After that, I would like to take the duplicate words (e.g. test and test) and merge them by adding up the numbers next to them. For example:
word, 25
test, 5
I don't know how to do this. A solution would be nice, but is not required.
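To make the merging step concrete, here is a rough sketch of what I have in mind, assuming the words have already been made singular and every line has the form "word, count". The function name merge_counts is hypothetical, and using collections.OrderedDict to keep the words in first-seen order is just one possible choice.

```python
from collections import OrderedDict


def merge_counts(text):
    """Sum the counts of identical words in 'word, count' lines (hypothetical helper)."""
    totals = OrderedDict()  # keeps words in the order they first appear
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        word, count = line.split(",", 1)
        word = word.strip()
        # int() tolerates the surrounding whitespace in " 23".
        totals[word] = totals.get(word, 0) + int(count)
    return "\n".join("%s, %d" % (word, total) for word, total in totals.items())


already_singular = "word, 23\nword, 2\ntest, 1\ntest, 4"
print(merge_counts(already_singular))
```

For the example data this should give "word, 25" followed by "test, 5".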