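Your code is almost fine; I'll add a full explanation within the hour. In the meantime, see how it could look and consult the documentation: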
First, it is safer to open files with a with open() clause (see https://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects):
filepath = 'C:/Python27/bengali_wordlist_full.txt'
with open(filepath) as f:
    content = f.read().decode('string-escape').decode("utf-8")
    # do you really need all of this decoding?
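If the file turns out to be plain UTF-8 text with no escaped byte sequences in it, one possible alternative is to let io.open do the decoding while reading. This is only a sketch under that assumption; read_text is a hypothetical helper name, not something from your code. io.open is available in Python 2.6+ and in Python 3.

```python
import io

def read_text(path, encoding="utf-8"):
    # Hypothetical helper: io.open decodes the bytes while reading,
    # so no manual .decode() call is needed afterwards.
    # Assumes the file really is encoded as `encoding`.
    with io.open(path, encoding=encoding) as f:
        return f.read()
```

With this, content = read_text(filepath) would give a unicode string directly. If the file really does contain escaped sequences, the 'string-escape' step would still be needed.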
Now content holds the text from the file: it is one long string, with '\n' characters marking the line ends. We can split it into a list of lines:
lines = content.splitlines()
and parse one line at a time:
for line in lines:
    try:
        # split line into items, assign first to 'word', second to 'freq'
        word, freq = line.split('\t')  # assuming you have tab as separator
        freq = float(freq)  # we need to convert second item to numeric value from string
        if 5 <= freq <= 300:  # you can 'chain' comparisons like this
            print word
    except ValueError:
        # this happens if split() gives more than two items or float() fails
        print "Could not parse this line:", line
        continue
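If you want to reuse or test the loop, it could be wrapped in a function that returns the matches instead of printing them. The sketch below is just one way to do it: filter_words, its parameters, and the sample data are all made up for illustration, and the tab separator is still an assumption about your file. It runs on both Python 2 and Python 3.

```python
def filter_words(lines, low=5, high=300, sep="\t"):
    """Return (words, bad_lines) for lines shaped like 'word<sep>freq'."""
    words, bad_lines = [], []
    for line in lines:
        try:
            # Unpacking fails with ValueError unless there are exactly two items;
            # float() raises ValueError if the second item is not a number.
            word, freq = line.split(sep)
            freq = float(freq)
        except ValueError:
            bad_lines.append(line)
            continue
        if low <= freq <= high:
            words.append(word)
    return words, bad_lines

# Made-up sample standing in for the file content.
sample_content = "alpha\t7\nbeta\t350\ngamma\ndelta\t300\n"
words, bad = filter_words(sample_content.splitlines())
# words == ['alpha', 'delta'], bad == ['gamma']
```

Keeping only the two lines that can fail inside the try block makes it clearer which errors are being caught, and collecting the unparsable lines lets you inspect them afterwards.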