python - Getting mid-frequency words

Question

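I have a word list containing numbers, English words, and Bengali words, and in another column I have their frequencies. The columns have no headers. I need the words whose frequency is between 5 and 300. This is the code I am using. It is not working.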

wordlist = open('C:\\Python27\\bengali_wordlist_full.txt', 'r').read().decode('string-escape').decode("utf-8")

for word in wordlist:
    if word[1] >= 3
        print(word[0])
    elif word[1] <= 300
        print(word[0])

This gives me a syntax error.

File "<stdin>", line 2
    if word[1] >= 3
              ^
SyntaxError: invalid syntax

Can anyone help?

score 2 · Accepted Answer

You should add a : at the end of the if statement to fix this SyntaxError:

wordlist = open('C:\\Python27\\bengali_wordlist_full.txt', 'r').read().decode('string-escape').decode("utf-8")

for word in wordlist:
    if word[1] >= 3:
        print word[0]
    elif word[1] <= 300:
        print word[0]

Read: https://docs.python.org/2/tutorial/controlflow.html
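
With the colons in place the snippet parses, but the two branches together still accept every number: anything that fails the first test is below 3 and therefore passes the second. A minimal sketch of the difference, written for Python 3 (the answers here use Python 2); the helper names matches_if_elif and matches_range are hypothetical, and the 5-300 bounds come from the question:

```python
def matches_if_elif(freq):
    """Mirror the if/elif structure above: true when either test passes."""
    if freq >= 3:
        return True
    elif freq <= 300:
        return True
    return False


def matches_range(freq, low=5, high=300):
    """True only when freq lies inside the closed range [low, high]."""
    return freq >= low and freq <= high


sample = [1, 4, 5, 120, 300, 301, 9000]
print([f for f in sample if matches_if_elif(f)])  # every value passes
print([f for f in sample if matches_range(f)])    # only values in 5..300
```

Joining both bounds with and is what turns the two tests into a range check.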

Here is another useful tip: when Python reports a SyntaxError on some line, always look at the previous line first, then at the next line.
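
One way to see which line Python blames is to compile the faulty snippet from a string and inspect the exception. This is a small Python 3 sketch; the variable names and the "<demo>" filename are made up for illustration:

```python
# The question's loop, reduced to a string so the error can be caught.
broken = (
    "for word in [('a', 10)]:\n"
    "    if word[1] >= 3\n"          # missing colon at the end of this line
    "        print(word[0])\n"
)

try:
    compile(broken, "<demo>", "exec")
except SyntaxError as err:
    # err.lineno is the line Python reports; err.text is that source line.
    reported_line = err.lineno
    print("line", reported_line, "->", (err.text or "").strip())
```

Here the reported line is the one with the missing colon; for problems such as an unclosed bracket, the reported line can differ from the line that needs fixing, which is why checking the neighbouring lines helps.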

score 1 · Accepted Answer

Your code has a few problems; I will add a full explanation within the hour. In the meantime, see how it could look and consult the docs:

First, it is safer to open the file with a with open() clause (see https://docs.python.org/2/tutorial/inputoutput.html#methods-of-file-objects)

filepath = 'C:/Python27/bengali_wordlist_full.txt'

with open(filepath) as f:
    content = f.read().decode('string-escape').decode("utf-8")
    # do you really need all of this decoding?
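
If the file turns out to be plain UTF-8 text, the decoding can happen while reading. The sketch below assumes UTF-8 and uses io.open, which accepts an encoding argument on both Python 2 and Python 3. The temporary file and its two sample rows are invented stand-ins for the real word list:

```python
import io
import os
import tempfile

# Build a tiny stand-in file: word, tab, frequency on each line.
sample_text = u"\u09ac\u09be\u0982\u09b2\u09be\t120\nhello\t7\n"
fd, filepath = tempfile.mkstemp(suffix=".txt")
os.close(fd)
with io.open(filepath, "w", encoding="utf-8") as f:
    f.write(sample_text)

# io.open decodes while reading, so content is already a unicode string.
with io.open(filepath, encoding="utf-8") as f:
    content = f.read()

os.remove(filepath)
print(content.count(u"\n"))  # number of line terminators read back
```

Whether the extra 'string-escape' step is needed depends on how the file was written, which the question does not say.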

Now content holds the text from the file: it is one long string, with '\n' characters marking the line ends. We can split it into a list of lines:

lines = content.splitlines()
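
As a quick illustration of this step, with a made-up two-line string in place of the real file content:

```python
content = u"alpha\t12\nbeta\t400\n"

# splitlines() drops the line terminators and does not produce a
# trailing empty string for the final newline.
lines = content.splitlines()
print(lines)

# For comparison, split on "\n" keeps an empty last item.
naive = content.split(u"\n")
print(naive)
```

The empty last item from split would otherwise show up as an unparseable line later on.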

And parse one line at a time:

for line in lines:
    try:
        # split line into items, assign first to 'word', second to 'freq'
        word, freq = line.split('\t') # assuming you have tab as separator
        freq = float(freq) # we need to convert second item to numeric value from string
        if 5 <= freq <= 300: # you can 'chain' comparisons like this
            print word
    except ValueError: 
        # this happens if split() does not give exactly two items or float() fails
        print "Could not parse this line:", line
        continue
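
The same loop can be wrapped in a function that returns the matches instead of printing them, which makes it easier to reuse and to check. This is a sketch for Python 3, not the answer's own code: the name words_in_range, the tab separator, and the sample rows are all assumptions.

```python
def words_in_range(lines, low=5, high=300, sep=u"\t"):
    """Return (matches, skipped) for lines of the form word<sep>frequency."""
    matches, skipped = [], []
    for line in lines:
        try:
            word, freq = line.split(sep)
            freq = float(freq)
        except ValueError:
            # Wrong number of fields, or a frequency that is not numeric.
            skipped.append(line)
            continue
        if low <= freq <= high:
            matches.append(word)
    return matches, skipped


sample = [u"the\t5000", u"\u09ac\u0987\t42", u"2014\t5", u"rare\t2", u"broken line"]
matches, skipped = words_in_range(sample)
print(matches)
print(skipped)
```

Returning the skipped lines separately keeps malformed rows visible without mixing them into the results.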
