python - Python: display two files in 3-column format and count the frequency of similar words

Translated from: https://stackoverflow.com/questions/69986176 2021-11-16T08:53:07.560


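I am trying to write a program that takes two files specified on the command line. Both files should be in a 3-column format (only the first column contains the words whose frequencies I need; the other columns hold other information). I need to get the frequency of every word shared between the two files and add those frequencies to a dictionary.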

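Here is what I have so far (the first 21 lines run without any errors, so I think that part is fine; the problem starts when I try to go further and analyze the word frequencies):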

import sys
from nltk.corpus import stopwords

count_dicts = []

## get the filenames from the command line
filename1 = sys.argv[1]
filename2 = sys.argv[2]

## open the first file for reading
infile1 = open(filename1, 'r')
## open the second file for reading
infile2 = open(filename2, 'r')

# initialize the counters
line_counter = diff_counter = 0

## for each line in file 1
for line1 in infile1:
    # also read a line from file 2
    line2 = infile2.readline()

#define frequency count function
def Count_Frequency(infile1, infile2):

    #Creating an empty dictionary
    freq1 = {}

    for item in infile1, infile2:
        if (item in freq1):
            freq1[item] += 1
        else:
            #freq1[item] = 1

    #print first frequency dictionary
    for key, value in freq1.items():
        print (key, value)

comb_freq = Counter(infile1, infile2)
print(comb_freq)
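
For reference, a few things in the code above would stop it from running as posted: the `else:` branch has only a commented-out body, `Counter` is never imported, `Counter` accepts a single iterable rather than two, and `for item in infile1, infile2` loops over the two file objects themselves rather than over their words. Below is a minimal sketch of one way to approach the task. It assumes the columns are separated by whitespace and that the "frequency of a shared word" means its count in the first column of file 1 plus its count in the first column of file 2. The function names `first_column_counts` and `shared_word_frequencies` are hypothetical, chosen only for illustration.

```python
import sys
from collections import Counter


def first_column_counts(path):
    """Count how often each word appears in the first column of a file.

    Assumption: columns are whitespace-separated.
    """
    counts = Counter()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if fields:  # skip blank lines
                counts[fields[0]] += 1
    return counts


def shared_word_frequencies(path1, path2):
    """Return {word: combined count} for words found in both files.

    Assumption: the combined count is the sum of the counts in each file.
    """
    counts1 = first_column_counts(path1)
    counts2 = first_column_counts(path2)
    shared = counts1.keys() & counts2.keys()  # words present in both files
    return {word: counts1[word] + counts2[word] for word in shared}


if __name__ == "__main__" and len(sys.argv) == 3:
    result = shared_word_frequencies(sys.argv[1], sys.argv[2])
    for word, freq in sorted(result.items()):
        print(word, freq)
```

Reading each file fully into its own `Counter` avoids stepping through the two files in lockstep, which only lines up words that happen to sit on the same line number. If the files are tab- or comma-separated instead, `line.split()` would need to be swapped for the matching delimiter.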
