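I'm trying to write a program that takes two files specified on the command line. Both files should be in a 3-column format (only the first column contains the words whose frequencies I need; the other columns hold other information). I need to get the frequencies of all words shared between the two files and add them to a dictionary.
Here is what I have so far (the first 21 lines don't raise any errors, so I think I'm fine up to there; the problem comes when I try to go further and analyze their frequencies):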
import sys
from collections import Counter
from nltk.corpus import stopwords

## get the filenames from the command line
filename1 = sys.argv[1]
filename2 = sys.argv[2]

# define frequency count function
def Count_Frequency(infile):
    # Creating an empty dictionary
    freq = {}
    ## for each line in the file
    for line in infile:
        columns = line.split()
        # skip blank lines
        if not columns:
            continue
        # only the first column holds the word
        # (assumes the columns are separated by whitespace)
        item = columns[0]
        if item in freq:
            freq[item] += 1
        else:
            freq[item] = 1
    return freq

## open both files for reading and count the words in each
with open(filename1, 'r') as infile1, open(filename2, 'r') as infile2:
    freq1 = Count_Frequency(infile1)
    freq2 = Count_Frequency(infile2)
count_dicts = [freq1, freq2]

# print first frequency dictionary
for key, value in freq1.items():
    print(key, value)

# combined frequencies of the words that appear in both files
comb_freq = Counter({word: freq1[word] + freq2[word]
                     for word in freq1 if word in freq2})
print(comb_freq)
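
For reference, the shared-word step described at the top can be sketched on its own. This is only a rough illustration under some assumptions: the columns are taken to be whitespace-separated, the helper names `first_column_counts` and `shared_frequencies` are made up for this example, and the two `io.StringIO` objects are invented sample data standing in for the real files.

```python
from collections import Counter
import io

def first_column_counts(lines):
    """Count how often each first-column word occurs in an iterable of lines."""
    counts = Counter()
    for line in lines:
        columns = line.split()  # assumption: columns are whitespace-separated
        if columns:             # skip blank lines
            counts[columns[0]] += 1
    return counts

def shared_frequencies(lines1, lines2):
    """Map each word found in both inputs to (count in input 1, count in input 2)."""
    counts1 = first_column_counts(lines1)
    counts2 = first_column_counts(lines2)
    shared = counts1.keys() & counts2.keys()  # words present in both inputs
    return {word: (counts1[word], counts2[word]) for word in shared}

# Invented sample data standing in for the two 3-column files
sample1 = io.StringIO("cat N 1\ndog N 2\ncat N 3\n")
sample2 = io.StringIO("cat V 9\nbird N 4\n")
result = shared_frequencies(sample1, sample2)
print(result)
```

With real files, the same functions could be given open file objects instead of the `io.StringIO` samples, since both are iterated line by line.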