def get_word_count(wordlist, final):
    regex = []
    count = [[] for x in xrange(len(wordlist))]
    frequency = []
    regex = makeregex(wordlist)
    for i in range(len(final)-1):
        size = os.stat(final[i]).st_size
        fil = open(final[i])
        if(fil):
            print final[i] + " read!"
            data = mmap.mmap(fil.fileno(), size, access=mmap.ACCESS_READ)
            for j in range (len(wordlist)):
                count[j].append(re.findall(regex[j], data))
        fil.close()
    for k in range(len(wordlist)):
        frequency.append(sum(count[k]))
    print frequency

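count is a list of lists, and each inner list is supposed to store some numbers. I want to store the sum of each inner list as one element of a new list, frequency.

When I run the code, I get this error: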

Traceback (most recent call last):
  File "C:\Users\Animesh\Desktop\_zipf.py", line 52, in <module>
    get_word_count(wordlist, final)
  File "C:\Users\Animesh\Desktop\_zipf.py", line 32, in get_word_count
    frequency.append(sum(count[k]))
TypeError: unsupported operand type(s) for +: 'int' and 'list'

What should I change in my code? Please help.


2 Answers

count[j].append(re.findall(regex[j], data))

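Here you are appending the list of words found by the regex to count[j], so each element of count is a list of lists of strings. sum() starts from the integer 0 and tries to add the first inner list to it, which is why sum(count[k]) raises the TypeError.

I think you want to append the number of words found to count[j] instead: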

count[j].append(len(re.findall(regex[j], data)))
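
The difference can be reproduced in isolation. The following is a minimal sketch, assuming Python 3 and two made-up strings standing in for the file contents; the names `texts` and `pattern` are illustrative and do not come from the question.

```python
import re

# Hypothetical stand-ins for the contents of two files.
texts = ["the cat and the dog", "the end"]
pattern = re.compile(r"\bthe\b")

# Storing the findall() results gives a list of lists of strings,
# so sum() fails when it tries to compute 0 + ['the', 'the'].
matches = [pattern.findall(t) for t in texts]
try:
    sum(matches)
    failed = False
except TypeError:
    failed = True

# Storing len(findall()) gives a list of integers, which sum() can add.
counts = [len(pattern.findall(t)) for t in texts]
total = sum(counts)
print(failed, counts, total)
```

Only the second form leaves integers in the list, which is what sum() needs.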
answered 2013-01-31T20:14:11.187

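If you want to make it simpler, you can drop count = [[] for x in xrange(len(wordlist))] and use count = [] instead. Then, inside the for loop, increment a temporary variable and append it to count after the loop finishes.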

size = 0
for j in range (len(wordlist)):
    size += len(re.findall(regex[j], data)) #thanks to CharlesB for this bit
count.append(size) #you could also cut out the middle man and just append frequency 
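
Note that the snippet above, placed inside the per-file loop, produces one total per file across all words. If one total per word is what is needed, the same running-counter idea can be kept per word. This is only a sketch, assuming Python 3: `count_words` is a hypothetical helper, and because the question's makeregex() is not shown, whole-word case-sensitive patterns are assumed in its place.

```python
import re

def count_words(wordlist, texts):
    """Return one running total per word across all texts (hypothetical helper)."""
    # Assumption: whole-word, case-sensitive matching stands in for the
    # question's makeregex(), whose implementation is not shown.
    patterns = [re.compile(r"\b" + re.escape(w) + r"\b") for w in wordlist]
    frequency = [0] * len(wordlist)
    for text in texts:
        for j, pattern in enumerate(patterns):
            # Add this text's match count to the word's running total.
            frequency[j] += len(pattern.findall(text))
    return frequency

print(count_words(["the", "dog"], ["the cat and the dog", "the end"]))
```

Because the totals are accumulated directly, no intermediate list of lists and no final sum() call are needed.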
answered 2013-01-31T20:31:33.407