python - "Yielding" a dictionary key in the form of a list using mincemeat.py

Question

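I am trying to understand the map-reduce concept and am experimenting with small programs built on mincemeat.py, an open-source Python library.

I already have a simple word count over a bag of words working with a mapper and a reducer. Now I want to compute tf-idf scores for all words across documents. I think the first step is to build a dictionary of the form {[word, docID] -> count}. To do that, I wrote the following code: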

def mapfn(k, v):
    for line in v.splitlines():
        for word in line.split():
            l = [word.lower(), k]
            yield l, 1

However, when I run the program, I get the following error:

error: uncaptured python exception, closing channel <__main__.Client connected at 0x8a434ac> 
(<type 'exceptions.TypeError'>:unhashable type: 'list'
 [/usr/lib/python2.7/asyncore.py|read|83]
 [/usr/lib/python2.7/asyncore.py|handle_read_event|444] 
 [/usr/lib/python2.7/asynchat.py|handle_read|140] 
 [mincemeat.py|found_terminator|96] 
 [mincemeat.py|process_command|194] 
 [mincemeat.py|call_mapfn|171])

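My understanding is that with mincemeat.py we cannot yield a list from the map function, because the error suggests that a list is not acceptable at the reduce stage. Am I right? If so, is there a way to do this? Or do I need to look at a library other than mincemeat?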

score 3 · Accepted Answer

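I don't know mincemeat, but it is clearly trying to use the list as a key in a dictionary or a set, which is not possible because lists are mutable and therefore unhashable. Instead of yielding a list, try yielding a tuple. (In other words, change [word.lower(), k] to (word.lower(), k).)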
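As a rough sketch of what that change looks like, here is the mapper with a tuple key plus a summing reducer. Since I can't vouch for mincemeat's internals, run_locally below is a hypothetical stand-in that groups emitted keys in a plain dictionary, which is what the traceback suggests call_mapfn does. The reducefn, the sample documents, and their IDs are also made up for illustration.

```python
from collections import defaultdict


def mapfn(k, v):
    # k is the document ID, v is the document text.
    for line in v.splitlines():
        for word in line.split():
            # A tuple is hashable, so it can be used as a dictionary key.
            yield (word.lower(), k), 1


def reducefn(k, vs):
    # Sum the 1s emitted for each (word, docID) pair.
    return sum(vs)


def run_locally(datasource):
    # Hypothetical stand-in for the framework: group map output by key,
    # then reduce each group. This is NOT mincemeat's actual code.
    grouped = defaultdict(list)
    for doc_id, text in datasource.items():
        for key, value in mapfn(doc_id, text):
            grouped[key].append(value)
    return {key: reducefn(key, values) for key, values in grouped.items()}


# Made-up sample documents keyed by document ID.
docs = {"d1": "The cat saw the dog", "d2": "the dog"}
counts = run_locally(docs)
```

With the original list key, the grouped[key] lookup would raise the same "unhashable type: 'list'" TypeError shown in the question. With the tuple key, counts ends up keyed by (word, docID) pairs, which is the shape needed for the term-frequency part of tf-idf.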
