mapreduce - Assigning unique IDs to Strings using MapReduce

Translated from: https://stackoverflow.com/questions/13189578 2012-11-02T05:03:13.160

Viewed 244 times

2

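I want to run a MapReduce job that scans multiple columns of a given file and assigns a unique ID (index number) to each distinct value in each column. The main challenge is ensuring that the same value gets the same ID when it is encountered on different nodes or in different Reducer instances.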

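Currently I am using ZooKeeper to share the unique IDs, but this hurts performance. I even keep a local cache at the reducer level to avoid hitting ZooKeeper more than once for the same value. I would like to explore whether there is a better mechanism for doing the same thing.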

1 Answer

1

I can suggest two possible solutions to your problem:

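Create a unique ID derived from the value itself, for example with a hash function that has a low collision rate. Every node then computes the same ID for the same value without any coordination.
Use faster storage than ZooKeeper. You can try a simple key-value store such as Redis to hold the value-to-ID mapping.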
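
Both suggestions can be sketched roughly as follows. This is only an illustration under some assumptions, not a tested Hadoop integration: the function names, the `ids:`/`next_id:` key naming and the 64-bit ID width are all made up for the example, and `client` is assumed to behave like a redis-py `redis.Redis` instance (only `hget`, `hsetnx` and `incr` are used).

```python
import hashlib


def hashed_id(column: str, value: str, bits: int = 64) -> int:
    """Option 1: derive the ID from the value itself, with no shared state.

    Every node computes the same ID for the same (column, value) pair.
    Collisions are unlikely at 64 bits but not impossible, and the IDs
    are not dense index numbers.
    """
    # The separator byte keeps ("ab", "c") distinct from ("a", "bc").
    payload = column.encode("utf-8") + b"\x00" + value.encode("utf-8")
    digest = hashlib.sha256(payload).digest()
    # Keep the top `bits` bits of the 256-bit digest as an unsigned integer.
    return int.from_bytes(digest, "big") >> (256 - bits)


def stored_id(client, column: str, value: str) -> int:
    """Option 2: look up or allocate a sequential ID in a Redis-like store.

    `client` is assumed to follow redis-py semantics for hget, hsetnx
    and incr (for example a redis.Redis instance).
    """
    mapping_key = f"ids:{column}"       # hypothetical key naming
    counter_key = f"next_id:{column}"   # hypothetical key naming

    existing = client.hget(mapping_key, value)
    if existing is not None:
        return int(existing)

    # Reserve a fresh number, then try to claim it for this value.
    candidate = client.incr(counter_key)
    # hsetnx writes only if the field is absent, so when two reducers
    # race on the same value exactly one of them wins.
    if client.hsetnx(mapping_key, value, candidate):
        return candidate
    # Another reducer won the race: use its ID. The reserved number is
    # simply skipped, so IDs may have gaps.
    return int(client.hget(mapping_key, value))
```

The hash variant needs no network round trip at all, at the cost of large, non-sequential IDs. The store variant yields small sequential IDs but still costs one or more round trips per previously unseen value, so the reducer-level cache mentioned in the question remains useful.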

Answered at 2012-11-03T07:50:12.613