scala - What value should I use for numHashTable in Uber's Spark LSH?

Originally from: https://stackoverflow.com/questions/47419753 (asked 2017-11-21T18:02:39.600)


I'm trying to use .approxSimilarityJoin with Spark MLlib's LSH (MinHash for Jaccard distance), for example:

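import org.apache.spark.ml.feature.MinHashLSH

// MinHash LSH for Jaccard distance on the "features" column
val mh = new MinHashLSH()
    .setNumHashTables(5)      // more tables: fewer missed pairs, but more computation
    .setInputCol("features")
    .setOutputCol("hashes")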
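A small Spark-free sketch may help show what numHashTables controls. It assumes that each hash table holds a single MinHash value and that two rows become join candidates when they agree in at least one table (OR-amplification), which is how Spark's LSH guide describes numHashTables; treat that as an assumption rather than a reading of Spark's source. The object name, the hash family ((1 + x) * a + b) mod (2^31 - 1), and the sample sets are all illustrative and hypothetical:

```scala
import scala.util.Random

// Toy, Spark-free model of MinHash LSH with OR-amplification. ASSUMPTION: each
// "hash table" holds one MinHash value, and a pair becomes a candidate when the
// two signatures agree in at least one table. The prime and hash family are
// illustrative choices, not copied from Spark.
object MinHashOrAmplification {
  // 2^31 - 1 is prime, so distinct elements never collide under h below.
  val Prime: Long = 2147483647L

  // One hash table = one random function h(x) = ((1 + x) * a + b) mod Prime.
  final case class HashFn(a: Long, b: Long) {
    def apply(x: Int): Long = ((1L + x) * a + b) % Prime
  }

  // Draw numHashTables independent hash functions (a in [1, Prime - 1]).
  def randomHashFns(numHashTables: Int, rnd: Random): Vector[HashFn] =
    Vector.fill(numHashTables) {
      HashFn(1L + rnd.nextInt(Int.MaxValue - 1), rnd.nextInt(Int.MaxValue).toLong)
    }

  // MinHash signature of a non-empty set of non-negative element indices.
  def signature(set: Set[Int], fns: Vector[HashFn]): Vector[Long] =
    fns.map(h => set.iterator.map(h(_)).min)

  // OR-amplification: candidates if ANY table produced the same MinHash.
  def isCandidate(x: Vector[Long], y: Vector[Long]): Boolean =
    x.zip(y).exists { case (u, v) => u == v }

  def jaccard(a: Set[Int], b: Set[Int]): Double =
    (a intersect b).size.toDouble / (a union b).size

  // Fraction of independent trials in which (a, b) ends up as a candidate pair.
  def candidateRate(a: Set[Int], b: Set[Int], numHashTables: Int, trials: Int, seed: Long): Double = {
    val rnd = new Random(seed)
    val hits = (1 to trials).count { _ =>
      val fns = randomHashFns(numHashTables, rnd)
      isCandidate(signature(a, fns), signature(b, fns))
    }
    hits.toDouble / trials
  }

  def main(args: Array[String]): Unit = {
    // 150 distinct random elements: a and b share 50 of them, so J = 50 / 150.
    val elems = new Random(42).shuffle((0 until 100000).toVector).take(150)
    val a = elems.take(100).toSet
    val b = elems.drop(50).toSet
    println(f"Jaccard similarity = ${jaccard(a, b)}%.3f")
    for (l <- Seq(1, 5, 20)) {
      val rate = candidateRate(a, b, l, trials = 2000, seed = 7L)
      println(f"numHashTables=$l%2d  empirical candidate rate=$rate%.3f")
    }
  }
}
```

Running main prints a Jaccard similarity of 0.333 and an empirical candidate rate for 1, 5 and 20 tables. The rate should climb toward 1 as tables are added, which is the accuracy side of the trade-off; each extra table also means one more MinHash per row, which is the cost side.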

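I know that the higher numHashTables is, the more accurate the results are, but the more complex and slower the computation becomes. I have two questions about this parameter: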
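Under the same OR-amplification assumption, the accuracy side of the trade-off has a closed form: a pair with Jaccard similarity J collides in one table with probability roughly J, so with L tables it becomes a candidate with probability 1 - (1 - J)^L. Solving for L gives a rough lower bound for a target recall. If approxSimilarityJoin is called with a Jaccard-distance threshold t, the hardest true pairs sit at J = 1 - t. This is a back-of-the-envelope sketch, not a documented Spark tuning rule; the object and method names are hypothetical:

```scala
// Closed-form companion to the numHashTables trade-off. ASSUMPTION: each hash
// table holds one MinHash value and a pair is a candidate if any table collides.
object ChooseNumHashTables {
  // Probability that a pair with Jaccard similarity j becomes a candidate.
  def candidateProbability(j: Double, numHashTables: Int): Double =
    1.0 - math.pow(1.0 - j, numHashTables)

  // Smallest numHashTables whose candidate probability reaches targetRecall
  // for pairs at similarity j (requires 0 < j < 1 and 0 < targetRecall < 1).
  def minHashTables(j: Double, targetRecall: Double): Int = {
    require(j > 0 && j < 1 && targetRecall > 0 && targetRecall < 1)
    math.ceil(math.log(1.0 - targetRecall) / math.log(1.0 - j)).toInt.max(1)
  }

  def main(args: Array[String]): Unit = {
    for (l <- Seq(1, 5, 10, 20))
      println(f"L=$l%2d  P(candidate | J=0.3)=${candidateProbability(0.3, l)}%.3f")
    println(s"tables needed for 95% recall at J=0.3: ${minHashTables(0.3, 0.95)}")
  }
}
```

For J = 0.3 (distance threshold 0.7) and a 95% target this gives 9 tables: 1 - 0.7^9 is about 0.96, while 8 tables only reach about 0.94. The cost side is not modeled here; it grows with the number of tables because every row carries one hash value per table.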

Note: I believe this algorithm was contributed to MLlib by Uber: https://eng.uber.com/lsh/
