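Goal
I have a mutable Map[Long, Long] with millions of entries. I need to run many update iterations, each applying millions of updates, and I want this to be as fast as possible.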
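Background
So far, the fastest approach I have found is a single-threaded mutable.LongMap[Long]. This type is optimized for Long keys.
Other map types seem slower, but I may not have implemented them correctly: I tried to apply the updates concurrently and/or in parallel, without success. It may be that parallel updates to a map are not actually happening, or are not possible, in Scala.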
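For readers unfamiliar with the type, here is a minimal sketch of the update operations on mutable.LongMap. The object name LongMapBasics and the sample keys and values are made up for illustration; only the LongMap calls themselves come from the standard library.

```scala
import scala.collection.mutable

object LongMapBasics {
  // Builds a small LongMap and applies a few in-place updates.
  def demo(): mutable.LongMap[Long] = {
    val dict = mutable.LongMap.empty[Long]
    dict += (1L -> 10L)   // insert from a (key, value) pair, as in the test code
    dict(2L) = 20L        // sugar for dict.update(2L, 20L): key and value passed separately
    dict.update(1L, 11L)  // overwrite the value stored under an existing key
    dict
  }

  def main(args: Array[String]): Unit = {
    val dict = demo()
    println(s"entries: ${dict.size}, value for key 1: ${dict(1L)}")
  }
}
```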
Ordered from fastest to slowest:
- LongMap[Long] (from above)
- TrieMap[Long, Long]
- ParTrieMap[Long, Long]
- HashMap[Long, Long]
- ParHashMap[Long, Long]
- ParMap[Long, Long]
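If a faster approach turns out to use an immutable map, that is fine, but I doubt that will be the case. A mutable map is probably the best fit for this use case.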
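To make the concurrent attempt concrete, here is a rough sketch of one way to write to a TrieMap from several threads using only the standard library. It is not the code I actually ran: the object name ConcurrentUpdateSketch, the applyConcurrently helper, and the numChunks parameter are all assumptions for illustration. It only shows that concurrent writes are possible; it does not settle whether they beat the single-threaded LongMap.

```scala
import scala.collection.concurrent.TrieMap
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

object ConcurrentUpdateSketch {
  // Splits `updates` into about `numChunks` slices and writes each slice from its own Future.
  // If the same key occurs in two slices, which value wins depends on thread scheduling.
  def applyConcurrently(dict: TrieMap[Long, Long],
                        updates: Array[(Long, Long)],
                        numChunks: Int): Unit = {
    require(numChunks > 0, "numChunks must be positive")
    val chunkSize = math.max(1, (updates.length + numChunks - 1) / numChunks)
    val tasks = updates.grouped(chunkSize).map { chunk =>
      Future { chunk.foreach { case (k, v) => dict.put(k, v) } }
    }.toList // force the iterator so that every Future is started
    Await.result(Future.sequence(tasks), Duration.Inf) // block until all writes are done
  }

  def main(args: Array[String]): Unit = {
    val dict = TrieMap.empty[Long, Long]
    val updates = Array.tabulate(1000000)(i => (i.toLong, i.toLong * 2))
    applyConcurrently(dict, updates, numChunks = Runtime.getRuntime.availableProcessors())
    println(s"entries after update: ${dict.size}")
  }
}
```

TrieMap is used here because it is safe for concurrent writes; mutable.LongMap and mutable.HashMap are not thread-safe, so the same pattern must not be applied to them without external synchronization.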
Code to generate the test data and time the test
    import java.util.Calendar
    import scala.collection.mutable
    import scala.util.Random

    object DictSpeedTest2 {
      // helper constants
      val million: Long = 1000000L
      val billion: Long = million * 1000

      // config
      val garbageCollectionWait = 3 // not used in this snippet
      val numEntries: Long = million * 10 // may need more JVM memory, e.g. -Xmx32g
      val maxValue: Long = billion * million // 1000000000000000L; Long.MaxValue is 9223372036854775807L

      def main(args: Array[String]): Unit = {
        // generate random data: initial entries in a, updates in b
        val a = genData(numEntries, maxValue, seed = 1000)
        val b = genData(numEntries, maxValue, seed = 9999)

        // initialization (not timed)
        val dict = new mutable.LongMap[Long]()
        a.foreach(x => dict += (x._1 -> x._2))

        // run and time the test; nanoTime is a monotonic clock meant for elapsed time
        println("start test: " + Calendar.getInstance().getTime)
        val start = System.nanoTime()
        b.foreach(x => dict += (x._1 -> x._2)) // updates
        val end = System.nanoTime()

        // print runtime
        val durationInSeconds = f"${(end - start) / 1e9}%.3fs"
        println("end test: " + Calendar.getInstance().getTime + " -- " + durationInSeconds)
      }

      // Deterministically generates n random (key, value) pairs for a given seed.
      // NOTE: `max` is currently unused; keys and values come from nextInt(), i.e. the Int range.
      def genData(n: Long, max: Long, seed: Long): Array[(Long, Long)] = {
        val r = new Random(seed)
        Array.fill(n.toInt)((r.nextInt().toLong, r.nextInt().toLong))
      }
    }
Current timings
With the code above, LongMap[Long] completes in the following times on my 2018 MacBook Pro:
- ~3.5 seconds with numEntries = 10 million
- ~100 seconds with numEntries = 100 million
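These figures come from a single timed pass. Since the real workload is many update iterations, a sketch like the following could time several rounds against any mutable map, which also makes it easier to swap in the other map types. The names RepeatedTiming, timeRounds, and rounds are hypothetical, and this is not the code that produced the numbers above.

```scala
import scala.collection.mutable

object RepeatedTiming {
  // Applies `updates` to `dict` `rounds` times and returns each round's duration in milliseconds.
  // After the first round the keys already exist, so later rounds measure overwrites, not inserts.
  def timeRounds(dict: mutable.Map[Long, Long],
                 updates: Array[(Long, Long)],
                 rounds: Int): Seq[Double] =
    (1 to rounds).map { _ =>
      val start = System.nanoTime()
      updates.foreach { case (k, v) => dict.update(k, v) }
      (System.nanoTime() - start) / 1e6
    }

  def main(args: Array[String]): Unit = {
    val updates = Array.tabulate(1000000)(i => (i.toLong, i.toLong))
    val timings = timeRounds(mutable.LongMap.empty[Long], updates, rounds = 5)
    timings.zipWithIndex.foreach { case (ms, i) => println(f"round ${i + 1}: $ms%.1f ms") }
  }
}
```

Passing a mutable.HashMap or a TrieMap instead of the LongMap runs the same loop against that implementation, since all three are mutable.Map[Long, Long].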