在寻找 Dictionary 的快速复合键时,我遇到了我无法理解也无法证明的异常情况。
在有限的测试中
Dictionary<KeyValuePair<UInt32, UInt32>, string>
明显慢于 (200:1)
Dictionary<KeyValuePair<UInt16, UInt16>, string>
测试从 0 到 1000 Populate 和 ContainsKey 的两个循环
Poplulate ContainsKey
UInt32 92085 86578
UInt16 2201 431
问题是
new KeyValuePair<UInt32, UInt32>(i, j).GetHashCode();
产生许多重复。
在循环 i 和 j 1024 中,仅创建了 1024 个唯一哈希值。
根据 CasperOne 的雪崩评论,尝试了 i*31 和 j*97(两个素数),这导致 105280 在 1024X1024 上是唯一的。还是有很多重复的。CasperOne 我知道这和随机的不一样。但随机化输入不是我的工作。GetHashCode() 应该随机化输出。
为什么重复次数多?
相同的循环
new KeyValuePair<UInt16, UInt16>(i, j).GetHashCode();
产生 1024 X 1024 唯一哈希码(完美)。
Int32 也有同样的问题。
这些重复的哈希值杀死
Dictionary<KeyValuePair<UInt32, UInt32>, string>
与 Int16 相比,Tuple 还会生成许多在 Int32 上不会降级的重复项。
生成原始 KVP 和原始 KPV.GetHashCode 的时间类似。
与 HashSet 相同的异常。
Dictionary<KeyValuePair<UInt32, UInt32>, string> dKVPu32 = new Dictionary<KeyValuePair<UInt32, UInt32>, string>();
Dictionary<KeyValuePair<UInt16, UInt16>, string> dKVPu16 = new Dictionary<KeyValuePair<UInt16, UInt16>, string>();
KeyValuePair<UInt32, UInt32> kvpUint32;
KeyValuePair<UInt16, UInt16> kvpUint16;
int range = 1000;
Int32 hashCode;
HashSet<Int32> kvpUint32Hash = new HashSet<Int32>();
HashSet<Int32> kvpUint16Hash = new HashSet<Int32>();
Stopwatch sw = new Stopwatch();
sw.Start();
for (UInt32 i = 0; i < range; i++)
{
for (UInt32 j = 0; j < range; j++)
{
kvpUint32 = new KeyValuePair<UInt32, UInt32>(i, j);
}
}
Console.WriteLine("UInt32 raw " + sw.ElapsedMilliseconds.ToString());
// 7
sw.Restart();
for (UInt16 i = 0; i < range; i++)
{
for (UInt16 j = 0; j < range; j++)
{
kvpUint16 = new KeyValuePair<UInt16, UInt16>(i, j);
}
}
Console.WriteLine("UInt16 raw " + sw.ElapsedMilliseconds.ToString());
// 6
sw.Restart();
for (UInt32 i = 0; i < range; i++)
{
for (UInt32 j = 0; j < range; j++)
{
hashCode = new KeyValuePair<UInt32, UInt32>(i, j).GetHashCode();
kvpUint32Hash.Add(hashCode);
}
}
Console.WriteLine("UInt32 GetHashCode " + sw.ElapsedMilliseconds.ToString() + " unique count " + kvpUint32Hash.Count.ToString());
// 285 1024
sw.Restart();
for (UInt16 i = 0; i < range; i++)
{
for (UInt16 j = 0; j < range; j++)
{
hashCode = new KeyValuePair<UInt16, UInt16>(i, j).GetHashCode();
kvpUint16Hash.Add(hashCode);
}
}
Console.WriteLine("UInt16 GetHashCode " + sw.ElapsedMilliseconds.ToString() + " unique count " + kvpUint16Hash.Count.ToString());
// 398 1000000
sw.Restart();
Console.ReadLine();
for (UInt32 i = 0; i < range; i++)
{
for (UInt32 j = 0; j < range; j++)
{
dKVPu32.Add(new KeyValuePair<UInt32, UInt32>(i, j), String.Format("{0} {1}", i.ToString(), j.ToString()));
}
}
Console.WriteLine("hsKVPu32 pop " + sw.ElapsedMilliseconds.ToString());
// 92085
sw.Restart();
for (UInt32 i = 0; i < range; i++)
{
for (UInt32 j = 0; j < range; j++)
{
if (!dKVPu32.ContainsKey(new KeyValuePair<UInt32, UInt32>(i, j))) Debug.WriteLine("Opps"); ;
}
}
Console.WriteLine("hsKVPu32 find " + sw.ElapsedMilliseconds.ToString());
// 86578
dKVPu32.Clear();
dKVPu32 = null;
GC.Collect();
sw.Restart();
for (UInt16 i = 0; i < range; i++)
{
for (UInt16 j = 0; j < range; j++)
{
dKVPu16.Add(new KeyValuePair<UInt16, UInt16>(i, j), String.Format("{0} {1}", i.ToString(), j.ToString()));
}
}
Console.WriteLine("hsKVPu16 pop " + sw.ElapsedMilliseconds.ToString());
// 2201
sw.Restart();
for (UInt16 i = 0; i < range; i++)
{
for (UInt16 j = 0; j < range; j++)
{
if (!dKVPu16.ContainsKey(new KeyValuePair<UInt16, UInt16>(i, j))) Debug.WriteLine("Opps"); ;
}
}
sw.Stop();
Console.WriteLine("hsKVPu16 find " + sw.ElapsedMilliseconds.ToString());
// 431
PS 最快的是打包.EG ((UInt32)int1 << 16) | 整数2;
第一个 UInt32 列的哈希等于接下来两个 KVP 的哈希。
2281371105 8 992
2281371104 8 993
2281371107 8 994
2281371145 0 0
2281371147 0 2
2281371149 0 4
2281371151 0 6
2281371137 0 8
2281371144 0 1
2281371146 0 3
2281371148 0 5
2281371150 0 7
2281371136 0 9
2281371144 1 0
2281371145 1 1
2281371146 1 2 2281371147
1
3 2281371148
1
4 2281371149
1
5
2281371150
2281371147 2 0
2281371146 2 1
2281371144 2 3
2281371151 2 4
2281371150 2 5
2281371149 2 6
2281371148 2 7
2281371139 2 8
我发现的唯一模式是和或差或 KVP 匹配。
但是找不到何时求和何时减去的模式。
这是一个糟糕的哈希,所以知道它是什么没有什么价值。