我正在尝试优化一些代码,使用标准来尝试比较,例如,将INLINE
pragma 添加到函数的效果。但我发现重新编译/运行之间的结果不一致。
我需要知道如何使结果在运行中保持一致以便我可以比较它们,或者如何评估基准是否可靠,即(我猜)如何解释有关方差的细节,“成本时钟呼叫”等。
我的具体案例的详细信息
这与我上面的主要问题正交,但在我的特定情况下,有几件事可能会导致不一致:
我正在尝试对
IO
操作进行基准测试,whnfIO
因为本示例whnf
中使用的方法不起作用。我的代码使用并发
我有很多标签和废话打开
示例输出
这两个都来自相同的代码,以完全相同的方式编译。我直接在下面进行了第一次运行,进行了更改并进行了另一个基准测试,然后恢复并再次运行了第一个代码,编译:
ghc --make -fforce-recomp -threaded -O2 Benchmark.hs
首轮:
estimating clock resolution...
mean is 16.97297 us (40001 iterations)
found 6222 outliers among 39999 samples (15.6%)
6055 (15.1%) high severe
estimating cost of a clock call...
mean is 1.838749 us (49 iterations)
found 8 outliers among 49 samples (16.3%)
3 (6.1%) high mild
5 (10.2%) high severe
benchmarking actors/insert 1000, query 1000
collecting 100 samples, 1 iterations each, in estimated 12.66122 s
mean: 110.8566 ms, lb 108.4353 ms, ub 113.6627 ms, ci 0.950
std dev: 13.41726 ms, lb 11.58487 ms, ub 16.25262 ms, ci 0.950
found 2 outliers among 100 samples (2.0%)
2 (2.0%) high mild
variance introduced by outliers: 85.211%
variance is severely inflated by outliers
benchmarking actors/insert 1000, query 100000
collecting 100 samples, 1 iterations each, in estimated 945.5325 s
mean: 9.319406 s, lb 9.152310 s, ub 9.412688 s, ci 0.950
std dev: 624.8493 ms, lb 385.4364 ms, ub 956.7049 ms, ci 0.950
found 6 outliers among 100 samples (6.0%)
3 (3.0%) low severe
1 (1.0%) high severe
variance introduced by outliers: 62.576%
variance is severely inflated by outliers
第二次运行,慢约 3 倍:
estimating clock resolution...
mean is 51.46815 us (10001 iterations)
found 203 outliers among 9999 samples (2.0%)
117 (1.2%) high severe
estimating cost of a clock call...
mean is 4.615408 us (18 iterations)
found 4 outliers among 18 samples (22.2%)
4 (22.2%) high severe
benchmarking actors/insert 1000, query 1000
collecting 100 samples, 1 iterations each, in estimated 38.39478 s
mean: 302.4651 ms, lb 295.9046 ms, ub 309.5958 ms, ci 0.950
std dev: 35.12913 ms, lb 31.35431 ms, ub 42.20590 ms, ci 0.950
found 1 outliers among 100 samples (1.0%)
variance introduced by outliers: 84.163%
variance is severely inflated by outliers
benchmarking actors/insert 1000, query 100000
collecting 100 samples, 1 iterations each, in estimated 2644.987 s
mean: 27.71277 s, lb 26.95914 s, ub 28.97871 s, ci 0.950
std dev: 4.893489 s, lb 3.373838 s, ub 7.302145 s, ci 0.950
found 21 outliers among 100 samples (21.0%)
4 (4.0%) low severe
3 (3.0%) low mild
3 (3.0%) high mild
11 (11.0%) high severe
variance introduced by outliers: 92.567%
variance is severely inflated by outliers
我注意到,如果我按“时钟呼叫的估计成本”进行缩放,这两个基准相当接近。这是我应该做的以获得一个实数进行比较吗?