Environment: x86-64, Linux (CentOS), 8 CPU cores.
To test false-sharing performance, I wrote the following C++ code:
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <iostream>
using namespace std;

// a and b are adjacent, so they are expected to share one cache line.
volatile int32_t a;
volatile int32_t b;
int64_t p1[7];  // padding intended to separate c from a and b
volatile int64_t c;
int64_t p2[7];  // padding intended to separate d from c
volatile int64_t d;

// Each thread function writes to its own global one billion times
// and prints the elapsed time in nanoseconds.
void thread1(int param) {
    auto start = chrono::high_resolution_clock::now();
    for (size_t i = 0; i < 1000000000; ++i) {
        a = i % 512;
    }
    auto end = chrono::high_resolution_clock::now();
    cout << " 1 cost:" << chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() << endl;
}

void thread2(int param) {
    auto start = chrono::high_resolution_clock::now();
    for (size_t i = 0; i < 1000000000; ++i) {
        b = i % 512;
    }
    auto end = chrono::high_resolution_clock::now();
    cout << " 2 cost:" << chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() << endl;
}

void thread3(int param) {
    auto start = chrono::high_resolution_clock::now();
    for (size_t i = 0; i < 1000000000; ++i) {
        c = i % 512;
    }
    auto end = chrono::high_resolution_clock::now();
    cout << " 3 cost:" << chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() << endl;
}

void thread4(int param) {
    auto start = chrono::high_resolution_clock::now();
    for (size_t i = 0; i < 1000000000; ++i) {
        d = i % 512;
    }
    auto end = chrono::high_resolution_clock::now();
    cout << " 4 cost:" << chrono::duration_cast<std::chrono::nanoseconds>(end - start).count() << endl;
}
This is my compile command: g++ xxx.cpp --std=c++11 -O0 -lpthread -g
So optimization is disabled (-O0).
I printed the virtual addresses of a, b, c, and d:
a addr 0x406200
b addr 0x406204
c addr 0x406258
d addr 0x406298
Here is the output (times in nanoseconds):
4 cost:2186474910
3 cost:6114449628
1 cost:7464439728
2 cost:7469428696
As I understand it, thread3 has no "cache bouncing" or false-sharing problem with the other threads, so why is it slower than thread4?
Also, if I change int32_t a, b to int64_t a, b, the addresses and results become:
a addr 0x4061e0
b addr 0x4061e8
c addr 0x406238
d addr 0x406278
3 cost:2188341526
4 cost:2193782423
2 cost:6479324727
1 cost:6645607256
This is what I expected.