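I'm running some tests with a simple program that measures the performance of a simple atomic increment on a 64-bit value, comparing atomic_add_64 against a mutex lock approach. What puzzles me is that atomic_add is slower than the mutex lock by a factor of 2.
EDIT!!! I've done some more testing. It looks like atomics are faster than the mutex and scale up to 8 concurrent threads. Beyond that, the performance of atomics degrades significantly.
The platform I've tested on is: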
SunOS 5.10 Generic_141444-09 sun4u sparc SUNW,Sun-Fire-V490
CC: Sun C++ 5.9 SunOS_sparc Patch 124863-03 2008/03/12
The program is quite simple:
#include <stdio.h>
#include <stdint.h>
#include <pthread.h>
#include <atomic.h>

uint64_t g_Loops = 1000000;          // increments performed by each thread
volatile uint64_t g_Counter = 0;     // shared counter under test
volatile uint32_t g_Threads = 20;    // number of threads still running
pthread_mutex_t g_Mutex;             // protects g_Counter in the mutex run
pthread_mutex_t g_CondMutex;         // protects g_Threads
pthread_cond_t g_Condition;          // signalled each time a thread ends

void LockMutex()
{
    pthread_mutex_lock(&g_Mutex);
}

void UnlockMutex()
{
    pthread_mutex_unlock(&g_Mutex);
}

void InitCond()
{
    pthread_mutex_init(&g_CondMutex, 0);
    pthread_cond_init(&g_Condition, 0);
}

// Called by each worker when it finishes: decrement the live-thread
// count and wake up main().
void SignalThreadEnded()
{
    pthread_mutex_lock(&g_CondMutex);
    --g_Threads;
    pthread_cond_signal(&g_Condition);
    pthread_mutex_unlock(&g_CondMutex);
}

// Worker for the mutex run: lock, increment, unlock.
void* ThreadFuncMutex(void* arg)
{
    uint64_t counter = g_Loops;
    while(counter--)
    {
        LockMutex();
        ++g_Counter;
        UnlockMutex();
    }
    SignalThreadEnded();
    return 0;
}

// Worker for the atomic run: Solaris atomic_add_64 on the shared counter.
void* ThreadFuncAtomic(void* arg)
{
    uint64_t counter = g_Loops;
    while(counter--)
    {
        atomic_add_64(&g_Counter, 1);
    }
    SignalThreadEnded();
    return 0;
}

int main(int argc, char** argv)
{
    pthread_mutex_init(&g_Mutex, 0);
    InitCond();

    // Any command-line argument selects the atomic run.
    bool bMutexRun = true;
    if(argc > 1)
    {
        bMutexRun = false;
        printf("Atomic run!\n");
    }
    else
        printf("Mutex run!\n");

    // start threads
    uint32_t threads = g_Threads;
    while(threads--)
    {
        pthread_t thr;
        if(bMutexRun)
            pthread_create(&thr, 0, ThreadFuncMutex, 0);
        else
            pthread_create(&thr, 0, ThreadFuncAtomic, 0);
    }

    // wait until every worker has signalled that it ended
    pthread_mutex_lock(&g_CondMutex);
    while(g_Threads)
    {
        pthread_cond_wait(&g_Condition, &g_CondMutex);
        printf("Threads to go %u\n", (unsigned)g_Threads);
    }
    pthread_mutex_unlock(&g_CondMutex);

    printf("DONE! g_Counter=%ld\n", (long)g_Counter);
    return 0;
}
The results of a test run on our box are:
$ CC -o atomictest atomictest.C
$ time ./atomictest
Mutex run!
Threads to go 19
...
Threads to go 0
DONE! g_Counter=20000000
real 0m15.684s
user 0m52.748s
sys 0m0.396s
$ time ./atomictest 1
Atomic run!
Threads to go 19
...
Threads to go 0
DONE! g_Counter=20000000
real 0m24.442s
user 3m14.496s
sys 0m0.068s
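Have you experienced this type of performance difference on Solaris? Any ideas why this happens?
On Linux, the same code (using the gcc __sync_fetch_and_add) yields a 5-fold performance improvement over the mutex version.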
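For comparison, here is a minimal sketch of what such a Linux variant of the atomic worker might look like, assuming GCC and its __sync_fetch_and_add builtin. The names g_SketchCounter, WorkerArgs, AtomicWorker and RunAtomicIncrement are hypothetical and not taken from the program above, and the sketch joins its threads instead of using the condition variable, so it is an illustration rather than the exact code that was measured.

```cpp
#include <assert.h>
#include <pthread.h>
#include <stdint.h>
#include <vector>

// Hypothetical shared counter for this sketch (not the original g_Counter).
static uint64_t g_SketchCounter = 0;

// Hypothetical argument block handed to every worker.
struct WorkerArgs
{
    uint64_t loops;  // increments each worker performs
};

// Worker: atomically add 1 to the shared counter `loops` times.
static void* AtomicWorker(void* arg)
{
    uint64_t loops = static_cast<WorkerArgs*>(arg)->loops;
    while (loops--)
    {
        // GCC builtin: atomic read-modify-write with a full memory barrier.
        __sync_fetch_and_add(&g_SketchCounter, 1);
    }
    return 0;
}

// Starts `threads` workers, waits for all of them, and returns the
// final value of the shared counter.
uint64_t RunAtomicIncrement(uint32_t threads, uint64_t loops)
{
    g_SketchCounter = 0;
    WorkerArgs args = { loops };
    std::vector<pthread_t> ids(threads);

    for (uint32_t i = 0; i < threads; ++i)
    {
        int rc = pthread_create(&ids[i], 0, AtomicWorker, &args);
        assert(rc == 0);
        (void)rc;  // silence unused-variable warnings when NDEBUG is set
    }
    for (uint32_t i = 0; i < threads; ++i)
        pthread_join(ids[i], 0);

    // All workers have been joined, so a plain read is safe here.
    return g_SketchCounter;
}
```

Calling RunAtomicIncrement(20, 1000000) from a main() and timing it would roughly mirror the atomic run above; the function also makes it easy to repeat the measurement for different thread counts.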
Thanks, Octav