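I created a simple program to measure thread performance. To illustrate my point, I stripped it down from a larger program. Hopefully it isn't too awful to read.
Here is the program: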
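#include <sstream>
#include <thread>
#include <list>
#include <map>
#include <mutex>
#include <condition_variable>
#include <iostream>
#include <chrono>   // std::chrono::high_resolution_clock
#include <cstring>  // std::strcpy
#include <string.h> // strlen, memcpy
#include <stdlib.h> // malloc, free, atoi
#include <stdint.h> // int64_t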
std::mutex m_totalTranMutex;
int m_totalTrans = 0;
bool m_startThreads = false;
std::condition_variable m_allowThreadStart;
std::mutex m_threadStartMutex;
std::map<int,std::thread::native_handle_type> m_threadNativeHandles;
char *my_strdup(const char *str)
{
    size_t len = strlen(str);
    char *x = (char *)malloc(len+1);
    if(x == nullptr)
        return nullptr;
    memcpy(x,str,len+1);
    return x;
}
void DoWork()
{
    char abc[50000];
    char *s1, *s2;
    std::strcpy(abc, "12345");
    std::strcpy(abc+20000, "12345");
    s1 = my_strdup(abc);
    s2 = my_strdup(abc);
    free(s1);
    free(s2);
}
void WorkerThread(int threadID)
{
    // Block until main() has created every thread and opened the start gate.
    {
        std::unique_lock<std::mutex> lk(m_threadStartMutex);
        m_allowThreadStart.wait(lk, []{return m_startThreads;});
    }
    // Several of the locals below are unused leftovers from the larger program.
    double transPerSec = 1.0 / 99999; // was 1 / 99999, which is integer division and yields 0
    int transactionCounter = 0;
    int64_t clockTicksUsed = 0;
    std::thread::native_handle_type handle = m_threadNativeHandles[threadID];
    std::chrono::high_resolution_clock::time_point current = std::chrono::high_resolution_clock::now();
    std::chrono::high_resolution_clock::time_point start = std::chrono::high_resolution_clock::now();
    std::chrono::high_resolution_clock::time_point end = start + std::chrono::minutes(1);
    int random_num_loops = 0;
    double interarrivaltime = 0.0;
    double timeHolderReal = 0.0;
    // Run for one minute; each "transaction" is 100 * 100 calls to DoWork().
    while(current < end)
    {
        std::chrono::high_resolution_clock::time_point startWork = std::chrono::high_resolution_clock::now();
        for(int loopIndex = 0; loopIndex < 100; ++loopIndex)
        {
            for(int alwaysOneHundred = 0; alwaysOneHundred < 100; ++alwaysOneHundred)
            {
                DoWork();
            }
        }
        std::chrono::high_resolution_clock::time_point endWork = std::chrono::high_resolution_clock::now();
        ++transactionCounter;
        clockTicksUsed += std::chrono::duration_cast<std::chrono::milliseconds>(endWork - startWork).count();
        current = std::chrono::high_resolution_clock::now();
    }
    // Report and accumulate under the mutex so output lines and the total stay consistent.
    std::lock_guard<std::mutex> tranMutex(m_totalTranMutex);
    std::cout << "Thread " << threadID << " finished with " << transactionCounter << " transaction." << std::endl;
    m_totalTrans += transactionCounter;
}
int main(int argc, char *argv[])
{
    // The thread count is required; argv[1] must not be read without checking argc.
    if(argc < 2)
    {
        std::cerr << "usage: " << argv[0] << " N" << std::endl;
        return 1;
    }
    std::stringstream ss;
    int numthreads = atoi(argv[1]);
    std::list<std::thread> threads;
    int threadIds = 1;
    for(int i = 0; i < numthreads; ++i)
    {
        threads.push_back(std::thread(&WorkerThread, threadIds));
        m_threadNativeHandles.insert(std::make_pair(threadIds, threads.back().native_handle()));
        ++threadIds;
    }
    // Open the start gate so all workers begin at roughly the same time.
    {
        std::lock_guard<std::mutex> lk(m_threadStartMutex);
        m_startThreads = true;
    }
    m_allowThreadStart.notify_all();
    //Join until completion
    for(std::thread &th : threads)
    {
        th.join();
    }
    ss << "TotalTran" << std::endl
       << m_totalTrans << std::endl;
    std::cout << ss.str();
}
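Usage: app N, where app is the name of the application and N is the number of threads to spawn. The program runs for 1 minute.
On Windows, I built the program with Visual Studio 2012 and ran it on a quad-core i7 (4 cores, 2 hardware threads per core).
I get the following: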
simplethread 1
Thread 1 finished with 1667 transaction.
TotalTran
1667
simplethread 2
Thread 1 finished with 1037 transaction.
Thread 2 finished with 1030 transaction.
TotalTran
2067
simplethread 3
Thread 3 finished with 824 transaction.
Thread 2 finished with 830 transaction.
Thread 1 finished with 837 transaction.
TotalTran
2491
simplethread 4
Thread 3 finished with 688 transaction.
Thread 2 finished with 693 transaction.
Thread 1 finished with 704 transaction.
Thread 4 finished with 691 transaction.
TotalTran
2776
simplethread 8
Thread 2 finished with 334 transaction.
Thread 6 finished with 325 transaction.
Thread 7 finished with 346 transaction.
Thread 1 finished with 329 transaction.
Thread 8 finished with 329 transaction.
Thread 3 finished with 338 transaction.
Thread 5 finished with 331 transaction.
Thread 4 finished with 330 transaction.
TotalTran
2662
E:\Development\Projects\Applications\CPUBenchmark\Debug>simplethread 16
Thread 16 finished with 163 transaction.
Thread 15 finished with 169 transaction.
Thread 12 finished with 165 transaction.
Thread 9 finished with 170 transaction.
Thread 10 finished with 166 transaction.
Thread 4 finished with 164 transaction.
Thread 13 finished with 166 transaction.
Thread 8 finished with 165 transaction.
Thread 6 finished with 165 transaction.
Thread 5 finished with 168 transaction.
Thread 2 finished with 161 transaction.
Thread 1 finished with 159 transaction.
Thread 7 finished with 160 transaction.
Thread 11 finished with 161 transaction.
Thread 14 finished with 163 transaction.
Thread 3 finished with 161 transaction.
TotalTran
2626
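These numbers look rather poor. Going from one thread doing X work to two threads, I expected the system as a whole to get close to 2X. The threads each do roughly the same amount of work, but the combined total for the minute falls well short of that.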
Things get even stranger when I move to Solaris.
On Solaris 11, using GCC 4.8.0, I build the program as follows:
gcc -o simple simpleThreads.cpp -I. -std=c++11 -DSOLARIS=1 -lstdc++ -lm
When I run "./simple 1", I get:
Thread 1 finished with 19686 transaction.
TotalTran
19686
For "./simple 2", I get:
Thread 1 finished with 5248 transaction.
Thread 2 finished with 2484 transaction.
TotalTran
7732
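On Solaris, the 2-thread case is much slower. I can't figure out what I'm doing wrong. I'm new to both C++11 constructs and threading, so this is a double whammy. gcc -v shows that the thread model is posix. Any help would be appreciated.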