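You are running 10,000,000 (10 million) x 1000 iterations sequentially, and 5,000,000 (5 million) x 1000 iterations per thread in the parallel version. In my experience that is more than enough to make the startup overhead negligible. The results look correct to me.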
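With 2 cores and 2 threads there is no time slicing involved (at least not between the 2 worker threads), because the scheduler is smart enough to put the threads on separate cores and keep them there.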
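As a rough check of when time slicing between the workers can start to matter, the worker count can be compared with the number of online cores. This is only a sketch: it assumes a Linux or macOS system where `sysconf(_SC_NPROCESSORS_ONLN)` is available (a common extension, not part of base POSIX), and `online_cores` and `threads_fit_on_cores` are illustrative names that do not appear in the program discussed here.

```c
#include <assert.h>
#include <unistd.h>

/* Number of cores currently online, or 1 if the query fails.
   _SC_NPROCESSORS_ONLN is an extension found on Linux and macOS. */
long online_cores(void) {
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    return n > 0 ? n : 1;
}

/* Returns 1 when every worker can have a core to itself, so the
   workers do not have to be time-sliced against each other. */
int threads_fit_on_cores(int nthreads) {
    assert(nthreads > 0);
    return nthreads <= online_cores();
}
```

On a 2-core machine `threads_fit_on_cores(2)` would return 1 and `threads_fit_on_cores(1000)` would return 0. Other processes on the system still compete for the cores, so this is an indication rather than a guarantee.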
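To see some degradation, you need to move some memory through the cache, so that each context switch actually penalizes performance by causing data to be evicted from the cache. Here are the run times I got: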
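./a.out 2 500000000
Number of threads = 2
Number of iterations in each thread = 250000000
Total time taken = 5.931148
./a.out 1000 500000000
Number of threads = 1000
Number of iterations in each thread = 500000
Total time taken = 6.563666
./a.out 2000 500000000
Number of threads = 2000
Number of iterations in each thread = 250000
Total time taken = 7.087449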
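To put numbers on the degradation, the timings above can be expressed as a slowdown relative to the 2-thread run. The helper below is only an illustrative sketch; `relative_slowdown` is a name made up for this example.

```c
#include <assert.h>

/* Fractional slowdown of `t` relative to `baseline`
   (a result of 0.10 means 10% slower than the baseline). */
double relative_slowdown(double baseline, double t) {
    assert(baseline > 0.0);
    return (t - baseline) / baseline;
}
```

With the times listed above, `relative_slowdown(5.931148, 6.563666)` is roughly 0.107 and `relative_slowdown(5.931148, 7.087449)` is roughly 0.195, so 1000 threads cost about 11% and 2000 threads about 19% compared with 2 threads.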
Here is the code. I basically divide a large array among the given threads and square each element of the array:
#include <stdio.h>
#include <stdlib.h>
#include <stdint.h>    /* intptr_t */
#include <pthread.h>
#include <sys/time.h>  /* gettimeofday */

long *array;
int length;
int threads;

/* Each worker squares its own contiguous slice of the array. */
void *tfunc(void *arg) {
    int n = (int)(intptr_t)arg;  /* thread index, passed by value inside the pointer */
    int chunk = length / threads;
    int start = n * chunk;
    /* The last thread also takes any remainder left by the integer division. */
    int end = (n == threads - 1) ? length : start + chunk;
    int i;
    for (i = start; i < end; i++) {
        array[i] = array[i] * array[i];
    }
    return NULL;
}

/* Wall-clock time in seconds. */
double timestamp(void) {
    struct timeval tp;
    gettimeofday(&tp, NULL);
    return (double)tp.tv_sec + tp.tv_usec / 1000000.;
}

int main(int argc, char *argv[]) {
    if (argc < 3) {
        fprintf(stderr, "usage: %s <threads> <iterations>\n", argv[0]);
        return 1;
    }
    int numberOfThreads = atoi(argv[1]);
    int numberOfIterations = atoi(argv[2]);
    if (numberOfThreads < 1 || numberOfIterations < 1) {
        fprintf(stderr, "both arguments must be positive integers\n");
        return 1;
    }
    int i;
    printf("Number of threads = %d\n", numberOfThreads);
    printf("Number of iterations in each thread = %d \n", numberOfIterations / numberOfThreads);

    pthread_t workerThreads[numberOfThreads];
    /* calloc gives the elements a defined value (zero); malloc would leave them indeterminate. */
    array = calloc((size_t)numberOfIterations, sizeof(long));
    if (array == NULL) {
        perror("calloc");
        return 1;
    }
    length = numberOfIterations;
    threads = numberOfThreads;

    double timeTaken = timestamp();
    for (i = 0; i < numberOfThreads; i++) {
        int rc = pthread_create(workerThreads + i, NULL, tfunc, (void *)(intptr_t)i);
        if (rc != 0) {
            fprintf(stderr, "pthread_create failed for thread %d (error %d)\n", i, rc);
            return 1;
        }
    }
    for (i = 0; i < numberOfThreads; i++) {
        pthread_join(workerThreads[i], NULL);
    }
    timeTaken = timestamp() - timeTaken;
    printf("Total time taken = %f\n", timeTaken);

    free(array);
    return 0;
}
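The worker in the listing derives its slice from `length / threads`. When the array length is not a multiple of the thread count, that integer division leaves a remainder that has to be assigned to some worker. One possible way to handle it, shown here as a sketch rather than as part of the original program, is to give the first `length % threads` workers one extra element each. `chunk_bounds` is a hypothetical helper name.

```c
#include <assert.h>

/* Half-open range [*start, *end) for worker n out of `threads` workers
   over `length` items. The first (length % threads) workers get one
   extra item, so no element is left unprocessed. */
void chunk_bounds(int length, int threads, int n, int *start, int *end) {
    assert(threads > 0 && n >= 0 && n < threads && length >= 0);
    int base = length / threads;
    int rem = length % threads;
    *start = n * base + (n < rem ? n : rem);
    *end = *start + base + (n < rem ? 1 : 0);
}
```

For 10 items and 3 workers this yields the ranges [0, 4), [4, 7) and [7, 10). For the runs shown earlier the length divides evenly, so every worker gets the same number of elements either way.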