我正在尝试编写一个简单的并行归并排序算法来帮助我理解超立方体通信的概念。我有一个 C 程序,它生成一个随机整数数组并分配线程来对数据进行排序。每个线程对其数据的子列表进行连续排序,然后与其他线程执行比较交换操作以合并子列表。下面是线程函数。
void *th_fun(void* args)
{
int i, j;
int my_id = *((int*)args);
int chunk_size = N_DATA/N_THREADS;
int start_index = my_id*chunk_size;
//set up the memory this thread will use
th_mem[my_id].my_data = &data[start_index];
th_mem[my_id].partner_data = malloc(sizeof(int)*chunk_size);
for(i = 0; i < chunk_size; i++)
th_mem[my_id].partner_data [i] = 0;
//sort my data serially, should use a mergesort
qsort((void*)th_mem[my_id].my_data, chunk_size, sizeof(int), compare);
//wait for all threads to finish sorting
barrier_simple();
//compare-exchange accross hypercube
int partner_id, op;
int logth = log(N_THREADS)/log(2);
for(i = 0; i < logth; i++)
{
partner_id = my_id ^ (int)pow(2,logth-i-1);
//send my data to partner
for(j = 0; j < chunk_size; j++)
{
th_mem[partner_id].partner_data[j] = th_mem[my_id].my_data[j];
}
//wait for all data to be sent
barrier_simple();
//get the lowest or highest elements between my_data, partner_data
int x = my_id & (int)pow(2,logth-i-1);
op = (x > 0 ? 1:-1);
int* output = merge(th_mem[my_id].my_data, th_mem[my_id].partner_data,
chunk_size, op);
//save results
for(j = 0; j < chunk_size; j++)
{
th_mem[my_id].my_data[j] = output[j];
}
free(output);
//wait for all data to be merged
barrier_simple();
}
free(th_mem[my_id].partner_data);
pthread_exit(args);
}
使用 4 个线程排序的 12 个数据元素的示例运行(输出被分块以指示哪个线程负责该数据):
./parallel_merge 12 4
Unsorted:
0:924 674 704
1:495 554 645
2:979 285 119
3:878 80 620
Serial sort:
0:80 119 285
1:495 554 620
2:645 674 704
3:878 924 979
Parallel sort:
0:80 119 285
1:495 554 674
2:620 645 704
3:878 924 979
如您所见,并行排序并不完全正确(除了性能考虑之外,ofc)。我已经运行了许多 gdb,但无法确定为什么超立方体交换不起作用。任何帮助,将不胜感激。我的参考资料是超立方体模板和合并排序