c++ - MPI C++ matrix addition, function arguments and function returns

Question

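I've been learning C++ from the internet for the past 2 years, and I've finally reached the point where I need to dig into MPI. I've been scouring stackoverflow and the rest of the internet (including http://people.sc.fsu.edu/~jburkardt/cpp_src/mpi/mpi.html and https://computing.llnl.gov/tutorials/mpi/#LLNL). I think I've got some of the logic down, but I'm having a hard time wrapping my head around the following: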

#include <mpi.h>
#include <vector>
using namespace std;

vector<double> function(vector<double> &foo, const vector<double> &bar, int dim, int rows);

int main(int argc, char** argv)
{
    vector<double> result;//represents a regular 1D vector
    int id_proc, tot_proc, root_proc = 0;
    int dim = 0;//placeholder value; set to number of "columns" in A and B below
    int rows = 0;//placeholder value; set to number of "rows" of A and B below
    vector<double> A(dim*rows), B(dim*rows);//represent matrices as 1D vectors

    MPI::Init(argc,argv);
    id_proc = MPI::COMM_WORLD.Get_rank();
    tot_proc = MPI::COMM_WORLD.Get_size();

    /*
    initialize A and B here on root_proc with RNG and Bcast to everyone else
    */

    //allow all processors to call function() so they can each work on a portion of A
    result = function(A,B,dim,rows);

    //all processors do stuff with A
    //root_proc does stuff with result (doesn't matter if other processors have updated result)

    MPI::Finalize();
    return 0;
}

vector<double> function(vector<double> &foo, const vector<double> &bar, int dim, int rows)
{
    /*
    purpose of function() is two-fold:
    1. update foo because all processors need the updated "matrix"
    2. get the average of the "rows" of foo and return that to main (only root processor needs this)
    */

    vector<double> output(dim,0);

    //add matrices the way I would normally do it in serial
    for (int i = 0; i < rows; i++)
    {
        for (int j = 0; j < dim; j++)
        {
            foo[i*dim + j] += bar[i*dim + j];//perform "matrix" addition (+= ON PURPOSE)
        }
    }

    //obtain average of rows in foo in serial
    for (int i = 0; i < rows; i++)
    {
        for (int j = 0; j < dim; j++)
        {
            output[j] += foo[i*dim + j];//sum rows of A
        }
    }

    for (int j = 0; j < dim; j++)
    {
        output[j] /= rows;//divide to obtain average
    }

    return output;
}

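The code above is only meant to illustrate the concept. My main concern is parallelizing the matrix addition, but what boggles my mind is this: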

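1) If each processor works on only a portion of that loop (naturally I'd have to modify the loop bounds for each processor), what command do I use to merge all the portions of A back into a single, updated A that every processor has in its memory? My guess is that I'd have to do some kind of Alltoall where each processor sends its portion of A to all the other processors, but how do I guarantee that (for example) row 3 worked on by processor 3 overwrites row 3 on the other processors, and not row 1 by accident?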

2) If I use an Alltoall inside function(), do all processors have to be allowed to step into function(), or can I isolate function() using...

if (id_proc == root_proc)
{
    result = function(A,B,dim,rows);
}

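...and then handle all the parallelization inside function()? As silly as it sounds, I'm trying to do most of the work on one processor (with broadcasts) and parallelize only the big, time-consuming loops. I'm just trying to keep the code conceptually simple so I can get my results and move on.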

3) For the averaging part, I'm sure I can just use a reduce command if I want to parallelize it, correct?

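Also, as an aside: is there a way to call Bcast() such that it is blocking? I'd like to use it to synchronize all my processors (the Boost libraries are not an option). If not, I'll just go with Barrier(). Thank you for your answers to this question, and thanks to the stackoverflow community for teaching me how to program over the past two years! :)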

score 2 · Accepted Answer

1) The function you are looking for is MPI_Allgather. MPI_Allgather lets you send a row from each processor and receive the result on all processors.

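2) Yes, you can have only some of the processors enter the function. Since MPI functions work with communicators, you have to create a separate communicator for this purpose. I don't know how this is done in the C++ bindings, but the C bindings use the MPI_Comm_create function.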

3) Yes, see MPI_Allreduce.

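Aside: Bcast blocks a process until the send/receive operation assigned to that process has completed. If you want to wait until all processors have finished their work (I don't know why you would want that), you should use Barrier().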

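Extra note: I don't recommend using the C++ bindings, because they are deprecated and you won't find concrete examples of how to use them. If you need C++ bindings, Boost MPI is the library to use, but it doesn't cover all MPI functions.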
