Questions tagged "mvapich2" - Stack Overflow (Chinese site)

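0 votes

1 answer

527 views

centos - Running mpirun on CPUs with specified IDs

Does anyone know how to run mpirun on specific CPUs? "mpirun -np 4" specifies how many CPUs to use, but what I want to do here is specify the CPU IDs.

The OS is CentOS 5.6, and MVAPICH2 is used on a single node with 6x2 cores.

Thanks in advance.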
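
One way to approach this with MVAPICH2 is through its CPU-affinity runtime parameters. The Python sketch below only builds a launch command. It assumes the `MV2_CPU_MAPPING` parameter (a colon-separated list with one core ID per local rank) and the `mpirun_rsh` convention of passing `NAME=value` pairs before the executable, both as described in the MVAPICH2 user guide. The helper names, the host name `node01`, and `./a.out` are hypothetical, and the exact syntax should be checked against the guide for the installed version.

```python
import shlex


def build_cpu_mapping(core_ids):
    """Join one core ID per local rank into a colon-separated mapping string."""
    return ":".join(str(c) for c in core_ids)


def build_launch_command(core_ids, host, executable):
    """Build an mpirun_rsh command line intended to pin rank i to core_ids[i].

    Assumption: all ranks run on the single node `host`.
    """
    n = len(core_ids)
    args = ["mpirun_rsh", "-np", str(n)] + [host] * n
    # Assumed MVAPICH2 runtime parameter, passed as NAME=value before the program.
    args.append("MV2_CPU_MAPPING=" + build_cpu_mapping(core_ids))
    args.append(executable)
    return " ".join(shlex.quote(a) for a in args)


if __name__ == "__main__":
    # Pin 4 ranks to cores 0, 2, 4 and 6 of the 12-core node (hypothetical choice).
    print(build_launch_command([0, 2, 4, 6], "node01", "./a.out"))
```

Generating the string from a list keeps the rank-to-core assignment explicit, which helps when the core numbering of the two sockets is interleaved.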

2012-01-17T18:54:01.740

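0 votes

1 answer

777 views

mpi - How to find the InfiniBand installation path

I want to compile MVAPICH2 myself, but I don't know where to find the psm.h file; it is not in the default location.

Does anyone know which command I can use to locate the InfiniBand installation?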
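
A header such as psm.h can be located by searching the likely install prefixes. The sketch below does this in Python; the candidate directories are guesses, not known locations, and `find_header` is a hypothetical helper. Note that psm.h is usually associated with the PSM library's development package rather than with the generic verbs stack, so an empty result may simply mean that package is not installed.

```python
import os


def find_header(name, roots):
    """Return every path under the given roots whose file name equals `name`."""
    matches = []
    for root in roots:
        # os.walk yields nothing for directories that are missing or unreadable.
        for dirpath, _dirnames, filenames in os.walk(root):
            if name in filenames:
                matches.append(os.path.join(dirpath, name))
    return matches


if __name__ == "__main__":
    # Candidate install prefixes; these are assumptions, adjust for your system.
    candidates = ["/usr/include", "/usr/local/include", "/opt"]
    for path in find_header("psm.h", candidates):
        print(path)
```

The directory that contains the reported file, minus its trailing `include` component, is the prefix to pass to the MVAPICH2 configure script.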

mpi infiniband mvapich2

2012-10-29T20:29:59.527

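0 votes

0 answers

46 views

mpi - MVAPICH2 R3 rendezvous protocol magic

I am wondering why the R3 protocol performs so well when many different buffers are used, enough to exhaust the registration cache. Does it not need to pin and unpin the buffers supplied for sending, or how does it hide that overhead? Is it always a good choice to stick with the R3 protocol?

At the bottom you will find a plot showing my observations. I used 2 nodes sending and receiving in parallel. The x axis shows the number of buffers n used for sending (1 MB each). The main loop looks like this:

See the plot: http://i47.tinypic.com/2vkn6ty.jpg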
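
To make "exhausting the registration cache" concrete, here is a toy model in Python. It is not MVAPICH2's actual cache: the LRU eviction policy, the capacity counted in buffers, and all names are assumptions chosen for illustration. A miss stands for one registration (pin) of a user buffer, which is the cost a zero-copy rendezvous path would pay and a copy-based path would avoid.

```python
from collections import OrderedDict


class RegistrationCacheModel:
    """Toy LRU cache: a miss stands for one pin (registration) of a buffer."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def use(self, buffer_id):
        if buffer_id in self.entries:
            self.entries.move_to_end(buffer_id)  # mark as most recently used
            self.hits += 1
            return
        self.misses += 1  # buffer would have to be registered (pinned)
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)  # evict (unpin) least recently used
        self.entries[buffer_id] = True


def miss_rate(num_buffers, capacity, rounds):
    """Cycle through `num_buffers` buffers `rounds` times; return the miss fraction."""
    cache = RegistrationCacheModel(capacity)
    for _ in range(rounds):
        for b in range(num_buffers):
            cache.use(b)
    return cache.misses / (cache.hits + cache.misses)


if __name__ == "__main__":
    for n in (4, 8, 9, 16):
        print(n, miss_rate(n, capacity=8, rounds=10))
```

In this model, cycling through even one buffer more than the cache holds makes every access a miss, which is the kind of cliff a plot over n would show for a protocol that depends on registration.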

mpi mvapich2

2013-04-04T01:19:54.913

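0 votes

1 answer

189 views

mpi - The MPIR prefix in MPICH/MVAPICH

The following link describes the function name prefix conventions in MPICH/MVAPICH (for example, the MPID and MPIU prefixes):

Function name prefix conventions in MPICH/MVAPICH

I was just wondering what the MPIR prefix stands for (it is not explained in the link above). At which layer is it implemented, and which layers can access it? Thanks in advance.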

mpi mpich mvapich2

2013-04-10T03:47:04.257

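0 votes

3 answers

910 views

c++ - OpenMPI vs. Mvapich2: MPI_Send without MPI_Recv

I am trying to test the effect of MPI_Send without MPI_Recv. I have the following program, which I compile and run with openmpi-1.4.5 and mvapich2-1.9. I know these implementations target 2 different versions of the MPI standard, but I believe MPI_Send and MPI_Recv are the same across these standards:

With mvapich2, I always get the following output (and nothing more). Basically, the program seems to hang after 3 lines:

With openmpi, I get the following output (unending):

Questions:

Why is there such a difference?
How can I get behavior similar to openmpi (unending) with mvapich2?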
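
The MPI standard allows a standard-mode MPI_Send either to buffer the message and return or to block until a matching receive is posted, so both observed behaviors are permitted. The Python sketch below is a toy model of finite buffering, not a description of how either library is implemented; the function name and the buffer sizes are hypothetical.

```python
def count_completed_sends(message_bytes, buffer_bytes, attempts):
    """Count how many sends complete before the unread buffer fills up.

    Toy model: each send copies `message_bytes` into a buffer of
    `buffer_bytes`; nothing is ever received, so space is never freed.
    """
    used = 0
    completed = 0
    for _ in range(attempts):
        if used + message_bytes > buffer_bytes:
            break  # a real blocking send would wait here for a matching receive
        used += message_bytes
        completed += 1
    return completed


if __name__ == "__main__":
    # Small buffer: the sender stalls after a few messages.
    print(count_completed_sends(message_bytes=4, buffer_bytes=12, attempts=100))
    # Effectively unlimited buffer: every attempted send completes.
    print(count_completed_sends(message_bytes=4, buffer_bytes=10**9, attempts=100))
```

A program that relies on either outcome is depending on implementation buffering; posting the matching receives, or using nonblocking sends, removes that dependence.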

c++ mpi openmpi mvapich2

2013-08-30T02:05:15.307

0 votes

1 answer

810 views

mpi - MVAPICH2 - Supported network types

Can MVAPICH2 be installed on an ordinary Ethernet network, rather than on InfiniBand or another HPC network technology?

mpi ethernet infiniband mvapich2

2013-12-27T05:47:31.353

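0 votes

1 answer

777 views

mvapich2 - Mvapich MPI_Init_thread (multithreading support) fails

I am using mvapich on a supercomputing cluster (PBS environment). I need MPI_THREAD_MULTIPLE support enabled to run my program. However, my program's output indicates that MPI_Init_thread failed to enable MPI_THREAD_MULTIPLE.

The PBS script is:

(The last line is the exe command.)

My program looks like this:

The output looks like this:

Thanks for any hints. :)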
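
MPI_Init_thread reports the granted thread level through its `provided` output argument, and that level may be lower than the one requested without the call itself failing. The Python sketch below models only that comparison. The `ThreadLevel` enum and `check_thread_support` are hypothetical names, and the integer values are placeholders; the MPI standard guarantees only the ordering of the four levels. Whether a given MVAPICH2 build grants MPI_THREAD_MULTIPLE can depend on how it was configured and on runtime settings (the user guide discusses `MV2_ENABLE_AFFINITY` in connection with multithreaded runs), so that is worth checking separately.

```python
from enum import IntEnum


class ThreadLevel(IntEnum):
    """MPI thread support levels, in the increasing order the standard defines.

    The integer values are placeholders; only their ordering matters here.
    """
    SINGLE = 0
    FUNNELED = 1
    SERIALIZED = 2
    MULTIPLE = 3


def check_thread_support(required, provided):
    """Return a message describing whether `provided` satisfies `required`."""
    if provided >= required:
        return "ok: requested %s, got %s" % (required.name, provided.name)
    return "insufficient: requested %s, but the library provides only %s" % (
        required.name, provided.name)


if __name__ == "__main__":
    print(check_thread_support(ThreadLevel.MULTIPLE, ThreadLevel.SERIALIZED))
    print(check_thread_support(ThreadLevel.FUNNELED, ThreadLevel.MULTIPLE))
```

In an MPI program the same test is a single comparison of `provided` against `MPI_THREAD_MULTIPLE` right after initialization, ideally aborting with a clear message when it fails.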

mvapich2

2014-03-16T04:56:39.160

0 votes

0 answers

389 views

concurrency - MPI + CUDA-aware, concurrent kernels and MPI_Sendrecv

During my work, I've run into a small problem. I'm using MVAPICH-GDR-2.05 and Open MPI 1.7.4 with CUDA 6.0.

I'm working on the exchange of non-contiguous elements between GPUs (such as the columns of a matrix), and I'm trying to run two kernels (one for scatter and one for gather) concurrently with an MPI_Sendrecv communication between the two GPUs.

I've used the CUDA profiler (nvprof) to see what my program is doing, and I've seen something strange:

With Open MPI 1.7.4, I have 3 CUDA streams working concurrently.
With MVAPICH-gdr-2.05, I have two concurrent kernels, and the MPI_Sendrecv is not concurrent with them.

Do you know why MPI_Sendrecv in MVAPICH does this?

This is my pseudocode:

And these are the two profiler screenshots:

concurrency cuda mpi profiler mvapich2

2014-11-14T14:28:28.120

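0 votes

3 answers

550 views

c++ - MVAPICH hangs in MPI_Send for messages larger than the eager threshold

I have a simple C++/MPI (MVAPICH) program that sends an array of type float. When I use MPI_Send, MPI_Ssend, or MPI_Rsend and the size of the data exceeds the eager threshold (64k in my program), the program hangs during the call to MPI_Send. If the array is smaller than the threshold, the program works fine. The source code is below:

I think my settings may be wrong. The parameters are as follows:

And the configuration:

The program runs on 2 processes:

Any ideas?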
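
The eager threshold marks where the library switches from sending the payload immediately to a rendezvous handshake that needs the receiver to take part. The Python sketch below shows that size rule only. It assumes 4-byte floats and an inclusive comparison against the 64k value quoted above; whether the real threshold is inclusive or also counts header bytes is implementation specific, and `choose_protocol` is a hypothetical name.

```python
EAGER_THRESHOLD_BYTES = 64 * 1024  # the 64k value quoted in the question


def choose_protocol(count, element_size=4, threshold=EAGER_THRESHOLD_BYTES):
    """Pick the transfer path for `count` elements of `element_size` bytes.

    Toy rule: payloads up to the threshold go eagerly, larger ones use a
    rendezvous handshake that needs the receiver to take part.
    """
    nbytes = count * element_size
    return "eager" if nbytes <= threshold else "rendezvous"


if __name__ == "__main__":
    for count in (1024, 16384, 16385, 1000000):
        print(count, choose_protocol(count))
```

Under these assumptions the switch happens at about 16384 floats, so a hang that appears only above that size points at the rendezvous path (for example, a receive that is never posted or a transport problem that the eager path does not exercise) rather than at the send call itself.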

c++ mpi mvapich2

2014-12-10T16:16:11.760

0 votes

0 answers

807 views

mpi - Running MVAPICH2's cpi example via mpirun_rsh fails

I am a new user of MVAPICH2 and ran into trouble as soon as I started using it.
First, I believe the installation succeeded, using:
./configure --disable-fortran --enable-cuda
make -j 4
make install
There were no errors.

But when I tried to run the cpi example in the examples directory, I ran into the following situation:

I can ssh between the nodes gpu-cluster-1 and gpu-cluster-4 without a password.
I used mpirun_rsh to run the cpi example on gpu-cluster-1 and on gpu-cluster-4 separately, and it works fine, like this:
run@gpu-cluster-1:~/mvapich2-2.1rc1/examples$ mpirun_rsh -ssh -np 2 gpu-cluster-1 gpu-cluster-1 ./cpi
Process 0 of 2 is on gpu-cluster-1
Process 1 of 2 is on gpu-cluster-1
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.000089

run@gpu-cluster-4:~/mvapich2-2.1rc1/examples$ mpirun_rsh -ssh -np 2 gpu-cluster-4 gpu-cluster-4 ./cpi
Process 0 of 2 is on gpu-cluster-4
Process 1 of 2 is on gpu-cluster-4
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.000134
I used mpiexec to run the cpi example across gpu-cluster-1 and gpu-cluster-4, and it works fine, like this:
run@gpu-cluster-1:~/mvapich2-2.1rc1/examples$ mpiexec -np 2 -f hostfile ./cpi
Process 0 of 2 is on gpu-cluster-1
Process 1 of 2 is on gpu-cluster-4
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.000352
The content of hostfile is "gpu-cluster-1\ngpu-cluster-4".
But when I use mpirun_rsh to run the cpi example on both gpu-cluster-1 and gpu-cluster-4, the problem appears:

run@gpu-cluster-1:~/mvapich2-2.1rc1/examples$ mpirun_rsh -ssh -np 2 -hostfile hostfile ./cpi
Process 1 of 2 is on gpu-cluster-4
-----------------stuck here, does not continue-----------------
After a long time, I press Ctrl + C, and it shows:

^C[gpu-cluster-1:mpirun_rsh][signal_processor] Caught signal 2, killing job
run@gpu-cluster-1:~/mvapich2-2.1rc1/examples$ [gpu-cluster-4:mpispawn_1][read_size] Unexpected End-Of-File on file descriptor 6. MPI process died?
[gpu-cluster-4:mpispawn_1][read_size] Unexpected End-Of-File on file descriptor 6. MPI process died?
[gpu-cluster-4:mpispawn_1][handle_mt_peer] Error while reading PMI socket. MPI process died?
[gpu-cluster-4:mpispawn_1][report_error] connect() failed: Connection refused (111)
This has puzzled me for a long time. Can you help me solve this problem?
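
"Connection refused" generally means that nothing accepted the TCP connection at the address that was tried. One possible cause, which is an assumption and not confirmed by anything in the question, is a node's host name resolving to a loopback address (for example through an /etc/hosts entry), so that a remote helper process connects back to itself; a firewall between the nodes is another candidate. The Python sketch below checks only the name resolution part. `check_hosts` and `is_loopback` are hypothetical helper names.

```python
import ipaddress
import socket


def is_loopback(address):
    """True if the textual IP address lies in a loopback range (e.g. 127.0.0.0/8)."""
    return ipaddress.ip_address(address).is_loopback


def check_hosts(hostnames):
    """Resolve each name and report the address plus a loopback warning flag."""
    report = {}
    for name in hostnames:
        try:
            addr = socket.gethostbyname(name)
        except socket.gaierror:
            report[name] = (None, False)  # name did not resolve at all
            continue
        report[name] = (addr, is_loopback(addr))
    return report


if __name__ == "__main__":
    # Host names taken from the question; run this on each node in turn.
    hosts = ["gpu-cluster-1", "gpu-cluster-4"]
    for host, (addr, loop) in check_hosts(hosts).items():
        print(host, addr, "LOOPBACK" if loop else "")
```

Running it on both nodes and comparing the results shows whether each node sees the other, and itself, under a routable address.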

Here is the code of the cpi example:

mpi mvapich2

2015-01-05T11:42:40.370
