I wrote a script that uses mpi4py, and I run it with Python 2.7 on an Ubuntu 14.04 LTS machine. Here is a snippet from the beginning:
from mpi4py import MPI
comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()
print comm.Get_size()
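Below is a slightly extended version of the snippet. It assumes the installed mpi4py provides MPI.get_vendor(), and it also prints which MPI implementation mpi4py was built against, so each rank reports both its view of COMM_WORLD and the library behind it. The describe() helper is a hypothetical name used only for illustration. The sketch runs on both Python 2.7 and 3:

```python
from __future__ import print_function


def describe(rank, size, vendor):
    """Format one process's view of COMM_WORLD plus the MPI library mpi4py uses."""
    name, version = vendor
    return "rank %d of %d, MPI library: %s %s" % (
        rank, size, name, ".".join(str(part) for part in version))


def main():
    try:
        from mpi4py import MPI  # initializes MPI on import
    except ImportError:
        print("mpi4py is not importable from this interpreter")
        return
    comm = MPI.COMM_WORLD
    # MPI.get_vendor() returns (name, (major, minor, micro)) for the MPI
    # implementation mpi4py was built against, e.g. ('Open MPI', (1, 6, 5)).
    print(describe(comm.Get_rank(), comm.Get_size(), MPI.get_vendor()))


if __name__ == "__main__":
    main()
```

Launched with mpiexec -n 3, a consistent setup should print three lines ending in "of 3". If every line says "of 1", each process has started as an independent singleton.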
If I run it on my old computer with mpiexec -n 3 python2.7 foo.py
I get the output:
3
3
3
I recently started migrating my software to a new Ubuntu 14.04 LTS server. When I run the same command there, I get:
1
1
1
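Clearly something is wrong, but I'm not sure where to look because my MPI knowledge is limited. I tried checking the MPI version by running mpiexec --version
On the old computer it returns: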
HYDRA build details:
Version: 1.4.1p1
Release Date: Thu Sep 1 13:53:02 CDT 2011
CC: gcc
CXX: c++
F77: gfortran
F90: f95
Configure options: '--enable-shared' '--prefix=/opt/anaconda1anaconda2anaconda3' '--disable-option-checking' 'CC=gcc' 'CFLAGS= -O2' 'LDFLAGS= ' 'LIBS=-lrt -lpthread ' 'CPPFLAGS= -I/home/ilan/aroot/work/mpich2-1.4.1p1/src/mpl/include -I/home/ilan/aroot/work/mpich2-1.4.1p1/src/mpl/include -I/home/ilan/aroot/work/mpich2-1.4.1p1/src/openpa/src -I/home/ilan/aroot/work/mpich2-1.4.1p1/src/openpa/src -I/home/ilan/aroot/work/mpich2-1.4.1p1/src/mpid/ch3/include -I/home/ilan/aroot/work/mpich2-1.4.1p1/src/mpid/ch3/include -I/home/ilan/aroot/work/mpich2-1.4.1p1/src/mpid/common/datatype -I/home/ilan/aroot/work/mpich2-1.4.1p1/src/mpid/common/datatype -I/home/ilan/aroot/work/mpich2-1.4.1p1/src/mpid/common/locks -I/home/ilan/aroot/work/mpich2-1.4.1p1/src/mpid/common/locks -I/home/ilan/aroot/work/mpich2-1.4.1p1/src/mpid/ch3/channels/nemesis/include -I/home/ilan/aroot/work/mpich2-1.4.1p1/src/mpid/ch3/channels/nemesis/include -I/home/ilan/aroot/work/mpich2-1.4.1p1/src/mpid/ch3/channels/nemesis/nemesis/include -I/home/ilan/aroot/work/mpich2-1.4.1p1/src/mpid/ch3/channels/nemesis/nemesis/include -I/home/ilan/aroot/work/mpich2-1.4.1p1/src/mpid/ch3/channels/nemesis/nemesis/utils/monitor -I/home/ilan/aroot/work/mpich2-1.4.1p1/src/mpid/ch3/channels/nemesis/nemesis/utils/monitor -I/home/ilan/aroot/work/mpich2-1.4.1p1/src/util/wrappers -I/home/ilan/aroot/work/mpich2-1.4.1p1/src/util/wrappers'
Process Manager: pmi
Launchers available: ssh rsh fork slurm ll lsf sge manual persist
Topology libraries available: hwloc plpa
Resource management kernels available: user slurm ll lsf sge pbs
Checkpointing libraries available:
Demux engines available: poll select
Running it on the new computer returns:
mpiexec (OpenRTE) 1.6.5
Report bugs to http://www.open-mpi.org/community/help/
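Am I running different MPI implementations here, and could that be causing the problem? How can I tell? Or is this a problem on the Python side? It looks as though three processes are being launched, but each one believes it is alone (a communicator size of 1). I realize this could happen if mpi4py and mpiexec use different MPI implementations.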
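One way to answer "how can I tell?" is to classify the mpiexec --version banners. The old machine prints "HYDRA build details", and Hydra is MPICH's process manager (its configure flags point into an mpich2-1.4.1p1 tree). The new machine prints "OpenRTE", which is Open MPI's runtime. The following is a minimal heuristic sketch. identify_mpiexec() and mpiexec_version_text() are hypothetical helpers, and the string matching only covers the two banners shown above:

```python
from __future__ import print_function

import subprocess


def identify_mpiexec(version_output):
    """Guess the MPI family from the text printed by `mpiexec --version`.

    Heuristic only: Hydra is also used by MPICH derivatives (e.g. MVAPICH2),
    so it is reported as the MPICH family rather than MPICH itself.
    """
    text = version_output.lower()
    if "openrte" in text or "open-mpi" in text or "open mpi" in text:
        return "Open MPI"
    if "hydra" in text or "mpich" in text:
        return "MPICH family (Hydra)"
    return "unknown"


def mpiexec_version_text(executable="mpiexec"):
    """Run `<executable> --version`, capturing stderr too (some versions print there)."""
    out = subprocess.check_output([executable, "--version"], stderr=subprocess.STDOUT)
    return out.decode("utf-8", "replace")


if __name__ == "__main__":
    try:
        print(identify_mpiexec(mpiexec_version_text()))
    except (OSError, subprocess.CalledProcessError) as exc:
        print("could not run mpiexec --version:", exc)
```

If this reports a different family from the one returned by MPI.get_vendor() in the same environment, mpiexec and mpi4py are probably built on different MPI implementations. That mismatch is a common reason why every process reports a size of 1.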
If I run which mpiexec
on either machine, it returns:
/home/pmj27/anaconda2/bin/mpiexec
Running mpi4py.get_config()
returns:
{'mpicxx': '/home/pmj27/anaconda2/bin/mpicxx', 'mpif77': '/home/pmj27/anaconda2/bin/mpif77', 'mpicc': '/home/pmj27/anaconda2/bin/mpicc', 'mpif90': '/home/pmj27/anaconda2/bin/mpif90'}
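Because which mpiexec and get_config() both return paths, you can check mechanically whether the mpiexec on PATH sits in the same directory as the mpicc that mpi4py records. In this sketch, which() and same_bin_dir() are hypothetical helpers. which() stands in for shutil.which, which Python 2.7 lacks. A matching directory is only a hint, because it says nothing about which MPI implementation each binary belongs to:

```python
from __future__ import print_function

import os


def which(name):
    """Minimal PATH lookup (shutil.which does not exist on Python 2.7)."""
    for directory in os.environ.get("PATH", "").split(os.pathsep):
        candidate = os.path.join(directory, name)
        if os.path.isfile(candidate) and os.access(candidate, os.X_OK):
            return candidate
    return None


def same_bin_dir(mpiexec_path, config):
    """True if mpiexec lives in the same directory as the mpicc mpi4py recorded.

    Uses os.path.normpath, so symlinks are NOT resolved.
    """
    mpicc = config.get("mpicc")
    if not mpiexec_path or not mpicc:
        return False
    return (os.path.dirname(os.path.normpath(mpiexec_path))
            == os.path.dirname(os.path.normpath(mpicc)))


def main():
    try:
        import mpi4py
    except ImportError:
        print("mpi4py is not importable from this interpreter")
        return
    mpiexec = which("mpiexec")
    config = mpi4py.get_config()  # dict of compiler wrappers used at build time
    print("mpiexec on PATH:", mpiexec)
    print("mpicc used to build mpi4py:", config.get("mpicc"))
    print("same bin directory:", same_bin_dir(mpiexec, config))


if __name__ == "__main__":
    main()
```

With the paths quoted above, this check would pass. The directory alone therefore does not explain the difference. The version banners (Hydra on the old machine, OpenRTE on the new one) are the more telling clue.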