parallel-processing - Program scales strongly on 1 node, but runtime increases sharply when using 2 nodes

Question

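The results show that when I increase the number of processors from 2 to 4 to 10, the runtime decreases each time, but when I reach 20 processors, the runtime increases dramatically. Each node has two 8-core processors, so I want to limit each node to 16 MPI processes. Am I doing this correctly? I suspect the problem is in my sbatch file, especially because the runtime jumps sharply when I go from one node to two. Here is my batch file: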

#!/bin/bash -x
#SBATCH -J scalingstudy
#SBATCH --output=scalingstudy.%j.out
#SBATCH --error=scaling-err.%j.err
#SBATCH --time=03:00:00
#SBATCH --partition=partition_name
#SBATCH --mail-type=end
#SBATCH --mail-user=email@school.edu

#SBATCH -N 2
#SBATCH --ntasks-per-node=16

module load gcc/4.9.1_1
module load openmpi/1.8.1_1

mpic++ enhanced_version.cpp

mpirun -np 2 ./a.out 10000
mpirun -np 4 ./a.out 10000
mpirun -np 10 ./a.out 10000
mpirun -np 20 --bind-to core ./a.out 10000

mpirun -np 2 ./a.out 50000
mpirun -np 4 ./a.out 50000
mpirun -np 10 ./a.out 50000
mpirun -np 20 --bind-to core ./a.out 50000

mpirun -np 2 ./a.out 100000
mpirun -np 4 ./a.out 100000
mpirun -np 10 ./a.out 100000
mpirun -np 20 --bind-to core ./a.out 100000

mpirun -np 2 ./a.out 500000
mpirun -np 4 ./a.out 500000
mpirun -np 10 ./a.out 500000
mpirun -np 20 --bind-to core ./a.out 500000

mpirun -np 2 ./a.out 1000000
mpirun -np 4 ./a.out 1000000
mpirun -np 10 ./a.out 1000000
mpirun -np 20 --bind-to core ./a.out 1000000
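As a rough illustration of which runs actually cross the node boundary, here is a minimal POSIX shell sketch. It assumes the launcher fills all 16 slots of the first allocated node before placing any rank on the second node; the real mapping depends on the Open MPI version and on options such as `--map-by`, so treat this as an approximation. The function name `ranks_per_node` is hypothetical and not part of Slurm or Open MPI.

```shell
#!/bin/sh
# Hypothetical helper: print how many ranks land on each node when slots
# are filled node by node (ASSUMPTION: first node is filled before the next).
# Arguments: <number of processes> <slots per node> <number of nodes>
# Ranks beyond nodes*slots are simply ignored by this sketch.
ranks_per_node() {
  np=$1; slots=$2; nodes=$3
  remaining=$np
  out=""
  i=0
  while [ "$i" -lt "$nodes" ]; do
    # Put as many remaining ranks as fit on this node.
    if [ "$remaining" -gt "$slots" ]; then
      n=$slots
    else
      n=$remaining
    fi
    remaining=$((remaining - n))
    out="$out${out:+ }$n"
    i=$((i + 1))
  done
  echo "$out"
}

# Matches "-N 2" and "--ntasks-per-node=16" from the batch file above.
for p in 2 4 10 20; do
  echo "np=$p -> $(ranks_per_node "$p" 16 2)"
done
```

Under that assumption, the 2-, 4-, and 10-process runs stay entirely on one node, while the 20-process run is split 16 + 4 across the two nodes, making it the only configuration in the sweep with inter-node MPI communication.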
