I'm using sbatch to submit my job.
Running mpirun --version on the command line gives:

Intel(R) MPI library for Linux* OS, Version 5.0 Build 20140507
Copyright (C) 2003-2014, Intel Corporation. All rights reserved.

So I think I'm working with Intel MPI.
Following the instructions in "Submitting an MPI job using Intel MPI", I wrote my script like this:

#!/bin/bash
#SBATCH --ntask=4
#SBATCH -t 00:10:00

. ~/.bash_profile

module load intel
mpirun mycc

mycc is the executable I get after compiling my source files with mpicc.
I then submitted the job with sbatch -p partitionname -J myjob script.sh, and it failed with exit code 127:0. The slurm-jobid.out file says (leaving aside the locale warning):

/usr/share/Modules/init/sh: line 2: /usr/bin/modulecmd: No such file or directory
/tmp/slurmd/job252624/slurm_script: line 10: mpirun: command not found

But I have checked, and the file /usr/bin/modulecmd does exist.
Any suggestion is appreciated.

Edit
I also asked the question here.

I have removed the source statement and the module load statement from the script.
I tried to load the module on the login node before submitting my job, but that fails too. It says:

ModuleCmd_Load.c(204): Error: 105: Unable to locate a modulefile for 'intel'

I used the module avail command to see which modules are available:

---------/usr/share/Modules/modulefiles-------------------
dot module-info mpich2-x86_64 use.own
module-cvs modules null

---------/etc/modulefiles---------------------------------
compat-openmpi-psm-x86_64 compat-openmpi-x86_64

Forgive me for the messy formatting.

Solved

The problem is finally solved. My final script.sh is like this:

#!/bin/bash
srun -p partitionname -n 4 -t 00:10:00 mycc

I then submit the job with sbatch -p partitionname -J myjob script.sh.


1 Answer


Apparently /usr/bin/modulecmd does not exist on all compute nodes. Make sure it is present on every compute node, then try again.
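
One way to check this, as a rough sketch: run the same existence test on the login node and then on a compute node through srun. Here check_path is a hypothetical helper name, and partitionname is the placeholder used in the question; the compute-node step is skipped when srun is not available.

```shell
#!/bin/bash
# Hypothetical helper: report whether a path exists on the machine
# where this function runs.
check_path() {
    if [ -e "$1" ]; then
        echo "$(uname -n): $1 found"
    else
        echo "$(uname -n): $1 MISSING"
    fi
}

# Check on the login node.
check_path /usr/bin/modulecmd

# Check on one compute node, only if srun is on PATH.
# "partitionname" is the placeholder from the question.
if command -v srun >/dev/null 2>&1; then
    srun -p partitionname -N 1 -n 1 sh -c \
        'if [ -e /usr/bin/modulecmd ]; then echo "$(uname -n): found"; else echo "$(uname -n): MISSING"; fi'
fi
```

If the login node reports the file as found and the compute node reports it as missing, that would match the "No such file or directory" line in the job output.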

Also, if /home is shared by all nodes, you do not need to source bash_profile, because by default Slurm exports the entire environment to the job.
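
To see what the job actually inherits, a small diagnostic job script along these lines might help. It is only a sketch: report_cmd is a hypothetical helper name, and the one-minute time limit is an arbitrary choice. Exit code 127 is the shell's status for "command not found", so the point is to see whether mpirun resolves on the compute node.

```shell
#!/bin/bash
#SBATCH -t 00:01:00

# Hypothetical helper: print where a command resolves on PATH, if at all.
report_cmd() {
    if resolved=$(command -v "$1" 2>/dev/null); then
        echo "$1 -> $resolved"
    else
        echo "$1 -> not found on PATH"
    fi
}

# Print the node name and the PATH the job received from the submitting shell.
echo "node: $(uname -n)"
echo "PATH=$PATH"

# The two commands that failed in the question.
report_cmd mpirun
report_cmd modulecmd
```

Submitted with sbatch in the same way as the original script, the output lands in slurm-jobid.out and can be compared with what the same commands print on the login node.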

Answered 2015-12-22T09:44:37.507