This is the job script I use to resubmit MPI jobs. It was originally written for the tcsh shell. I tried to rewrite it for bash, but I get errors. Please help me correct the script.

##============================================================================

#!/bin/bash                                                                                                                                                                                            
#PBS -l mem=10GB                                                                                                                                
#PBS -l walltime=12:00:00                                                                                                                       
#PBS -l nodes=2:ppn=6                                                                                                                                                                                                                                             
#PBS -v NJOBS,NJOB

if [ X$NJOBS == X ]; then
    $ECHO "NJOBS (total number of jobs in sequence) is not set - defaulting to 1"
    export NJOBS=1
fi

if [ X$NJOB == X ]; then
    $ECHO "NJOB (current job number in sequence) is not set - defaulting to 1"
    export NJOB=1
fi

#                                                                                                                                               
# Quick termination of job sequence - look for a specific file                                                                                  
#                                                                                                                                               
if [ -f STOP_SEQUENCE ] ; then
    $ECHO  "Terminating sequence at job number $NJOB"
    exit 0
fi

#                                                                                                                                               
# Pre-job file manipulation goes here ...                                                                                                       
# =============================================================================                                                                                                                                              
# INSERT CODE             
# =============================================================================

module load openmpi/1.4.3

startnum= 0
x=1
i= $(($NJOB + $startnum - $x))
j= $(($i + $x))

$ECHO "This is job $i"
#$ECHO floobuks.$i.blah                                                                                                                         
#$ECHO flogwhilp.$j.txt                                                                                                                         


#===========================================================================
# actual execution code                                                                                                                    
#===========================================================================                 

# this is just a sample 
echo "job $i is followed by $j"

#=========================================================================== 
RUN COMPLETE
#===========================================================================

#
# Check the exit status
#
errstat=$?
if [ $errstat -ne 0 ]; then
    # A brief nap so PBS kills us in normal termination
    # If execution line above exceeded some limit we want PBS
    # to kill us hard
    sleep 5
    $ECHO "Job number $NJOB returned an error status $errstat - stopping job sequence."
    exit $errstat
fi

#
# Are we in an incomplete job sequence - more jobs to run ?
#
if [ $NJOB -lt $NJOBS ]; then


#
# Now increment counter and submit the next job
#
    NJOB=$(($NJOB+1))
    $ECHO "Submitting job number $NJOB in sequence of $NJOBS jobs"
    qsub recur2.bash
else
    $ECHO "Finished last job in sequence of $NJOBS jobs"
fi

#==============================================================================

I get the following errors when I run it:

qsub -v NJOBS=4  recur2.bash



ModuleCmd_Load.c(200):ERROR:105: Unable to locate a modulefile for 'openmpi/1.4.3'
/var/spool/PBS/mom_priv/jobs/1833549.epic.SC: line 115: 0: command not found
/var/spool/PBS/mom_priv/jobs/1833549.epic.SC: line 117: 0: command not found
/var/spool/PBS/mom_priv/jobs/1833549.epic.SC: line 118: 1: command not found
/home/nsubramanian/bin/gromacs_3.3.3/bin/grompp_mpi: error while loading shared libraries: libmpi.so.0: cannot open shared object file: No such file or directory
/var/spool/PBS/mom_priv/jobs/1833549.epic.SC: line 128: mpirun: command not found

I was able to figure out the openmpi error, but not the others. I do not know how to make the script work.

Note: please ignore the line numbers in the error messages; they do not match the script as posted here.

1 Answer

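There is no module named openmpi/1.4.3 on your system (`module avail` lists the modules that are actually installed), which is also why mpirun and libmpi.so.0 cannot be found. As for the other errors, in these lines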

startnum= 0
i= $(($NJOB + $startnum - $x))
j= $(($i + $x))

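there must be no space after the equals sign. Bash reads `startnum= 0` as "run the command `0` with startnum set to an empty string", which is exactly what produces the "0: command not found" and "1: command not found" messages.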
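
For illustration, here is a minimal sketch of those assignments with the spaces removed. The `NJOB=${NJOB:-1}` default is an assumption added only so the snippet runs outside PBS; in the real job NJOB arrives through `#PBS -v NJOBS,NJOB`.

```shell
#!/bin/bash
# Assumed default so the snippet can be tried outside a PBS job.
NJOB=${NJOB:-1}

# No whitespace on either side of "=" in a bash assignment.
startnum=0
x=1

# Inside $(( )) variable names can be used without the leading "$".
i=$((NJOB + startnum - x))
j=$((i + x))

echo "job $i is followed by $j"
```

Run with NJOB unset, this should report job 0 followed by job 1.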

To find this out, all you have to do is run the script line by line in a bash shell.
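
One way to try a single suspect line without submitting a job, sketched under the assumption that bash is available on the login node, is to hand it to a throwaway `bash -c` process. The exact wording of the error message varies between bash versions, so none is quoted here.

```shell
#!/bin/bash
# Run the suspect line in a separate bash process and capture what it prints.
# LC_ALL=C only forces the error message to be in English.
msg=$(LC_ALL=C bash -c 'startnum= 0' 2>&1)
echo "with space:    $msg"

# The same line without the space is a plain assignment and raises no error.
fixed=$(bash -c 'startnum=0; echo "startnum=$startnum"')
echo "without space: $fixed"
```

The first command fails because bash looks for a program called `0`; the second simply prints the assigned value.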

Answered 2012-07-21T20:38:44.377