I installed Torque 4.2.6 because it supports GPUs.
Whenever I submit a job, it is never executed; it just stays in the queue.
My script is:
#!/bin/bash
#PBS -l nodes=1:ppn=1:gpus=1
#PBS -l walltime=00:30:00
#PBS -q batch
#PBS -o $HOME/out_$PBS_JOBID
#PBS -e $HOME/err_$PBS_JOBID
#PBS -j oe
#PBS -m bae
#PBS -V
echo PBS JOB id is $PBS_JOBID
echo PBS_NODEFILE is $PBS_NODEFILE
echo PBS_QUEUE is $PBS_QUEUE
cd $PBS_O_WORKDIR
echo `hostname`
date
./prefixsum
It throws the following error:
PBS_Server: LOG_ERROR::Unknown node (15064) in set_nodes, request failed, corrupt request
If the script above is wrong, can anyone tell me how to write a script that runs a job on a GPU?
Edit
I found the following errors in /var/log/messages:
PBS_Server: LOG_ERROR::Unknown node (15064) in set_nodes, request failed, corrupt request
PBS_Server: LOG_ERROR::node_spec, job requesting nodes that will never be available - spec = 1:ppn=1:gpus=1
PBS_Server: LOG_ERROR::node_spec, job requesting nodes that will never be available - spec = PºtÃ;
Output of pbsnodes -a:
node01
state = free
np = 32
ntype = cluster
status = rectime=1397191125,varattr=,jobs=,state=free,netload=10273233433,gres=,loadave=2.28,ncpus=32,physmem=132092224kb,availmem=180232352kb,totmem=197628216kb,idletime=148596,nusers=4,nsessions=12,sessions=3914 3918 3920 3945 3947 3971 13227 13989 14012 17037 28460 28766,uname=Linux node01 2.6.32-401.el6.rhbz988052_minimal.x86_64 #1 SMP Tue Jul 30 18:39:08 EDT 2013 x86_64,opsys=linux
mom_service_port = 15002
mom_manager_port = 15003
gpus = 1
gpu_status = gpu[0]=gpu_id=0000:84:00.0;gpu_product_name=Tesla K20m;gpu_display=Disabled;gpu_pci_device_id=102810DE;gpu_pci_location_id=0000:84:00.0;gpu_fan_speed=N/A;gpu_memory_total=4799 MB;gpu_memory_used=11 MB;gpu_mode=Exclusive_Thread;gpu_state=Unallocated;gpu_utilization=99 %;gpu_memory_utilization=6 %;gpu_ecc_mode=Enabled;gpu_single_bit_ecc_errors=0;gpu_double_bit_ecc_errors=0;gpu_temperature=29 C,driver_ver=319.49,timestamp=Fri Apr 11 10:08:44 2014
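To double-check which nodes the server reports as having GPUs, the pbsnodes output above can be reduced to a short node/GPU-count list. This is only a minimal sketch and not part of Torque: list_gpu_nodes is a hypothetical helper name, and it assumes the pbsnodes -a layout shown above (a node name on its own line, followed by "attribute = value" lines, indented or not).

```shell
#!/bin/bash
# Hypothetical helper (not a Torque command): reads `pbsnodes -a` style
# text on stdin and prints "<node> <gpu count>" for every node that
# reports a "gpus = N" attribute.
list_gpu_nodes() {
    awk '
        # A line with a single field and no "=" is taken to be a node name.
        NF == 1 && $0 !~ /=/ { node = $1; next }
        # An attribute line of the form "gpus = N" (leading spaces are ignored).
        $1 == "gpus" && $2 == "=" { print node, $3 }
    '
}

# Usage on the cluster (assumes pbsnodes is on PATH):
#   pbsnodes -a | list_gpu_nodes
```

For the output pasted above this prints "node01 1", which matches the gpus=1 that the job script requests on a single node.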