
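I want to see all the jobs I have recently run on the cluster (completed, failed, and running), with one entry per job. Running sacct returns 3 lines per job, with State: FAILED, FAILED, COMPLETED. What does that mean, and how can I see the information I actually want?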

I also don't understand what a JobName of true means.

Here is a copy of the output:

   JobID    JobName  Partition    Account  AllocCPUS      State ExitCode 
 ------------ ---------- ---------- ---------- ---------- ---------- -------- 
 2160852               R   interact cluster_u+          2  COMPLETED      0:0 
 2160864               R   interact cluster_u+          2  COMPLETED      0:0 
 2161424               R   interact cluster_u+          2  COMPLETED      0:0 
 2161430               R   interact cluster_u+          0 CANCELLED+      0:0 
 2161431               R   interact cluster_u+          2  COMPLETED      0:0 
 2161668               R   interact cluster_u+          2  COMPLETED      0:9 
 2161682          myjob+    general cluster_u+          2     FAILED      1:0 
 2161682.bat+      batch            cluster_u+          1     FAILED      1:0 
 2161682.0          true            cluster_u+          1  COMPLETED      0:0 
 2161683          myjob+    general cluster_u+          2     FAILED      1:0 
 2161683.bat+      batch            cluster_u+          1     FAILED      1:0 
 2161683.0          true            cluster_u+          1  COMPLETED      0:0 

The submission script (note that the values in <% %> are filled in by the R package BatchJobs):

 #!/bin/bash
 #SBATCH -J <%= job.name %>            # name of the job
 #SBATCH -p general
 #SBATCH --mem <%= resources$memory %>    # Memory requirements in Kbytes
 #SBATCH -o ./logs/<%= job.name %>_log.txt    # log file for the job's output


 eval "R --vanilla --slave < <%= rscript %>"

1 Answer


sacct prints one line for each job, followed by one line for each job step within that job.

 2161683          myjob+    general cluster_u+          2     FAILED      1:0  <- the job
 2161683.bat+      batch            cluster_u+          1     FAILED      1:0  <- the batch script
 2161683.0          true            cluster_u+          1  COMPLETED      0:0  <- the R step

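The job's state is FAILED because the state of the batch script itself is FAILED. Your script contains one job step, and that step terminated correctly.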
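To illustrate how a script can be reported as failed even though a step inside it succeeded, here is a minimal sketch in plain bash with no Slurm involved. The names `step` and `batch_script` are hypothetical, and `false` is only an assumed stand-in for whatever command actually failed in the real script. The point is that a script's exit code comes from its last command, not from an earlier successful step.

```shell
#!/bin/bash
# Hypothetical stand-in for a job step that succeeds (exit code 0).
step() {
    true
}

# Hypothetical stand-in for the batch script: the step succeeds,
# but a later command fails, so the script as a whole returns 1.
batch_script() (
    step
    echo "step exit code: $?"
    false   # assumed failing command; the real cause is not shown in the question
)

if batch_script; then
    echo "script succeeded"
else
    echo "script failed with exit code $?"
fi
```

In the sacct listing this corresponds to the step line showing COMPLETED with 0:0 while the batch line and the job line show FAILED with 1:0.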

I often resort to sacct | grep -v "^[0-9]*\." to get only the job-level lines.
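As a sketch of what that filter does, the snippet below applies the same grep pattern to a few lines copied from the output in the question instead of calling sacct, so it runs on any machine. The function names `sample_output` and `jobs_only` are made up for this illustration.

```shell
#!/bin/bash
# Canned sacct output taken from the question (stands in for a live `sacct` call).
sample_output() {
cat <<'EOF'
JobID           JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
2161682          myjob+    general cluster_u+          2     FAILED      1:0
2161682.bat+      batch            cluster_u+          1     FAILED      1:0
2161682.0          true            cluster_u+          1  COMPLETED      0:0
2161683          myjob+    general cluster_u+          2     FAILED      1:0
2161683.bat+      batch            cluster_u+          1     FAILED      1:0
2161683.0          true            cluster_u+          1  COMPLETED      0:0
EOF
}

# Drop every line whose JobID is "<digits>." followed by a step name,
# which leaves the header and one line per job.
jobs_only() {
    grep -v "^[0-9]*\."
}

sample_output | jobs_only
```

On a real cluster the equivalent would be piping sacct itself through the same grep. Slurm versions that support it also have `sacct -X` (`--allocations`), which is intended to show only the job-level lines; check `man sacct` on your system before relying on it.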

Answered 2013-10-11T21:16:48.333