I am currently trying to decipher the contents of an lsb.events log file created by Platform Computing's "Platform Process Manager" (Flow Manager), version 8.1.

From various documentation sources, I have found the following descriptions of the jStatus variable:

  • 4 = Running
  • 32 = JOB_STAT_EXIT
  • 64 = JOB_STAT_DONE

However, the JOB_STATUS entries also contain jStatus values of 2 and 192. What do these values represent?

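I have tagged SAS because it is bundled with this implementation. Also, I have noticed that in some cases the actual fields in our lsb.events file do not line up with the fields that should appear according to the documentation above.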

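Status 2 means the job is in the PSUSP state, which a job can get into in several ways (for example, by submitting it with the -H option, which holds it and prevents it from being dispatched).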

As for 192, the answer is that the job status is a bit field. In this case, two bits are set:

  • 64 = JOB_STAT_DONE
  • 128 = JOB_STAT_PDONE

JOB_STAT_PDONE means that the job had a post-execution script defined and that the script completed successfully.
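Because jStatus is a bit field, a test such as `jStatus == 64` will miss records like 192. A minimal C sketch of mask-based tests follows. It assumes you are not compiling against the LSF SDK, so it repeats the flag values as they appear in LSF's lsbatch.h (the LSF 7.0 header quoted in the next answer); the helper function names are hypothetical, and the values should be checked against the lsbatch.h of your own installation.

```c
#include <stdbool.h>

/* Flag values as defined in LSF 7.0's lsbatch.h. With the LSF SDK installed,
 * #include <lsf/lsbatch.h> instead of repeating them here. */
#define JOB_STAT_EXIT         0x20
#define JOB_STAT_DONE         0x40
#define JOB_STAT_PDONE        (0x80)
#define JOB_STAT_PERR         (0x100)

/* Hypothetical helpers: each one tests a single bit with a mask instead of
 * comparing the whole value, because other bits (e.g. PDONE) may be set
 * alongside DONE or EXIT. */
bool job_done(unsigned int jstatus)
{
    return (jstatus & JOB_STAT_DONE) != 0;   /* job terminated with status 0 */
}

bool job_exited(unsigned int jstatus)
{
    return (jstatus & JOB_STAT_EXIT) != 0;   /* non-zero exit, aborted or killed */
}

bool post_exec_ok(unsigned int jstatus)
{
    return (jstatus & JOB_STAT_PDONE) != 0;  /* post-execution script succeeded */
}

bool post_exec_failed(unsigned int jstatus)
{
    return (jstatus & JOB_STAT_PERR) != 0;   /* post-execution script had an error */
}

/* Example: 192 == JOB_STAT_DONE | JOB_STAT_PDONE, so
 *   192 == JOB_STAT_DONE   evaluates to false,
 *   job_done(192)          returns true,
 *   post_exec_ok(192)      returns true. */
```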

The valid values for the job status bits are in the file lsf/lsbatch.h, which ships with LSF in the include directory: <LSF_INSTALL_DIR>/<LSF_VERSION>/include/lsf/lsbatch.h

answered 2015-01-12T19:07:09.903

To expand on that (thanks to @Squirrel), the relevant contents of our C:\LSF_7.0\7.0\include\lsf\lsbatch.h file are:

/**
 * \addtogroup job_states job_states
 * define job states
 */
/*@{*/
#define JOB_STAT_NULL         0x00       /**< State null*/
#define JOB_STAT_PEND         0x01       /**< The job is pending, i.e., it 
                                            * has not been dispatched yet.*/
#define JOB_STAT_PSUSP        0x02       /**< The pending job was suspended by its
                                            * owner or the LSF system administrator.*/
#define JOB_STAT_RUN          0x04       /**< The job is running.*/
#define JOB_STAT_SSUSP        0x08       /**< The running job was suspended 
                                           * by the system because an execution 
                                           * host was overloaded or the queue run 
                                           * window closed. (see \ref lsb_queueinfo, 
                                           * \ref lsb_hostinfo, and lsb.queues.)
                                           */
#define JOB_STAT_USUSP        0x10       /**< The running job was suspended by its 
                                           * owner or the LSF system administrator.*/
#define JOB_STAT_EXIT         0x20       /**< The job has terminated with a non-zero
                                           * status - it may have been aborted due 
                                           * to an error in its execution, or 
                                           * killed by its owner or by the 
                                           * LSF system administrator.*/
#define JOB_STAT_DONE         0x40       /**< The job has terminated with status 0.*/
#define JOB_STAT_PDONE        (0x80)     /**< Post job process done successfully */
#define JOB_STAT_PERR         (0x100)    /**< Post job process has error */
#define JOB_STAT_WAIT         (0x200)    /**< Chunk job waiting its turn to exec */
#define JOB_STAT_RUNKWN       0x8000     /* Flag : Job status is UNKWN caused by 
                                          * losting contact with remote cluster */
#define JOB_STAT_UNKWN        0x10000    /**< The slave batch daemon (sbatchd) on 
                                          * the host on which the job is processed 
                                          * has lost contact with the master batch 
                                          * daemon (mbatchd).*/

The same values again, in decimal:

0       JOB_STAT_NULL
1       JOB_STAT_PEND
2       JOB_STAT_PSUSP
4       JOB_STAT_RUN
8       JOB_STAT_SSUSP 
16      JOB_STAT_USUSP 
32      JOB_STAT_EXIT 
64      JOB_STAT_DONE
128     JOB_STAT_PDONE 
256     JOB_STAT_PERR 
512     JOB_STAT_WAIT
32768   JOB_STAT_RUNKWN 
65536   JOB_STAT_UNKWN
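To turn a raw jStatus value from lsb.events into something readable, the table above can be used as a lookup. The C sketch below is one way to do that under stated assumptions: `describe_jstatus` and `job_stat_names` are hypothetical names, the values are the LSF 7.0 ones listed above (other LSF versions may differ), and any bit not in the table is printed in hex rather than guessed at.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Flag values and names from the LSF 7.0 lsbatch.h table above. */
static const struct {
    unsigned int mask;
    const char *name;
} job_stat_names[] = {
    {0x01, "JOB_STAT_PEND"},     {0x02, "JOB_STAT_PSUSP"},
    {0x04, "JOB_STAT_RUN"},      {0x08, "JOB_STAT_SSUSP"},
    {0x10, "JOB_STAT_USUSP"},    {0x20, "JOB_STAT_EXIT"},
    {0x40, "JOB_STAT_DONE"},     {0x80, "JOB_STAT_PDONE"},
    {0x100, "JOB_STAT_PERR"},    {0x200, "JOB_STAT_WAIT"},
    {0x8000, "JOB_STAT_RUNKWN"}, {0x10000, "JOB_STAT_UNKWN"},
};

/* Hypothetical helper: writes the names of all flags set in jstatus into buf,
 * separated by '|', lowest bit first. 0 is reported as JOB_STAT_NULL; bits not
 * in the table are appended as one hex value. Output is truncated (never
 * overflowed) if buf is too small. Returns buf. */
const char *describe_jstatus(unsigned int jstatus, char *buf, size_t len)
{
    size_t n = sizeof job_stat_names / sizeof job_stat_names[0];
    unsigned int unknown = jstatus;

    if (len == 0)
        return buf;
    buf[0] = '\0';
    if (jstatus == 0) {
        snprintf(buf, len, "JOB_STAT_NULL");
        return buf;
    }
    for (size_t i = 0; i < n; i++) {
        if (jstatus & job_stat_names[i].mask) {
            size_t used = strlen(buf);
            snprintf(buf + used, len - used, "%s%s",
                     used ? "|" : "", job_stat_names[i].name);
            unknown &= ~job_stat_names[i].mask;   /* mark this bit as known */
        }
    }
    if (unknown != 0) {
        size_t used = strlen(buf);
        snprintf(buf + used, len - used, "%s0x%x", used ? "|" : "", unknown);
    }
    return buf;
}
```

For example, 192 is rendered as JOB_STAT_DONE|JOB_STAT_PDONE and 2 as JOB_STAT_PSUSP.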
answered 2015-01-13T16:14:19.423