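I have a problem with ANSYS. When I start it, it complains about some partitions. We are using Slurm. Is it complaining about the Slurm partition the job runs in? RDMA, on the other hand, sounds more like a hard-disk partition. I'm a bit confused about the cause: is it file system access under Slurm, or the different queues (partitions)? And how do I fix it? Has anyone run into this error and perhaps knows a solution?
It runs on a Slurm cluster with NFS /home, NFS /opt (the ANSYS installation) and a BeeGFS /work directory (for models etc.).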
cfx5remote: Rank 0:35: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY
cfx5remote: Rank 0:35: MPI_Init_thread: pkey table:
cfx5remote: Rank 0:35: MPI_Init_thread: 0x8001
cfx5remote: Rank 0:35: MPI_Init_thread: 0x7fff
cfx5remote: Rank 0:25: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY
cfx5remote: Rank 0:25: MPI_Init_thread: pkey table:
cfx5remote: Rank 0:35: MPI_Init_thread: 0xffff
cfx5remote: Rank 0:25: MPI_Init_thread: 0x8001
cfx5remote: Rank 0:25: MPI_Init_thread: 0x7fff
cfx5remote: Rank 0:25: MPI_Init_thread: 0xffff
cfx5remote: Rank 0:25: MPI_Init_thread: ibv_get_pkey() failed
cfx5remote: Rank 0:21: MPI_Init_thread: multiple pkey found in partition key table, please choose one via MPI_IB_PKEY
cfx5remote: Rank 0:25: MPI_Init_thread: Can't initialize RDMA device
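
For what it's worth, the "partition key table" in the log refers to InfiniBand partition keys (pkeys), which are separate from both Slurm partitions and disk partitions. The message itself names MPI_IB_PKEY as the setting to choose one. Below is a minimal sketch, not an ANSYS or MPI tool, that pulls the distinct pkeys out of log text like the above and decodes the membership bit (the top bit of the 16-bit key). The names `collect_pkeys`, `describe` and `SAMPLE_LOG` are made up for illustration.

```python
import re

# Matches pkey value lines such as
# "cfx5remote: Rank 0:35: MPI_Init_thread: 0x8001"
PKEY_LINE = re.compile(r"Rank \d+:\d+: MPI_Init_thread:\s+(0x[0-9a-fA-F]{4})\s*$")

def collect_pkeys(log_text):
    """Return the distinct pkeys found in the log, in first-seen order."""
    seen = []
    for line in log_text.splitlines():
        match = PKEY_LINE.search(line)
        if match:
            pkey = match.group(1).lower()
            if pkey not in seen:
                seen.append(pkey)
    return seen

def describe(pkey):
    """Split a pkey into its 15-bit partition number and membership bit."""
    value = int(pkey, 16)
    membership = "full" if value & 0x8000 else "limited"
    return f"{pkey}: partition 0x{value & 0x7fff:04x}, {membership} membership"

# A few lines copied from the log in the question.
SAMPLE_LOG = """\
cfx5remote: Rank 0:35: MPI_Init_thread: pkey table:
cfx5remote: Rank 0:35: MPI_Init_thread: 0x8001
cfx5remote: Rank 0:35: MPI_Init_thread: 0x7fff
cfx5remote: Rank 0:35: MPI_Init_thread: 0xffff
cfx5remote: Rank 0:25: MPI_Init_thread: 0x8001
"""

if __name__ == "__main__":
    for pkey in collect_pkeys(SAMPLE_LOG):
        print(describe(pkey))
```

Which of the listed keys is the right one for MPI traffic is site-specific, so that is a question for whoever administers the InfiniBand fabric. Once it is known, the error message suggests exporting it as MPI_IB_PKEY in the job environment before launching the solver. The exact value format the MPI library expects is worth checking in its documentation.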