python-2.7 - How do I run a Python program on Condor?

Question

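I'm new to Condor and am trying to run my Python program on it, but I'm having a hard time. All the tutorials I have found assume a single-file Python program, but my program consists of multiple packages and files and also uses other libraries such as numpy and scipy. In that case, how can I get Condor to run my program? Should I convert the program into some kind of executable? Or is there a way to transfer the Python source code to the Condor machine and have the Python on Condor run it?

Thanks,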

score 1 · Accepted Answer

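Your job needs to bring along a complete Python installation (including SciPy and NumPy). This involves building a Python installation in a local directory (possibly in an interactive HTCondor job), installing whatever libraries you need into that local installation, and then creating a tarball of the installation that you list in transfer_input_files. Your job must use a wrapper script that unpacks the Python installation and points the job at the correct Python executable before running your Python script.

Here is one cluster's explanation of how to do this: http://chtc.cs.wisc.edu/python-jobs.shtml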
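
The packing and submit-file steps above can be sketched in Python. This is only an illustration under stated assumptions: the directory name `python_install`, the wrapper name `run_job.sh`, and the helper names are hypothetical, and the wrapper script that unpacks the tarball on the execute node is not shown.

```python
import tarfile
from pathlib import Path


def pack_installation(install_dir, tarball):
    """Pack a local Python installation directory into a gzip tarball."""
    src = Path(install_dir)
    with tarfile.open(tarball, "w:gz") as tar:
        # arcname keeps only the top-level directory name inside the archive
        tar.add(str(src), arcname=src.name)
    return tarball


def render_submit(wrapper, input_files):
    """Render a minimal submit description that transfers the given files."""
    lines = [
        f"executable = {wrapper}",
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        # The tarball and the job's own scripts travel with the job
        f"transfer_input_files = {', '.join(input_files)}",
        "queue",
    ]
    return "\n".join(lines) + "\n"
```

For example, `render_submit("run_job.sh", ["python_install.tar.gz", "my_script.py"])` produces text that could be saved as a submit file and passed to condor_submit.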

score 0 · Accepted Answer

By the way, HTCondor can now run jobs inside Docker containers!

https://research.cs.wisc.edu/htcondor/HTCondorWeek2015/presentations/ThainG_Docker.pdf

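An alternative to Docker (which I don't recommend, but had to use because a few years ago Condor did not support Docker) is to use a virtual environment. I would create an Anaconda virtual environment in a folder that all Condor nodes can access. Each job running in Condor then needs to activate that environment first.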
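
A rough sketch of the shared-environment idea, assuming the environment lives at a path every node can see and has the usual `bin/python` layout. Calling the environment's own interpreter is used here in place of an explicit activation step; that is enough for many packages, but some may still need a full activation. The helper name `run_in_env` is hypothetical.

```python
import os
import subprocess


def run_in_env(env_dir, script, *args):
    """Run `script` with the interpreter that lives inside `env_dir`."""
    python = os.path.join(env_dir, "bin", "python")
    if not os.access(python, os.X_OK):
        raise FileNotFoundError(f"no executable interpreter at {python}")
    # The environment's interpreter uses the environment's site-packages,
    # so no separate "activate" command is run here.
    return subprocess.run([python, script, *args], check=True)
```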

score 0 · Accepted Answer

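tl;dr: put the correct path to Python at the top of your Python submission script.

I don't really understand how Condor works, but it seems that once I put the path to the Python of my current environment at the top, it started working. So check where your python command is: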

(automl-meta-learning) miranda9~/automl-meta-learning $ which python
~/miniconda3/envs/automl-meta-learning/bin/python

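Then copy and paste it at the top of your Python submission script as a shebang line, with ~ written out as the absolute home directory (a shebang line does not expand ~):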

#!/home/miranda9/miniconda3/envs/automl-meta-learning/bin/python
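
If you would rather not edit the file by hand, a small helper can write the shebang and set the executable bit. This is only a sketch: `set_shebang` is a hypothetical helper name, and the interpreter path is whatever `which python` printed in your environment.

```python
import os
import stat


def set_shebang(script_path, interpreter):
    """Make `script_path` start with `#!interpreter` and mark it executable."""
    with open(script_path, encoding="utf-8") as f:
        lines = f.read().splitlines(keepends=True)
    shebang = f"#!{interpreter}\n"
    if lines and lines[0].startswith("#!"):
        lines[0] = shebang  # replace an existing shebang
    else:
        lines.insert(0, shebang)  # otherwise prepend one
    with open(script_path, "w", encoding="utf-8") as f:
        f.writelines(lines)
    # Same effect as `chmod a+x` on the script
    mode = os.stat(script_path).st_mode
    os.chmod(script_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

Running it from inside the activated environment with `set_shebang("meta_learning_experiments_submission.py", sys.executable)` (after `import sys`) would use that environment's interpreter path.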

I wish I could put all of this in job.sub. If you know how, please let me know.

In case my submit script helps:
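
One possible way to keep the interpreter path in job.sub itself, which I have not confirmed on this cluster, is to make the interpreter the executable and pass the script through arguments. The sketch below just renders such a file as text. It assumes the interpreter path is visible on the execute nodes (a shared filesystem), and `render_job_sub` is a hypothetical helper name. Arguments containing spaces would need HTCondor's quoting rules, which this sketch ignores.

```python
def render_job_sub(interpreter, script, script_args=()):
    """Render a submit description that runs `script` with `interpreter`."""
    arguments = " ".join([script, *script_args])
    lines = [
        # Assumption: this path exists on the execute nodes (shared filesystem)
        f"executable = {interpreter}",
        # Do not copy the interpreter into the job's scratch directory
        "transfer_executable = False",
        f"arguments = {arguments}",
        "queue",
    ]
    return "\n".join(lines) + "\n"
```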

####################
#
# Experiments script
# Simple HTCondor submit description file
#
# reference: https://gitlab.engr.illinois.edu/Vision/vision-gpu-servers/-/wikis/HTCondor-user-guide#submit-jobs
#
# chmod a+x test_condor.py
# chmod a+x experiments_meta_model_optimization.py
# chmod a+x meta_learning_experiments_submission.py
# chmod a+x download_miniImagenet.py
#
# condor_submit -i
# condor_submit job.sub
#
####################

# Executable   = meta_learning_experiments_submission.py
# Executable = automl-proj/experiments/meta_learning/meta_learning_experiments_submission.py
# Executable = ~/automl-meta-learning/automl-proj/experiments/meta_learning/meta_learning_experiments_submission.py
Executable = /home/miranda9/automl-meta-learning/automl-proj/experiments/meta_learning/meta_learning_experiments_submission.py

## Output Files
Log          = condor_job.$(CLUSTER).log.out
Output       = condor_job.$(CLUSTER).stdout.out
Error        = condor_job.$(CLUSTER).err.out

# Use this to make sure 1 gpu is available. The key words are case insensitive.
Request_gpus = 1
# requirements = ((CUDADeviceName = "Tesla K40m")) && (TARGET.Arch == "X86_64") && (TARGET.OpSys == "LINUX") && (TARGET.Disk >= RequestDisk) && (TARGET.Memory >= RequestMemory) && (TARGET.Cpus >= RequestCpus) && (TARGET.gpus >= Requestgpus) && ((TARGET.FileSystemDomain == MY.FileSystemDomain) || (TARGET.HasFileTransfer))
# requirements = (CUDADeviceName == "Tesla K40m")
# requirements = (CUDADeviceName == "Quadro RTX 6000")
requirements = (CUDADeviceName != "Tesla K40m")

# Note: to use multiple CPUs instead of the default (one CPU), use request_cpus as well
Request_cpus = 8

# E-mail option
Notify_user = me@gmail.com
Notification = always

Environment = "MY_CONDOR_JOB_ID=$(CLUSTER)"

# "Queue" means add the setup until this line to the queue (needs to be at the end of script).
Queue

I said I use a Python submission script, so let me copy the top of it:

#!/home/miranda9/miniconda3/envs/automl-meta-learning/bin/python

import torch
import torch.nn as nn
import torch.optim as optim
# import torch.functional as F
from torch.utils.tensorboard import SummaryWriter

I don't submit a bash script with arguments; the arguments are inside my Python script. I don't know how to use bash, so this works better for me.

Reference solution: https://stackoverflow.com/a/64484025/1601580
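
The one value the submit file does pass in, MY_CONDOR_JOB_ID, can be read from the environment inside the Python script. A minimal sketch; the helper name `get_condor_job_id` and the `"local"` fallback for runs outside Condor are illustrative assumptions.

```python
import os


def get_condor_job_id(default="local"):
    """Return the cluster id set by the submit file's Environment line."""
    # .strip() tolerates stray whitespace around the value
    value = os.environ.get("MY_CONDOR_JOB_ID", "").strip()
    return value or default
```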
