2

我正在尝试使用 nextflow 和 docker 运行 python 脚本。我正在使用 dockerfile(如下所示)来创建 docker 映像。Nextflow 脚本有一个简单的 python 脚本启动。问题是当我从 docker 容器中(在交互模式下)运行相同的 python 命令时,它工作正常。但是当我使用带有 docker 容器的 nextflow 启动它时,它会引发错误。

Dockerfile:

#!/usr/local/bin/docker
# -*- version: 20.10.2 -*-

############################################
## MULTI-STAGE CONTAINER CONFIGURATION ##
FROM python:3.6.2
RUN apt-get update && apt-get install -y \
    apt-transport-https \
    software-properties-common \
    unzip \
    curl
RUN wget -O- https://apt.corretto.aws/corretto.key | apt-key add - && \
    add-apt-repository 'deb https://apt.corretto.aws stable main' && \
    apt-get update && \
    apt-get install -y java-1.8.0-amazon-corretto-jdk


############################################
## PHEKNOWLATOR (PKT_KG) PROJECT SETTINGS ##
# create needed project directories
WORKDIR /PKT
RUN mkdir -p /PKT
RUN mkdir -p /PKT/resources
RUN mkdir -p /PKT/resources/construction_approach
RUN mkdir -p /PKT/resources/edge_data
RUN mkdir -p /PKT/resources/knowledge_graphs
RUN mkdir -p /PKT/resources/node_data
RUN mkdir -p /PKT/resources/ontologies
RUN mkdir -p /PKT/resources/processed_data
RUN mkdir -p /PKT/resources/relations_data

# copy scripts/files needed to run pkt_kg
COPY pkt_kg /PKT/pkt_kg
COPY Main.py /PKT
COPY setup.py /PKT
COPY README.rst /PKT
COPY resources /PKT/resources

# download and copy needed data
RUN curl -O https://storage.googleapis.com/pheknowlator/current_build/data/processed_data/edge_source_list.txt && mv edge_source_list.txt resources/
RUN curl -O https://storage.googleapis.com/pheknowlator/current_build/data/processed_data/ontology_source_list.txt && mv ontology_source_list.txt resources/
RUN curl -O https://storage.googleapis.com/pheknowlator/current_build/data/processed_data/resource_info.txt && mv resource_info.txt resources/
RUN curl -O https://storage.googleapis.com/pheknowlator/current_build/data/processed_data/subclass_construction_map.pkl && mv subclass_construction_map.pkl resources/construction_approach/
RUN curl -O https://storage.googleapis.com/pheknowlator/current_build/data/processed_data/PheKnowLator_MergedOntologies.owl && mv PheKnowLator_MergedOntologies.owl resources/knowledge_graphs/
RUN curl -O https://storage.googleapis.com/pheknowlator/current_build/data/processed_data/node_metadata_dict.pkl && mv node_metadata_dict.pkl resources/node_data/
RUN curl -O https://storage.googleapis.com/pheknowlator/current_build/data/processed_data/DISEASE_MONDO_MAP.txt && mv DISEASE_MONDO_MAP.txt resources/processed_data/
RUN curl -O https://storage.googleapis.com/pheknowlator/current_build/data/processed_data/ENSEMBL_GENE_ENTREZ_GENE_MAP.txt && mv ENSEMBL_GENE_ENTREZ_GENE_MAP.txt resources/processed_data/
RUN curl -O https://storage.googleapis.com/pheknowlator/current_build/data/processed_data/ENTREZ_GENE_PRO_ONTOLOGY_MAP.txt && mv ENTREZ_GENE_PRO_ONTOLOGY_MAP.txt resources/processed_data/
RUN curl -O https://storage.googleapis.com/pheknowlator/current_build/data/processed_data/GENE_SYMBOL_ENSEMBL_TRANSCRIPT_MAP.txt && mv GENE_SYMBOL_ENSEMBL_TRANSCRIPT_MAP.txt resources/processed_data/
RUN curl -O https://storage.googleapis.com/pheknowlator/current_build/data/processed_data/HPA_GTEx_TISSUE_CELL_MAP.txt && mv HPA_GTEx_TISSUE_CELL_MAP.txt resources/processed_data/
RUN curl -O https://storage.googleapis.com/pheknowlator/current_build/data/processed_data/MESH_CHEBI_MAP.txt && mv MESH_CHEBI_MAP.txt resources/processed_data/
RUN curl -O https://storage.googleapis.com/pheknowlator/current_build/data/processed_data/PHENOTYPE_HPO_MAP.txt && mv PHENOTYPE_HPO_MAP.txt resources/processed_data/
RUN curl -O https://storage.googleapis.com/pheknowlator/current_build/data/processed_data/STRING_PRO_ONTOLOGY_MAP.txt && mv STRING_PRO_ONTOLOGY_MAP.txt resources/processed_data/
RUN curl -O https://storage.googleapis.com/pheknowlator/current_build/data/processed_data/UNIPROT_ACCESSION_PRO_ONTOLOGY_MAP.txt && mv UNIPROT_ACCESSION_PRO_ONTOLOGY_MAP.txt resources/processed_data/
RUN curl -O https://storage.googleapis.com/pheknowlator/current_build/data/processed_data/INVERSE_RELATIONS.txt && mv INVERSE_RELATIONS.txt resources/relations_data/
RUN curl -O https://storage.googleapis.com/pheknowlator/current_build/data/processed_data/RELATIONS_LABELS.txt && mv RELATIONS_LABELS.txt resources/relations_data/

# install needed python libraries
RUN pip install --upgrade pip setuptools
WORKDIR /PKT
RUN pip install .


############################################
## GLOBAL ENVRIONMENT SETTINGS ##
# copy files needed to run docker container
COPY entrypoint.sh /PKT

# update permissions for all files
RUN chmod -R 755 /PKT

# set OWlTools memory (set to a high value, system will only use available memory)
ENV OWLTOOLS_MEMORY=500g
RUN echo $OWLTOOLS_MEMORY

# set python envrionment encoding
RUN export PYTHONIOENCODING=utf-8

docker 镜像的名称——pkt:2.0.0

Nextflow 脚本:

process run_PKTBaseRun{

echo True

container 'pkt:2.0.0'
publishDir "${params.outDir}", mode: 'copy'

output:
file '*' into output_ch

script:
"""
which python
$PWD
pwd
python /PKT/Main.py --onts /PKT/resources/ontology_source_list.txt \
            --edg /PKT/resources/edge_source_list.txt \
            --res /PKT/resources/resource_info.txt \
            --out /PKT/resources/knowledge_graphs --app subclass --kg full --nde yes --rel yes --owl no
"""


}

现在当我执行:

nextflow run main.nf

然后这会给出与 glob.glob 模块相关的错误,因为它没有列出文件,因为它必须在 docker 容器中。

但是,当我在 docker 容器中简单地运行上面的 python 代码时,它会运行得无懈可击。

> docker run -it pkt:2.0.0 /bin/bash

/PKT> python Main.py --onts resources/ontology_source_list.txt \
            --edg resources/edge_source_list.txt \
            --res resources/resource_info.txt \
            --out resources/knowledge_graphs --app subclass --kg full --nde yes --rel yes --owl no

只有当我将 nextflow 与 docker 结合使用时,此代码才会引发错误。我确保使用的 python 是容器内的。

问题:

  1. 有什么想法/想法让它发挥作用吗?

有趣的是,
which python --> python inside the container
BUT
的输出,$PWD 的输出 --> nextflow 启动
的目录 pwd 的输出 --> nextflow 的工作目录

  1. 当我们在nextflow进程中添加容器时,nextflow进程内部的命令(run_PKTBaseRun)不就是从容器workdir运行的吗?所以pwd的值不应该是container workdir的值而不是nextflow workdir的值吗?

所有必需的文件都已添加到 docker 映像中。

  1. 有没有办法确保 nextflow 过程中脚本部分中的命令从 docker root/workdir 运行?

这个 nextflow 和 docker 的想法是最终使用 awscli 在 aws 批处理上运行它。但是在 aws batch 上运行它之前,要确保它在本地服务器上运行良好。

期待您的建议和想法。谢谢你。

4

1 回答 1

0

尝试转义\$PWD这将为您提供安装在 docker 中的 nextflow 进程 workdir。我很好奇您是否以其他方式解决了它?

尝试在 nextflow 流程脚本中运行它。

export pdir=\$PWD
echo \$pdir
于 2021-08-22T10:59:54.390 回答