
I am trying to run gcloud beta lifesciences because the Genomics API is deprecated. A lot has changed between the Genomics API and the Life Sciences API.

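I ran one of my analysis steps in Google Cloud using the beta Life Sciences commands. Here is what I found: (1) wildcards do not work in the command-line options, and (2) setting a target directory through the command-line options is not easy, so I used --env-vars to control where the files are copied.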

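I am now trying to convert the command-line options into a JSON-format pipeline file, but the help pages in Google Cloud are not easy to follow. Do you know how to convert the following options into a JSON file so that I can run the step with simpler options?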

I used a YAML-format pipeline file with the Genomics API, but beta lifesciences is completely different.

$ more step03_bwa_mem_genome1.run 
#SMALL=
SMALL=chr21.

LIFESCIENCESPATH=/gcloud-shared
#LIFESCIENCESPATH=/mnt
SCRIPTFILENAME=step03_bwa_mem_genome.sh
COHORTID=2_C_222

gcloud beta lifesciences pipelines run \
    --logging gs://${BUCKETID}/ExomeSeq/hResults/step03_bwa_mem_genome.${COHORTID}.log \
    --regions=asia-northeast1,asia-northeast2,asia-northeast3,asia-east1,asia-east2,asia-south1 \
    --boot-disk-size 20 \
    --preemptible \
    --machine-type n1-standard-1 \
    --disk-size "gcloud-shared:10" \
    --docker-image asia.gcr.io/thermal-shuttle-199104/centos8-essential-software-genomics-custom-python3:0.4 \
    --inputs REFERENCE1=gs://${BUCKETID}/ExomeSeq/hReference/GRCh38.primary_assembly.genome.${SMALL}fa.amb \
    --inputs REFERENCE2=gs://${BUCKETID}/ExomeSeq/hReference/GRCh38.primary_assembly.genome.${SMALL}fa.ann \
    --inputs REFERENCE3=gs://${BUCKETID}/ExomeSeq/hReference/GRCh38.primary_assembly.genome.${SMALL}fa.bwt \
    --inputs REFERENCE4=gs://${BUCKETID}/ExomeSeq/hReference/GRCh38.primary_assembly.genome.${SMALL}fa.fai \
    --inputs REFERENCE5=gs://${BUCKETID}/ExomeSeq/hReference/GRCh38.primary_assembly.genome.${SMALL}fa.intervals \
    --inputs REFERENCE6=gs://${BUCKETID}/ExomeSeq/hReference/GRCh38.primary_assembly.genome.${SMALL}fa.pac \
    --inputs REFERENCE7=gs://${BUCKETID}/ExomeSeq/hReference/GRCh38.primary_assembly.genome.${SMALL}fa.sa \
    --inputs SCRIPTFILE=gs://${BUCKETID}/ExomeSeq/${SCRIPTFILENAME} \
    --inputs COHORTID=${COHORTID} \
    --inputs SAMPLELIST=gs://${BUCKETID}/ExomeSeq/SAMPLELIST.${COHORTID}.lst \
    --inputs INPUTFILE1=gs://${BUCKETID}/ExomeSeq/hReads/${COHORTID}_01_1.chr21.fastq.gz \
    --inputs INPUTFILE2=gs://${BUCKETID}/ExomeSeq/hReads/${COHORTID}_01_2.chr21.fastq.gz \
    --inputs INPUTFILE3=gs://${BUCKETID}/ExomeSeq/hReads/${COHORTID}_02_1.chr21.fastq.gz \
    --inputs INPUTFILE4=gs://${BUCKETID}/ExomeSeq/hReads/${COHORTID}_02_2.chr21.fastq.gz \
    --inputs INPUTFILE5=gs://${BUCKETID}/ExomeSeq/hReads/${COHORTID}_03_1.chr21.fastq.gz \
    --inputs INPUTFILE6=gs://${BUCKETID}/ExomeSeq/hReads/${COHORTID}_03_2.chr21.fastq.gz \
    --outputs OUTPUTFILE1=gs://${BUCKETID}/ExomeSeq/hResults/${COHORTID}_01.bam \
    --outputs OUTPUTFILE2=gs://${BUCKETID}/ExomeSeq/hResults/${COHORTID}_02.bam \
    --outputs OUTPUTFILE3=gs://${BUCKETID}/ExomeSeq/hResults/${COHORTID}_03.bam \
    --env-vars REFERENCE1=${LIFESCIENCESPATH}/ExomeSeq/hReference/GRCh38.primary_assembly.genome.${SMALL}fa.amb,REFERENCE2=${LIFESCIENCESPATH}/ExomeSeq/hReference/GRCh38.primary_assembly.genome.${SMALL}fa.ann,REFERENCE3=${LIFESCIENCESPATH}/ExomeSeq/hReference/GRCh38.primary_assembly.genome.${SMALL}fa.bwt,REFERENCE4=${LIFESCIENCESPATH}/ExomeSeq/hReference/GRCh38.primary_assembly.genome.${SMALL}fa.fai,REFERENCE5=${LIFESCIENCESPATH}/ExomeSeq/hReference/GRCh38.primary_assembly.genome.${SMALL}fa.intervals,REFERENCE6=${LIFESCIENCESPATH}/ExomeSeq/hReference/GRCh38.primary_assembly.genome.${SMALL}fa.pac,REFERENCE7=${LIFESCIENCESPATH}/ExomeSeq/hReference/GRCh38.primary_assembly.genome.${SMALL}fa.sa,SCRIPTFILE=${LIFESCIENCESPATH}/ExomeSeq/${SCRIPTFILENAME},SAMPLELIST=${LIFESCIENCESPATH}/ExomeSeq/SAMPLELIST.${COHORTID}.lst,INPUTFILE1=${LIFESCIENCESPATH}/ExomeSeq/hReads/${COHORTID}_01_1.chr21.fastq.gz,INPUTFILE2=${LIFESCIENCESPATH}/ExomeSeq/hReads/${COHORTID}_01_2.chr21.fastq.gz,INPUTFILE3=${LIFESCIENCESPATH}/ExomeSeq/hReads/${COHORTID}_02_1.chr21.fastq.gz,INPUTFILE4=${LIFESCIENCESPATH}/ExomeSeq/hReads/${COHORTID}_02_2.chr21.fastq.gz,INPUTFILE5=${LIFESCIENCESPATH}/ExomeSeq/hReads/${COHORTID}_03_1.chr21.fastq.gz,INPUTFILE6=${LIFESCIENCESPATH}/ExomeSeq/hReads/${COHORTID}_03_2.chr21.fastq.gz,OUTPUTFILE1=${LIFESCIENCESPATH}/ExomeSeq/hResults/${COHORTID}_01.bam,OUTPUTFILE2=${LIFESCIENCESPATH}/ExomeSeq/hResults/${COHORTID}_02.bam,OUTPUTFILE3=${LIFESCIENCESPATH}/ExomeSeq/hResults/${COHORTID}_03.bam \
    --command-line="find ${LIFESCIENCESPATH}; /bin/bash ${LIFESCIENCESPATH}/ExomeSeq/${SCRIPTFILENAME} ${COHORTID} 4"

2 Answers


Thank you for your answer!

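In the end, I made an input YAML file for gcloud lifesciences based on the operation description. I need to understand the basic functionality of gcloud lifesciences because I want to build a full genome analysis pipeline in Google Cloud, from FASTQ through snpEff, third-party annotation, and text extraction (for one set: whole exome within one day, whole genome within five days). I have already implemented it with the Genomics API, and I am now trying to upgrade it to use gcloud lifesciences.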

I also tried gcsfuse, but setting up Google Cloud authentication was a bit tricky.

Thanks,

$ more step03_bwa_mem_genome2.run 
#SMALL=
SMALL=chr21.

LIFESCIENCESPATH=/gcloud-shared
#LIFESCIENCESPATH=/mnt
SCRIPTFILENAME=step03_bwa_mem_genome.sh
COHORTID=2_C_222

gcloud beta lifesciences pipelines run \
    --logging gs://${BUCKETID}/ExomeSeq/hResults/step03_bwa_mem_genome.${COHORTID}.log \
    --pipeline-file step03_bwa_mem_genome1.yml


$ cat step03_bwa_mem_genome1.yml
actions:
- commands:
  - /bin/sh
  - -c
  - gsutil -m -q cp gs://genconv1/ExomeSeq/hReads/2_C_222_03_2.chr21.fastq.gz ${INPUTFILE6}
  imageUri: google/cloud-sdk:slim
  mounts:
  - disk: gcloud-shared
    path: /gcloud-shared
- commands:
  - /bin/sh
  - -c
  - gsutil -m -q cp gs://genconv1/ExomeSeq/hReads/2_C_222_03_1.chr21.fastq.gz ${INPUTFILE5}
  imageUri: google/cloud-sdk:slim
  mounts:
  - disk: gcloud-shared
    path: /gcloud-shared
- commands:
  - /bin/sh
  - -c
  - gsutil -m -q cp gs://genconv1/ExomeSeq/hReads/2_C_222_02_2.chr21.fastq.gz ${INPUTFILE4}
  imageUri: google/cloud-sdk:slim
  mounts:
  - disk: gcloud-shared
    path: /gcloud-shared
- commands:
  - /bin/sh
  - -c
  - gsutil -m -q cp gs://genconv1/ExomeSeq/hReads/2_C_222_02_1.chr21.fastq.gz ${INPUTFILE3}
  imageUri: google/cloud-sdk:slim
  mounts:
  - disk: gcloud-shared
    path: /gcloud-shared
- commands:
  - /bin/sh
  - -c
  - gsutil -m -q cp gs://genconv1/ExomeSeq/hReads/2_C_222_01_2.chr21.fastq.gz ${INPUTFILE2}
  imageUri: google/cloud-sdk:slim
  mounts:
  - disk: gcloud-shared
    path: /gcloud-shared
- commands:
  - /bin/sh
  - -c
  - gsutil -m -q cp gs://genconv1/ExomeSeq/hReads/2_C_222_01_1.chr21.fastq.gz ${INPUTFILE1}
  imageUri: google/cloud-sdk:slim
  mounts:
  - disk: gcloud-shared
    path: /gcloud-shared
- commands:
  - /bin/sh
  - -c
  - gsutil -m -q cp gs://genconv1/ExomeSeq/SAMPLELIST.2_C_222.lst ${SAMPLELIST}
  imageUri: google/cloud-sdk:slim
  mounts:
  - disk: gcloud-shared
    path: /gcloud-shared
- commands:
  - /bin/sh
  - -c
  - gsutil -m -q cp gs://genconv1/ExomeSeq/step03_bwa_mem_genome.sh ${SCRIPTFILE}
  imageUri: google/cloud-sdk:slim
  mounts:
  - disk: gcloud-shared
    path: /gcloud-shared
- commands:
  - /bin/sh
  - -c
  - gsutil -m -q cp gs://genconv1/ExomeSeq/hReference/GRCh38.primary_assembly.genome.chr21.fa.sa ${REFERENCE7}
  imageUri: google/cloud-sdk:slim
  mounts:
  - disk: gcloud-shared
    path: /gcloud-shared
- commands:
  - /bin/sh
  - -c
  - gsutil -m -q cp gs://genconv1/ExomeSeq/hReference/GRCh38.primary_assembly.genome.chr21.fa.pac ${REFERENCE6}
  imageUri: google/cloud-sdk:slim
  mounts:
  - disk: gcloud-shared
    path: /gcloud-shared
- commands:
  - /bin/sh
  - -c
  - gsutil -m -q cp gs://genconv1/ExomeSeq/hReference/GRCh38.primary_assembly.genome.chr21.fa.intervals ${REFERENCE5}
  imageUri: google/cloud-sdk:slim
  mounts:
  - disk: gcloud-shared
    path: /gcloud-shared
- commands:
  - /bin/sh
  - -c
  - gsutil -m -q cp gs://genconv1/ExomeSeq/hReference/GRCh38.primary_assembly.genome.chr21.fa.fai ${REFERENCE4}
  imageUri: google/cloud-sdk:slim
  mounts:
  - disk: gcloud-shared
    path: /gcloud-shared
- commands:
  - /bin/sh
  - -c
  - gsutil -m -q cp gs://genconv1/ExomeSeq/hReference/GRCh38.primary_assembly.genome.chr21.fa.bwt ${REFERENCE3}
  imageUri: google/cloud-sdk:slim
  mounts:
  - disk: gcloud-shared
    path: /gcloud-shared
- commands:
  - /bin/sh
  - -c
  - gsutil -m -q cp gs://genconv1/ExomeSeq/hReference/GRCh38.primary_assembly.genome.chr21.fa.ann ${REFERENCE2}
  imageUri: google/cloud-sdk:slim
  mounts:
  - disk: gcloud-shared
    path: /gcloud-shared
- commands:
  - /bin/sh
  - -c
  - gsutil -m -q cp gs://genconv1/ExomeSeq/hReference/GRCh38.primary_assembly.genome.chr21.fa.amb ${REFERENCE1}
  imageUri: google/cloud-sdk:slim
  mounts:
  - disk: gcloud-shared
    path: /gcloud-shared
- commands:
  - -c
  - find /gcloud-shared; /bin/bash /gcloud-shared/ExomeSeq/step03_bwa_mem_genome.sh 2_C_222 4
  entrypoint: bash
  imageUri: asia.gcr.io/thermal-shuttle-199104/centos8-essential-software-genomics-custom-python3:0.4
  mounts:
  - disk: gcloud-shared
    path: /gcloud-shared
- commands:
  - /bin/sh
  - -c
  - gsutil -m -q cp ${OUTPUTFILE1} gs://genconv1/ExomeSeq/hResults/2_C_222_01.bam
  imageUri: google/cloud-sdk:slim
  mounts:
  - disk: gcloud-shared
    path: /gcloud-shared
- commands:
  - /bin/sh
  - -c
  - gsutil -m -q cp ${OUTPUTFILE2} gs://genconv1/ExomeSeq/hResults/2_C_222_02.bam
  imageUri: google/cloud-sdk:slim
  mounts:
  - disk: gcloud-shared
    path: /gcloud-shared
- commands:
  - /bin/sh
  - -c
  - gsutil -m -q cp ${OUTPUTFILE3} gs://genconv1/ExomeSeq/hResults/2_C_222_03.bam
  imageUri: google/cloud-sdk:slim
  mounts:
  - disk: gcloud-shared
    path: /gcloud-shared
- alwaysRun: true
  commands:
  - /bin/sh
  - -c
  - gsutil -m -q cp /google/logs/output gs://genconv1/ExomeSeq/hResults/step03_bwa_mem_genome.2_C_222.log
  imageUri: google/cloud-sdk:slim
environment:
  COHORTID: 2_C_222
  INPUTFILE1: /gcloud-shared/ExomeSeq/hReads/2_C_222_01_1.chr21.fastq.gz
  INPUTFILE2: /gcloud-shared/ExomeSeq/hReads/2_C_222_01_2.chr21.fastq.gz
  INPUTFILE3: /gcloud-shared/ExomeSeq/hReads/2_C_222_02_1.chr21.fastq.gz
  INPUTFILE4: /gcloud-shared/ExomeSeq/hReads/2_C_222_02_2.chr21.fastq.gz
  INPUTFILE5: /gcloud-shared/ExomeSeq/hReads/2_C_222_03_1.chr21.fastq.gz
  INPUTFILE6: /gcloud-shared/ExomeSeq/hReads/2_C_222_03_2.chr21.fastq.gz
  OUTPUTFILE1: /gcloud-shared/ExomeSeq/hResults/2_C_222_01.bam
  OUTPUTFILE2: /gcloud-shared/ExomeSeq/hResults/2_C_222_02.bam
  OUTPUTFILE3: /gcloud-shared/ExomeSeq/hResults/2_C_222_03.bam
  REFERENCE1: /gcloud-shared/ExomeSeq/hReference/GRCh38.primary_assembly.genome.chr21.fa.amb
  REFERENCE2: /gcloud-shared/ExomeSeq/hReference/GRCh38.primary_assembly.genome.chr21.fa.ann
  REFERENCE3: /gcloud-shared/ExomeSeq/hReference/GRCh38.primary_assembly.genome.chr21.fa.bwt
  REFERENCE4: /gcloud-shared/ExomeSeq/hReference/GRCh38.primary_assembly.genome.chr21.fa.fai
  REFERENCE5: /gcloud-shared/ExomeSeq/hReference/GRCh38.primary_assembly.genome.chr21.fa.intervals
  REFERENCE6: /gcloud-shared/ExomeSeq/hReference/GRCh38.primary_assembly.genome.chr21.fa.pac
  REFERENCE7: /gcloud-shared/ExomeSeq/hReference/GRCh38.primary_assembly.genome.chr21.fa.sa
  SAMPLELIST: /gcloud-shared/ExomeSeq/SAMPLELIST.2_C_222.lst
  SCRIPTFILE: /gcloud-shared/ExomeSeq/step03_bwa_mem_genome.sh
resources:
  regions:
  - asia-northeast1
  - asia-northeast2
  - asia-northeast3
  - asia-east1
  - asia-east2
  - asia-south1
  virtualMachine:
    bootDiskSizeGb: 20
    disks:
    - name: gcloud-shared
      sizeGb: 10
    machineType: n1-standard-1
    preemptible: true
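
The copy actions in the file above differ only in their source and destination, so they could be generated rather than written by hand. A minimal sketch, assuming a hypothetical helper named emit_copy_action; the image, disk name, and mount path are copied from the pipeline file above, and nothing here is part of gcloud itself:

```shell
# Hypothetical helper: print one YAML copy action for a pipeline file.
#   $1 = source, $2 = destination
# Either argument may be a gs:// URL, a local path, or a literal such as
# '${INPUTFILE1}' that the container's shell expands when the action runs.
emit_copy_action() {
    cat <<EOF
- commands:
  - /bin/sh
  - -c
  - gsutil -m -q cp $1 $2
  imageUri: google/cloud-sdk:slim
  mounts:
  - disk: gcloud-shared
    path: /gcloud-shared
EOF
}
```

For example, printing the line actions: and then calling emit_copy_action gs://genconv1/ExomeSeq/SAMPLELIST.2_C_222.lst '${SAMPLELIST}' reproduces the SAMPLELIST action shown above. Note the single quotes, which keep the local shell from expanding the variable. Since gsutil expands wildcards in gs:// URLs itself, the same helper could probably take a quoted wildcard source and a directory destination, provided that directory already exists on the disk; that variant is untested here.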


Answered 2021-02-25T22:59:22.090

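I would generally recommend using something like Cromwell, Nextflow, or Snakemake rather than using either API directly. They provide much more built-in functionality for this kind of task.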

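That said, the output of gcloud beta lifesciences operations describe <operation name> includes the pipeline definition that gcloud created, which can serve as a starting point. One thing you will notice is that --inputs and --outputs create environment variables automatically, so the LIFESCIENCESPATH variable and the --env-vars argument are not needed, which would simplify the command line significantly.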
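
A minimal sketch of that simplification, with the assumptions stated up front: the helper name build_input_flags is hypothetical, the bucket layout is copied from the question, and each environment variable is assumed to keep the name given to --inputs. The function only prints flags, so it can be checked without calling gcloud.

```shell
# Hypothetical helper: print one "--inputs INPUTFILEn=gs://..." flag per line
# for the paired FASTQ files of a cohort.
#   $1 = bucket name, $2 = cohort ID, $3 = number of samples
# The body runs in a subshell so its variables do not leak into the caller.
build_input_flags() (
    bucket=$1; cohort=$2; nsamples=$3
    n=1; i=1
    while [ "$i" -le "$nsamples" ]; do
        sample=$(printf '%02d' "$i")      # 1 -> 01, 2 -> 02, ...
        for read in 1 2; do               # read 1 and read 2 of each pair
            printf '%s INPUTFILE%d=gs://%s/ExomeSeq/hReads/%s_%s_%s.chr21.fastq.gz\n' \
                '--inputs' "$n" "$bucket" "$cohort" "$sample" "$read"
            n=$((n + 1))
        done
        i=$((i + 1))
    done
)
```

The flags could then be spliced into the command with $(build_input_flags "${BUCKETID}" "${COHORTID}" 3), followed by a single-quoted command line such as --command-line='/bin/bash "${SCRIPTFILE}" "${COHORTID}" 4'. The single quotes leave ${SCRIPTFILE} and ${COHORTID} to be expanded inside the container, where gcloud has set them, instead of by the local shell. This only helps if the script reads its file locations from those variables rather than from fixed paths.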

Answered 2021-02-23T13:18:55.520