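I have this rule in my Snakefile. When I launch it, the rule's input is populated with files for every entry in my YAML config. What I want is for each bwa job to receive the files of a single key from units only. Below are the rule, the YAML file (incomplete), and the dry-run output.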

rule bwa_mem:
    input:
        dt=expand("trim/{sample}/",sample=config['units']),
        forward_paired=expand("trim/{sample}/{sample}_forward_paired.fq.gz",sample=config['units']),
        reverse_paired=expand("trim/{sample}/{sample}_reverse_paired.fq.gz",sample=config['units']),
        forward_unpaired=expand("trim/{sample}/{sample}_forward_unpaired.fq.gz",sample=config['units']),
        reverse_unpaired=expand("trim/{sample}/{sample}_reverse_unpaired.fq.gz",sample=config['units']),

    output:
        temp("mapped_reads/sam/{unit}.sam")
    params:
        genome= config["reference"]['genome_fasta']
    log:
        "mapped_reads/log/{unit}_bwa_mem.log"
    benchmark:
        "benchmarks/bwa/mem/{unit}.txt"
    threads: 8
    shell:
        '/illumina/software/PROG2/bwa-0.7.15/bwa mem '\
                '-t {threads} {params.genome}  {input.forward_paired} {input.reverse_paired} {input.forward_unpaired} {input.reverse_unpaired} 2> {log} > {output}'

And this is the YAML config file:

'samples':
  '432':
  - '432_L001'
  - '432_L002'
  '433':
  - '433_L002'
  - '433_L001'
  '434':
  - '434_L001'
  - '434_L002'
  '435':
  - '435_L002'
  - '435_L001'
....
'units':
  '432_L001':
  - '/illumina/runs/FASTQ/RAW/432_CGATGT_L001_R1_001.fastq.gz'
  - '/illumina/runs/FASTQ/RAW/432_CGATGT_L001_R2_001.fastq.gz'
  '432_L002':
  - '/illumina/runs/FASTQ/RAW/432_CGATGT_L002_R1_001.fastq.gz'
  - '/illumina/runs/FASTQ/RAW/432_CGATGT_L002_R2_001.fastq.gz'
  '433_L001':
  - '/illumina/runs/FASTQ/RAW/433_CAGATC_L001_R1_001.fastq.gz'
  - '/illumina/runs/FASTQ/RAW/433_CAGATC_L001_R2_001.fastq.gz'
  '433_L002':
  - '/illumina/runs/FASTQ/RAW/433_CAGATC_L002_R1_001.fastq.gz'
  - '/illumina/runs/FASTQ/RAW/433_CAGATC_L002_R2_001.fastq.gz'
  '434_L001':
  - '/illumina/runs/FASTQ/RAW/434_GTGAAA_L001_R1_001.fastq.gz'
  - '/illumina/runs/FASTQ/RAW/434_GTGAAA_L001_R2_001.fastq.gz'
  '434_L002':
  - '/illumina/runs/FASTQ/RAW/434_GTGAAA_L002_R1_001.fastq.gz'
  - '/illumina/runs/FASTQ/RAW/434_GTGAAA_L002_R2_001.fastq.gz'
  '435_L001':
  - '/illumina/runs/FASTQ/RAW/435_ACAGTG_L001_R1_001.fastq.gz'
  - '/illumina/runs/FASTQ/RAW/435_ACAGTG_L001_R2_001.fastq.gz'

When I do a dry run, the bwa_mem rule gives this result:

rule bwa_mem:
    input: trim/432_L001/432_L001_reverse_unpaired.fq.gz, trim/432_L002/4
32_L002_reverse_unpaired.fq.gz, trim/433_L001/433_L001_reverse_unpaired.f
q.gz, trim/433_L002/433_L002_reverse_unpaired.fq.gz, trim/434_L001/434_L0
01_reverse_unpaired.fq.gz, trim/434_L002/434_L002_reverse_unpaired.fq.gz,
 trim/435_L001/435_L001_reverse_unpaired.fq.gz, trim/435_L002/435_L002_re
verse_unpaired.fq.gz, trim/436_L001/436_L001_reverse_unpaired.fq.gz, trim
/436_L002/436_L002_reverse_unpaired.fq.gz, trim/437_L001/437_L001_reverse
_unpaired.fq.gz, trim/437_L002/437_L002_reverse_unpaired.fq.gz, trim/438_
L003/438_L003_reverse_unpaired.fq.gz, trim/438_L004/438_L004_reverse_unpa
ired.fq.gz,  trim/lane1_L001/lane1_L
001_reverse_paired.fq.gz, trim/lane2_L002/lane2_L002_reverse_paired.fq.gz
, trim/lane8_L008/
    output: mapped_reads/sam/441_L004.sam
    log: mapped_reads/log/441_L004_bwa_mem.log
    jobid: 208
    benchmark: benchmarks/bwa/mem/441_L004.txt
    wildcards: unit=441_L004

For every element of units, the job lists all the input files... Where did I make a mistake?

1 Answer

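What you are doing here is declaring all of those files as inputs of the rule via the expand function. In other words, you are performing an aggregation: expand fills the pattern with every key in config['units'], so every bwa_mem job receives the files of every unit. What you actually want is only the set of input files for the specific unit. You get that simply by not using expand for the input files and using the {unit} wildcard from the output in the input paths instead. There is no reason to use expand here.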
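The advice above might look like the following sketch of the rule. It assumes the trimmed files are named after the same {unit} wildcard that appears in the output path, and it leaves out the dt directory input on the assumption that nothing in the shell command needs it. The bwa invocation itself is copied unchanged from the question, only split across several string literals.

```snakemake
# Sketch only: the inputs use the {unit} wildcard instead of expand(),
# so each job sees the files of exactly one unit.
rule bwa_mem:
    input:
        forward_paired="trim/{unit}/{unit}_forward_paired.fq.gz",
        reverse_paired="trim/{unit}/{unit}_reverse_paired.fq.gz",
        forward_unpaired="trim/{unit}/{unit}_forward_unpaired.fq.gz",
        reverse_unpaired="trim/{unit}/{unit}_reverse_unpaired.fq.gz",
    output:
        temp("mapped_reads/sam/{unit}.sam")
    params:
        genome=config["reference"]["genome_fasta"]
    log:
        "mapped_reads/log/{unit}_bwa_mem.log"
    benchmark:
        "benchmarks/bwa/mem/{unit}.txt"
    threads: 8
    shell:
        # Command line as in the question; adjacent strings are concatenated.
        "/illumina/software/PROG2/bwa-0.7.15/bwa mem "
        "-t {threads} {params.genome} "
        "{input.forward_paired} {input.reverse_paired} "
        "{input.forward_unpaired} {input.reverse_unpaired} "
        "2> {log} > {output}"
```

With this form, a job requested for mapped_reads/sam/432_L001.sam should list only the four trim/432_L001/ files as input. expand remains the right tool in a rule that really does aggregate, for example a target rule that collects the outputs of all units.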

I strongly recommend reading the whole official Snakemake tutorial, which also covers this kind of problem: http://snakemake.readthedocs.io/en/stable/tutorial/tutorial.html

answered 2017-08-09T16:39:04.153