
I'm writing my RNA-seq pipeline in Snakemake. When I added the last part, rule fpkm, which computes FPKM values from the BAM files, I got this error:

MissingInputException in line 3 of /root/s/r/snakemake/my_rnaseq_data/Snakefile:
Missing input files for rule all:
05_ft/wt2_transcript.gtf
05_ft/wt1_transcript.gtf
05_ft/wt2_gene.gtf
05_ft/epcr1_gene.gtf
05_ft/wt1_gene.gtf
05_ft/epcr2_transcript.gtf
05_ft/epcr1_transcript.gtf
05_ft/epcr2_gene.gtf

Here is my Snakefile:

SBT=["wt1","wt2","epcr1","epcr2"]

rule all:
    input:
        expand("02_clean/{nico}_1.paired.fq", nico=SBT),
        expand("02_clean/{nico}_2.paired.fq", nico=SBT),
        expand("03_align/{nico}.bam", nico=SBT),
        expand("04_exp/{nico}_count.txt", nico=SBT),
        expand("05_ft/{nico}_gene.gtf", nico=SBT),
        expand("05_ft/{nico}_transcript.gtf", nico=SBT)

rule trim:
    input:
        "01_raw/{nico}_1.fastq",
        "01_raw/{nico}_2.fastq"
    output:
        "02_clean/{nico}_1.paired.fq.gz",
        "02_clean/{nico}_1.unpaired.fq.gz",
        "02_clean/{nico}_2.paired.fq.gz",
        "02_clean/{nico}_2.unpaired.fq.gz",
    shell:
        "java -jar /software/Trimmomatic-0.36/trimmomatic-0.36.jar PE -threads 16 {input[0]} {input[1]} {output[0]} {output[1]} {output[2]} {output[3]} ILLUMINACLIP:/software/Trimmomatic-0.36/adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36 &"

rule gzip:
    input:
        "02_clean/{nico}_1.paired.fq.gz",
        "02_clean/{nico}_2.paired.fq.gz"
    output:
        "02_clean/{nico}_1.paired.fq",
        "02_clean/{nico}_2.paired.fq"
    run:
        shell("gzip -d {input[0]} > {output[0]}")
        shell("gzip -d {input[1]} > {output[1]}")

rule map:
    input:
        "02_clean/{nico}_1.paired.fq",
        "02_clean/{nico}_2.paired.fq"
    output:
        "03_align/{nico}.sam"
    log:
        "logs/map/{nico}.log"
    threads: 40
    shell:
        "hisat2 -p 20 --dta -x /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -1 {input[0]} -2 {input[1]} -S {output} >{log} 2>&1 &"

rule sort2bam:
    input:
        "03_align/{nico}.sam"
    output:
        "03_align/{nico}.bam"
    threads:30
    shell:
        "samtools sort -@ 20 -m 20G -o {output} {input} "

rule count:
    input:
        "03_align/{nico}.bam"
    output:
        "04_exp/{nico}_count.txt"
    shell:
        "featureCounts -T 10 -p -t exon -g gene_id -a /root/s/r/p/A_th/WT-Al_VS_WT-CK/genome/tair10.gtf -o {output} {input}"

rule fpkm:
    input:
        "03_align/{nico}.bam"
    output:
        "05_ft/{nico}_gene.gtf"
        "05_ft/{nico}_transcript.gtf"
    shell:
        "stringtie -e -p 30 -G /root/s/r/p/A_th/WT-Al_VS_WT-CK/index/tair10 -A {output[0]} -o {output[1]} {input}"

Here is my directory structure:

|-- 03_align
|   |-- epcr1.bam
|   |-- epcr1.sam
|   |-- epcr2.bam
|   |-- epcr2.sam
|   |-- wt1.bam
|   |-- wt1.sam
|   |-- wt2.bam
|   `-- wt2.sam
|-- 04_exp

The BAM files were already there from running the Snakefile before I added rule fpkm.


1 Answer


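The error comes from rule fpkm. There is no comma between its two output files, so Python treats them as adjacent string literals and concatenates them into one long string, 05_ft/{nico}_gene.gtf05_ft/{nico}_transcript.gtf. Rule fpkm therefore declares a single output pattern that matches none of the files rule all requests, no rule can produce them, and Snakemake reports them as missing input for rule all. Add the comma: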

rule fpkm:
    output:
        "05_ft/{nico}_gene.gtf",  # this comma was missing
        "05_ft/{nico}_transcript.gtf"
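A minimal Python sketch of both halves of the problem, runnable outside Snakemake. The first part shows the implicit concatenation of adjacent string literals. The second part roughly approximates how an output pattern containing the {nico} wildcard is matched against a requested file name. The helper pattern_to_regex is hypothetical and only a simplified stand-in, assuming each wildcard matches .+ and a repeated wildcard must repeat the same value. It is not Snakemake's actual implementation.

```python
import re

# As in the original rule fpkm: two literals on separate lines, no comma.
# Python joins adjacent string literals into a single string.
broken_output = (
    "05_ft/{nico}_gene.gtf"
    "05_ft/{nico}_transcript.gtf"
)

# With the comma, the same two lines give two separate patterns.
fixed_output = (
    "05_ft/{nico}_gene.gtf",
    "05_ft/{nico}_transcript.gtf",
)

print(broken_output)      # 05_ft/{nico}_gene.gtf05_ft/{nico}_transcript.gtf
print(len(fixed_output))  # 2


def pattern_to_regex(pattern):
    """Hypothetical, simplified stand-in for Snakemake's wildcard matching."""
    # re.split with a capturing group alternates literal text and wildcard names.
    parts = re.split(r"\{(\w+)\}", pattern)
    seen = set()
    regex = ""
    for i, part in enumerate(parts):
        if i % 2 == 0:
            regex += re.escape(part)      # literal text
        elif part in seen:
            regex += f"(?P={part})"       # repeated wildcard: same value again
        else:
            seen.add(part)
            regex += f"(?P<{part}>.+)"    # first occurrence captures the value
    return re.compile(regex)


target = "05_ft/wt1_gene.gtf"  # one of the files rule all requests
print(pattern_to_regex(broken_output).fullmatch(target))  # None
print(pattern_to_regex(fixed_output[0]).fullmatch(target).group("nico"))  # wt1
```

With the comma, the first output pattern matches 05_ft/wt1_gene.gtf with nico=wt1, so Snakemake can find a rule that produces it. The concatenated pattern could only ever match a file named like 05_ft/wt1_gene.gtf05_ft/wt1_transcript.gtf.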
answered 2019-04-28T03:41:54.507