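I am planning to move my bioinformatics pipeline to snakemake, because my current pipeline is a collection of multiple scripts that is becoming increasingly hard to follow. Based on the tutorials and documentation, snakemake seems a very clear and interesting option for pipeline management. However, I am not familiar with Python, as I mostly work with bash and R, so snakemake seems rather hard to learn: I am facing the following problem.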

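I have two files, sampleA_L001_R1_001.fastq.gz and sampleA_L001_R2_001.fastq.gz, located in the same directory, sampleA. I want to merge these files using the cat command. This is actually a test run: in the real case each sample will have eight separate FASTQ files, which should be merged in a similar manner. A very simple job, but something is wrong with my code.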

snakemake --latency-wait 20 --snakefile /home/users/me/bin/snakefile.txt

rule mergeFastq:
    input:
        reads1='sampleA/sampleA_L001_R1_001.fastq.gz',
        reads2='sampleA/sampleA_L001_R2_001.fastq.gz'
    output:
        reads1='sampleA/sampleA_R1.fastq.gz',
        reads2='sampleA/sampleA_R2.fastq.gz'
    message:
        'Merging FASTQ files...'
    shell:
        'cat {input.reads1} > {output.reads1}'
        'cat {input.reads2} > {output.reads2}'

-------------------------------------------------------------

Provided cores: 1
Rules claiming more threads will be scaled down.
Job counts:
    count   jobs
    1   mergeFastq
    1

Job 0: Merging FASTQ files...

Waiting at most 20 seconds for missing files.
Error in job mergeFastq while creating output files sampleA_R1.fastq.gz, sampleA_R2.fastq.gz.
MissingOutputException in line 5 of /home/users/me/bin/snakefile.txt:
Missing files after 20 seconds:
sampleA_R1.fastq.gz
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Removing output files of failed job mergeFastq since they might be corrupted: sampleA_R2.fastq.gz
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message.

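As you can see, I have already tried the --latency-wait option, without success. Do you have any idea what the source of my problem might be? The file paths are correct, and the files themselves are intact and not corrupted. I have run into similar problems with wildcards too, so there must be something in the snakemake basics that I do not understand.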

1 Answer

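The problem is in the shell statement. The two quoted strings sit on adjacent lines with nothing between them, so Python joins them into a single command with no separator, and that command writes to a file named "sampleA/sampleA_R1.fastq.gzcat". That is why snakemake cannot find the expected output. You could, for example, use this syntax: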

rule mergeFastq:
    input:
        reads1='sampleA/sampleA_L001_R1_001.fastq.gz',
        reads2='sampleA/sampleA_L001_R2_001.fastq.gz'
    output:
        reads1='sampleA/sampleA_R1.fastq.gz',
        reads2='sampleA/sampleA_R2.fastq.gz'
    message:
        'Merging FASTQ files...'
    shell:"""
        cat {input.reads1} > {output.reads1}
        cat {input.reads2} > {output.reads2}
    """

The --latency-wait option is not needed.
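
To see why the original shell statement misbehaves, the concatenation can be reproduced in plain Python, outside snakemake. This is only an illustrative sketch: the variable names `broken` and `fixed` are made up for the demonstration, and the `{input...}` / `{output...}` placeholders are left as literal text rather than being filled in by snakemake.

```python
# Python merges adjacent string literals with nothing in between.
# This mirrors the two quoted lines under `shell:` in the question.
broken = (
    'cat {input.reads1} > {output.reads1}'
    'cat {input.reads2} > {output.reads2}'
)
print(broken)

# One possible alternative to the triple-quoted form (an assumption, not
# part of the answer above): end the first literal with ' && ' so the
# joined string contains two separate shell commands.
fixed = (
    'cat {input.reads1} > {output.reads1} && '
    'cat {input.reads2} > {output.reads2}'
)
print(fixed)
```

In the first string the redirect target and the second `cat` run together, which matches the oddly named file described above. The triple-quoted form shown in the answer avoids the issue because the line break inside the string is kept, so each command stays on its own line.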

Answered 2017-06-09T15:01:46.620