snakemake - snakemake 脚本访问标准输入/标准输出以进行流处理

Question

对于 Snakemake 工作流程，我需要处理许多 BAM 文件中的标签，并希望通过脚本（使用 Snakemake 脚本：指令）来处理这些标签。我这样做的具体方法是使用pysam 流处理。

infile = pysam.AlignmentFile("-", "rb")
outfile = pysam.AlignmentFile("-", "wb", template=infile)

for s in infile:
    (flowcell, lane) = s.query_name.split(':')[0:2]
    rg_id = ".".join([flowcell, lane])
    s.set_tag('RG',rg_id,'Z')
    outfile.write(s)

该脚本独立运行良好，但我无法弄清楚如何通过snakemakescript指令集成它。我更喜欢这种方式来最小化 IO 和 RAM 的使用。

编辑：采用直接加载来修复 RG 标签。

# parameters passed from snakemake
bam_file = snakemake.input[0]
fixed_bam_file = snakemake.output[0]

bamfile  = pysam.AlignmentFile(bam_file, "rb")
fixed_bamfile = pysam.AlignmentFile(fixed_bam_file, "wb", template = bamfile)

for i, read in enumerate(bamfile.fetch()):
    (flowcell, lane) = read.query_name.split(':')[0:2]
    rg_id = ".".join([flowcell, lane])
    read.set_tag('RG',rg_id,'Z')
    fixed_bamfile.write(read)
    if not (i % 100000):
        print("Updated the read group for {} reads to {}".format(i, rg_id))

bamfile.close()
fixed_bamfile.close()

编辑：Snakemakesrun:和shell:指令设置workdir:目录，而script:指令相对于执行 Snakefile 的目录进行操作（保持一切整洁）。因此，将流处理器置于script:.

score 1 · Accepted Answer

使用shell代替script指令：

rule all:
    input:
        expand('{sample}_edited.bam'), sample=['a', 'b', 'c']


rule somename:
    input:
        '{sample}.bam'
    output:
        '{sample}_edited.bam'
    shell:
        '''
        cat {input} > python edit_bam.py > {output}
        '''

score 1 · Accepted Answer

@Krischan 看来您已经找到了解决方案，如果是这样，将其作为答案发布可能会很好。

或者，您可以使用该对象{workflow}获取 Snakefile 的目录，并从那里构建您的 python 脚本的路径。如果你的目录结构是：

./
├── Snakefile
├── data
│   └── sample.bam
└── scripts
    └── edit_bam.py

Snakefile 可能如下所示：

rule all:
    input:
        'test.tmp',

rule one:
    input:
        'sample.bam',
    output:
        'test.tmp',
    shell:
        r"""
        cat {input} \
        | {workflow.basedir}/scripts/edit_bam.py >  {output}
        """

执行与snakemake -d data ...

似乎该workflow对象没有记录，但请检查此线程有什么方法可以在 Snakefile 中获取 Snakefile 的完整路径？

snakemake - snakemake 脚本访问标准输入/标准输出以进行流处理

2 回答 2

Related

Reference