python - Best way to run the same rule twice with different parameters

Question

I am using bcftools consensus to extract haplotypes from a VCF file. Given the input files:

A.sorted.bam
B.sorted.bam

the following output files are created:

A.hap1.fna
A.hap2.fna
B.hap1.fna
B.hap2.fna

I currently have two rules that do this. They differ only in the number 1 or 2 in the output file name and in the shell command. The code:

rule consensus1:
    input:
        vcf="variants/phased.vcf.gz",
        tbi="variants/phased.vcf.gz.tbi",
        bam="alignments/{sample}.sorted.bam"
    output:
        "haplotypes/{sample}.hap1.fna"
    params:
        sample="{sample}"
    shell:
        "bcftools consensus -i -s {params.sample} -H 1 -f {reference_file} {input.vcf} > {output}"

rule consensus2:
    input:
        vcf="variants/phased.vcf.gz",
        tbi="variants/phased.vcf.gz.tbi",
        bam="alignments/{sample}.sorted.bam"
    output:
        "haplotypes/{sample}.hap2.fna"
    params:
        sample="{sample}"
    shell:
        "bcftools consensus -i -s {params.sample} -H 2 -f {reference_file} {input.vcf} > {output}"

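This code works, but it seems there should be a better, more Pythonic way to do this with a single rule. Can the two be merged into one rule, or is my current approach the best one?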

score 2 · Accepted Answer

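Make the haplotype number a wildcard in the rule's output, and request haplotypes 1 and 2 for every sample in rule all. The Snakemake documentation has more on adding targets via rule all.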

reference_file = "ref.txt"

rule all:
    input:
        expand("haplotypes/{sample}.hap{hap_no}.fna",
                   sample=["A", "B"], hap_no=["1", "2"])

# A single rule now covers both haplotypes; hap_no is filled in from the requested file name.
rule consensus:
    input:
        vcf="variants/phased.vcf.gz",
        tbi="variants/phased.vcf.gz.tbi",
        bam="alignments/{sample}.sorted.bam"
    output:
        "haplotypes/{sample}.hap{hap_no}.fna"
    params:
        sample="{sample}",
        hap_no="{hap_no}"
    shell:
        "bcftools consensus -i -s {params.sample} -H {params.hap_no} "
        "-f {reference_file} {input.vcf} > {output}"
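
To see why one rule is enough, it can help to mimic what happens to the file names. The following is a rough plain-Python sketch, not Snakemake's actual implementation: expand_targets is a hypothetical stand-in for expand() (which by default takes the Cartesian product of the wildcard values), and OUTPUT_RE is an assumed regular expression that mirrors the rule's output pattern to show how a requested file is split back into sample and hap_no.

```python
import re
from itertools import product


def expand_targets(pattern, **wildcards):
    """Hypothetical stand-in for expand(): fill the pattern with every combination."""
    names = list(wildcards)
    return [
        pattern.format(**dict(zip(names, combo)))
        for combo in product(*(wildcards[name] for name in names))
    ]


# The targets that rule all would request.
targets = expand_targets(
    "haplotypes/{sample}.hap{hap_no}.fna",
    sample=["A", "B"],
    hap_no=["1", "2"],
)

# Assumed regex mirroring the output pattern of the consensus rule
# (restricting hap_no to 1 or 2 is a choice made for this sketch).
OUTPUT_RE = re.compile(r"haplotypes/(?P<sample>.+)\.hap(?P<hap_no>[12])\.fna")


def resolve(target):
    """Return the wildcard values a target implies, or None if it does not match."""
    match = OUTPUT_RE.fullmatch(target)
    return match.groupdict() if match else None


for target in targets:
    print(target, resolve(target))
```

Each of the four targets resolves to its own sample and hap_no pair, which is how a single rule can produce all four files without being duplicated.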
