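I have the following Snakefile, which uses a checkpoint. I am trying to run two samples (defined in RUNS). However, every time I try, I get an extra, unexpected wildcard value. Any ideas on how to solve this? Thanks.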

import os
from tempfile import TemporaryDirectory

configfile: "config/CONFIG.yaml"
DATA_DIR = config["data_dir"]
RESULTS_DIR = config["results_dir"]
DB_DIR=config["db_dir"]
RUNS=["S1_select", "S3_select"]
BARCODES=config["no_barcode"]


rule all:
    input: expand(os.path.join(RESULTS_DIR, "basecalled/{run}/{barcode}.fastq.gz"), run=RUNS, barcode=BARCODES)

checkpoint guppy_gpu_basecall:
    input: os.path.join(DATA_DIR, "multifast5/{run}")
    output: directory(os.path.join(RESULTS_DIR, "basecalled/{run}"))    #folder with many files
    log: os.path.join(RESULTS_DIR, "basecalled/{run}/basecalling")
    threads: config["guppy_gpu"]["cpu_threads"]
    shell:
        """
        run_guppy
        """

rule intermediate_basecalling:
    input: os.path.join(RESULTS_DIR, "basecalled/{run}/{i}.fastq.gz")
    output: os.path.join(RESULTS_DIR, "basecalled/{run}/no_nobarcode/{i}.fastq.gz")
    log: os.path.join(RESULTS_DIR, "basecalled/{run}/no_barcode_{i}")
    shell:
        """
        (date &&\
        ln -s {input} {output}  &&\
        date) 2> >(tee {log}.stderr) > >(tee {log}.stdout)
        """

def aggregate_dummy_basecalling(wildcards):
    checkpoint_output = checkpoints.guppy_gpu_basecall.get(**wildcards).output[0]
    return expand(os.path.join(RESULTS_DIR, "basecalled/{run}/no_nobarcode/{i}.fastq.gz"),  # placeholder must match the i= keyword below
        run=wildcards.run,
        i=glob_wildcards(os.path.join(checkpoint_output, "{i}.fastq.gz")).i)

rule merge_individual_fastq_per_barcode:
    input: aggregate_dummy_basecalling
    output: os.path.join(RESULTS_DIR, "basecalled/{run}/{barcode}/{barcode}.fastq.gz")
    shell:
        """
        date
        cat $(find $(dirname {output}) -name "*.fastq.gz" | sort) > {output}
        touch {output}
        date
        """

I get the following error:

Missing input files for rule guppy_gpu_basecall:
data/multifast5/S1_select/no_barcode.fastq.gz
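
One possible reading of this message, offered as a sketch rather than a confirmed diagnosis: by default a Snakemake wildcard matches the regex `.+`, so the target requested by `rule all` can be matched by the checkpoint's directory output with `run` swallowing the rest of the path. The plain-Python snippet below only mimics that matching with `re`; it does not call Snakemake. The values `results` for RESULTS_DIR and `no_barcode` for the barcode are assumptions inferred from the error message, and `pattern_to_regex` and `match` are hypothetical helpers.

```python
import re

def pattern_to_regex(pattern, constraints=None):
    """Convert a '{name}' path pattern into a compiled regex.

    Each wildcard matches '.+' unless a constraint is given; a repeated
    wildcard must match the same text again (backreference).
    """
    constraints = constraints or {}
    seen = set()
    parts = []
    pos = 0
    for m in re.finditer(r"\{(\w+)\}", pattern):
        parts.append(re.escape(pattern[pos:m.start()]))
        name = m.group(1)
        if name in seen:
            parts.append(f"(?P={name})")
        else:
            parts.append(f"(?P<{name}>{constraints.get(name, '.+')})")
            seen.add(name)
        pos = m.end()
    parts.append(re.escape(pattern[pos:]))
    return re.compile("".join(parts))

def match(pattern, path, constraints=None):
    """Return the wildcard values if path matches pattern, else None."""
    m = pattern_to_regex(pattern, constraints).fullmatch(path)
    return m.groupdict() if m else None

# Assumed values: RESULTS_DIR = "results", barcode = "no_barcode".
CHECKPOINT_OUTPUT = "results/basecalled/{run}"
MERGE_OUTPUT = "results/basecalled/{run}/{barcode}/{barcode}.fastq.gz"

# Target as requested by rule all in the Snakefile above.
TARGET = "results/basecalled/S1_select/no_barcode.fastq.gz"
# Hypothetical target that follows the merge rule's output layout instead.
ADJUSTED = "results/basecalled/S1_select/no_barcode/no_barcode.fastq.gz"

# The checkpoint output matches, with 'run' absorbing the file name.
print(match(CHECKPOINT_OUTPUT, TARGET))
# The merge rule's output does not match the requested target at all.
print(match(MERGE_OUTPUT, TARGET))
# A target shaped like the merge output yields the intended wildcards.
print(match(MERGE_OUTPUT, ADJUSTED))
# Forbidding '/' inside 'run' stops the checkpoint from matching the file.
print(match(CHECKPOINT_OUTPUT, TARGET, {"run": "[^/]+"}))
```

If this reading is right, two changes may be worth trying: make the target in `rule all` follow the layout of the `merge_individual_fastq_per_barcode` output, and restrict `run` with Snakemake's `wildcard_constraints` (for example `run="[^/]+"`) so that it cannot span a path separator.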

Thanks for any pointers!