nextflow - The Nextflow .collect() method in the RNA-seq example workflow

Question

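I know that we have to use collect() when we run a process that takes two channels as input, where the first channel has one element and the second channel has more than one element: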


#! /usr/bin/env nextflow

nextflow.enable.dsl=2

process A {

    input:
    val(input1)

    output:
    path 'index.txt', emit: foo

    script:
    """
    echo 'This is an index' > index.txt
    """
}

process B {

    input:
    val(input1)
    path(input2)

    output:
    path("${input1}.txt")

    script:
    """
    cat <(echo ${input1}) ${input2} > \"${input1}.txt\"
    """
}

workflow {

    A( Channel.from( 'A' ) )

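    // Without collect(), B would run only once, for the first element of the
    // first channel. It is commented out here because a process can be invoked
    // only once per workflow unless it is included under a different name:
    // B( Channel.from( 1, 2, 3 ), A.out.foo )

    // With collect(), B runs for all three elements, as intended:
    B( Channel.from( 1, 2, 3 ), A.out.foo.collect() )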

}

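The question now is: why does this line in the example workflow from nextflow-io ( https://github.com/nextflow-io/rnaseq-nf/blob/master/modules/rnaseq.nf#L15 ) work without using collect() or toList()?

It is the same situation: a channel with one element (the index) and a channel with more than one element (the fastq pairs) are consumed by the same process (quant), and it runs on all the fastq files. What am I missing compared to my dummy example?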

score 1 · Accepted Answer

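You need to create the first channel with the value factory, which produces a channel that is never exhausted.

Your linked example implicitly creates a value channel, which is why it works. The same thing happens when you call .collect() on A.out.foo.

Channel.from (or the more modern Channel.of) creates a queue channel, a sequence of items that can be exhausted, which is why both A and B run only once.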
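As a minimal sketch of that difference (the process name PAIR and its inputs are hypothetical, made up for illustration and not part of the question or of rnaseq-nf), the following script pairs a three-element queue channel with a value channel:

```nextflow
#!/usr/bin/env nextflow

nextflow.enable.dsl=2

// Hypothetical process: prints each (x, y) combination it receives
process PAIR {

    input:
    val(x)
    val(y)

    output:
    stdout

    script:
    """
    echo "${x} ${y}"
    """
}

workflow {

    // Queue channel: each item is consumed once
    numbers = Channel.of( 1, 2, 3 )

    // Value channel: can be read any number of times without being exhausted
    label = Channel.value( 'A' )

    // One task is created per item in `numbers`; `label` is reused for each
    PAIR( numbers, label )
    PAIR.out.view()
}
```

Replacing Channel.value( 'A' ) with Channel.of( 'A' ) should leave PAIR with a single task, because the one-element queue channel is consumed by the first task. Appending .first() or .collect() to that queue channel turns it into a value channel again; note that collect() emits its items as a single list.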

So

A( Channel.value('A') )

is all you need.
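Applied to the dummy example from the question, the fix might look like the sketch below. It relies on the documented Nextflow behaviour that a process whose inputs are all value channels also emits value channels, which is what makes A.out.foo reusable here. Channel.of is used in place of the deprecated Channel.from; everything else is taken from the question.

```nextflow
#!/usr/bin/env nextflow

nextflow.enable.dsl=2

process A {

    input:
    val(input1)

    output:
    path 'index.txt', emit: foo

    script:
    """
    echo 'This is an index' > index.txt
    """
}

process B {

    input:
    val(input1)
    path(input2)

    output:
    path("${input1}.txt")

    script:
    """
    cat <(echo ${input1}) ${input2} > \"${input1}.txt\"
    """
}

workflow {

    // A receives a value channel, so A.out.foo is a value channel as well
    A( Channel.value( 'A' ) )

    // No collect() needed: index.txt is reused for each of the three items
    B( Channel.of( 1, 2, 3 ), A.out.foo )
}
```

With this change B should be scheduled three times, once per item of the first channel, each task receiving the same index.txt.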
