How can I run one process instance for each pair of values from two lists, and then collect the outputs across only one of the lists?

For example, if you run this Nextflow script:

numbers = Channel
    .from(1..2)
    .into{numbers1; numbers2}

letters = Channel
    .from('A'..'B')

process p1 {
    input:
    each number from numbers1
    each letter from letters

    output:
    path "${number}${letter}.txt" into foo

    """
    echo "$number $letter" > ${number}${letter}.txt
    """
}

process p2 {
    input:
    path numberletters from foo.collect()
    each number from numbers2

    """
    for file in $numberletters; do
        cat \$file >> $baseDir/${number}.out
    done
    """
}

You get two output files (as expected): 1.out and 2.out. Each one contains the same set of lines:

1 A
1 B
2 A
2 B

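How can I make 1.out contain only 1 A and 1 B, and 2.out contain only 2 A and 2 B? That is, how can the foo channel .collect() the outputs of p1 across the letter input only, while keeping separate instances for different number inputs?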

1 Answer

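One solution is to have your first process output a tuple with the "number" as its first element, then call groupTuple() to group together the files that share the same key: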

numbers = Channel.of(1..2)
letters = Channel.of('A'..'B')


process p1 {

    input:
    tuple val(number), val(letter) from numbers.combine(letters)

    output:
    tuple val(number), path("${number}${letter}.txt") into foo

    """
    echo "${number} ${letter}" > "${number}${letter}.txt"
    """
}

process p2 {

    publishDir baseDir, mode: 'copy'

    input:
    tuple val(number), path(numberletters) from foo.groupTuple()

    output:
    path "${number}.out"

    """
    cat $numberletters > "${number}.out"
    """
}

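If you know how many elements to expect in each group, you can set the "size" attribute so that the groupTuple operator can emit each group of collected values as soon as it is complete, rather than waiting for the upstream channel to close.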
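
As a minimal sketch of that option, p2 could pass size to groupTuple as below. The value 2 is an assumption based on this example, where each number is paired with exactly two letters ('A' and 'B'); it would need to match the actual number of letters in a real pipeline.

```groovy
// Assumption: every number is combined with exactly two letters,
// so each group is complete once it holds two files.
process p2 {

    publishDir baseDir, mode: 'copy'

    input:
    // size: 2 lets groupTuple emit a group as soon as its second file arrives
    tuple val(number), path(numberletters) from foo.groupTuple(size: 2)

    output:
    path "${number}.out"

    """
    cat $numberletters > "${number}.out"
    """
}
```

If the number of letters changes, the size value has to change with it.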

Answered 2021-06-11T14:25:42.690