nextflow - extracting sample IDs from file names emitted by fromPath()

Question

I am new to Nextflow, and this is a practice script that I want to use to test a real job.

#!/usr/bin/env nextflow

params.cns = '/data1/deliver/phase2/CNVkit/*.cns'
cns_ch = Channel.fromPath(params.cns)
cns_ch.view()

The output of this script is:

N E X T F L O W  ~  version 21.04.0
Launching `cnvkit_call.nf` [festering_wescoff] - revision: 886ab3cf13
/data1/deliver/phase2/CNVkit/002-002_L4_sorted_dedup.cns
/data1/deliver/phase2/CNVkit/015-002_L4.SSHT89_sorted_dedup.cns
/data1/deliver/phase2/CNVkit/004-005_L1_sorted_dedup.cns
/data1/deliver/phase2/CNVkit/018-008_L1.SSHT31_sorted_dedup.cns
/data1/deliver/phase2/CNVkit/003-002_L3_sorted_dedup.cns
/data1/deliver/phase2/CNVkit/002-004_L6_sorted_dedup.cns

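Here 002-002, 015-002, 004-005, etc. are the sample IDs. I am trying to write a simple process that outputs files named like ${sample.id}_sorted_dedup.calls.cns, but I am not sure how to extract these IDs and use them in the output file name.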

process cnvcalls {
    input:
    file(cns_file) from cns_ch

    output:
    file("${sample.id}_sorted_dedup.calls.cns") into cnscalls_ch

    script:
    """
    cnvkit.py call ${cns_file} -o ${sample.id}_sorted_dedup.calls.cns
    """
}

How can I modify process cnvcalls so that it works with sample.id?

score 1 · Accepted Answer

There are many ways to extract a sample name/ID from a file name. One way is to split the name on underscores and take the first element:

params.cns = '/data1/deliver/phase2/CNVkit/*.cns'
cns_ch = Channel.fromPath(params.cns)


process cnvcalls {

    input:
    path(cns_file) from cns_ch

    output:
    path("${sample_id}_sorted_dedup.calls.cns") into cnscalls_ch

    script:
    sample_id = cns_file.name.split('_')[0]

    """
    cnvkit.py call "${cns_file}" -o "${sample_id}_sorted_dedup.calls.cns"
    """
}
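
To see what the split expression yields, here is a minimal plain-Groovy sketch that applies it to a few of the file names from the question's output. The variable names `fileNames` and `sampleIds` are only illustrative, and the sketch uses plain strings in place of the path objects Nextflow passes to the process. It also assumes the sample ID never contains an underscore itself.

```groovy
// File names copied from the question's output (directory part dropped).
def fileNames = [
    '002-002_L4_sorted_dedup.cns',
    '015-002_L4.SSHT89_sorted_dedup.cns',
    '018-008_L1.SSHT31_sorted_dedup.cns'
]

// Same expression as in the process: split on '_' and keep the first element.
def sampleIds = fileNames.collect { name -> name.split('_')[0] }

sampleIds.each { println it }
```

If an ID could ever contain an underscore, a regular expression anchored on the known ID pattern would probably be a safer choice than a plain split.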

My preference, though, is to pass the sample name/ID alongside the input file using a tuple:

params.cns = '/data1/deliver/phase2/CNVkit/*.cns'
cns_ch = Channel.fromPath(params.cns).map {
    tuple( it.name.split('_')[0], it )
}


process cnvcalls {

    input:
    tuple val(sample_id), path(cns_file) from cns_ch

    output:
    path "${sample_id}_sorted_dedup.calls.cns" into cnscalls_ch

    """
    cnvkit.py call "${cns_file}" -o "${sample_id}_sorted_dedup.calls.cns"
    """
}
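
The `from` and `into` keywords above belong to the original DSL1 syntax. For anyone on DSL2, the tuple approach might look roughly like the sketch below. It assumes a Nextflow release where DSL2 is available (the `nextflow.enable.dsl = 2` line is unnecessary on releases where DSL2 is the default). Emitting the sample ID together with the output file is my own addition so that downstream steps can keep track of it.

```groovy
nextflow.enable.dsl = 2

params.cns = '/data1/deliver/phase2/CNVkit/*.cns'

process cnvcalls {

    input:
    tuple val(sample_id), path(cns_file)

    output:
    // Assumption: keep the ID paired with the result for downstream use.
    tuple val(sample_id), path("${sample_id}_sorted_dedup.calls.cns")

    script:
    """
    cnvkit.py call "${cns_file}" -o "${sample_id}_sorted_dedup.calls.cns"
    """
}

workflow {
    // Pair each file with the ID taken from the start of its name.
    cns_ch = Channel
        .fromPath(params.cns)
        .map { cns -> tuple(cns.name.split('_')[0], cns) }

    cnvcalls(cns_ch)
    cnvcalls.out.view()
}
```

In DSL2 the channel is passed to the process as an argument inside the `workflow` block, so the process definition itself no longer names its source or target channels.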
