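I am redesigning a workflow that essentially starts with one process that spawns multiple other processes. Originally, I had the variables before starting my workflow, so I made a tuple of those variables and passed it as input to the process. The process took each value and spawned one process per value in the tuple.

In my new architecture, however, I get the "tuple" inside my processA. processB then needs to take each value as input and spawn one process per input.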

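My tuple looks like: {"002--002": some_params, "004--004": some_params, etc.}

I currently have these values as a list in Python: ['052--052', '054--054', '055--055', '059--059', '060--060', '066--066']

How can I parse this Python list so that I can keep passing a single argument and still spawn multiple processes?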

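ProcessA also creates files such as somefile_052--052.someextension, and I basically want to pass the right variable along with the right files.

Any help would be greatly appreciated.

Here is some code:

These are the files I need to work with. I need to send all files that share the same code along with the variable.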

> ls
out.barcoded.subreads.bam             out.subreads.060--060.bam.pbi         out.subreads.090--090.subreadset.xml  out.subreads.149--149.bam             out.subreads.192--192.bam.pbi         out.subreads.249--249.subreadset.xml  out.subreads.285--285.bam             out.subreads.321--321.bam.pbi         out.subreads.479--479.subreadset.xml
out.barcoded.subreads.bam.pbi         out.subreads.060--060.subreadset.xml  out.subreads.091--091.bam             out.subreads.149--149.bam.pbi         out.subreads.192--192.subreadset.xml  out.subreads.252--252.bam             out.subreads.285--285.bam.pbi         out.subreads.321--321.subreadset.xml  out.subreads.482--482.bam
out.barcoded.subreads.lima.counts     out.subreads.066--066.bam             out.subreads.091--091.bam.pbi         out.subreads.149--149.subreadset.xml  out.subreads.227--227.bam             out.subreads.252--252.bam.pbi         out.subreads.285--285.subreadset.xml  out.subreads.454--454.bam             out.subreads.482--482.bam.pbi
out.barcoded.subreads.lima.guess      out.subreads.066--066.bam.pbi         out.subreads.091--091.subreadset.xml  out.subreads.172--172.bam             out.subreads.227--227.bam.pbi         out.subreads.252--252.subreadset.xml  out.subreads.303--303.bam             out.subreads.454--454.bam.pbi         out.subreads.482--482.subreadset.xml
out.barcoded.subreads.lima.report     out.subreads.066--066.subreadset.xml  out.subreads.107--107.bam             out.subreads.172--172.bam.pbi         out.subreads.227--227.subreadset.xml  out.subreads.259--259.bam             out.subreads.303--303.bam.pbi         out.subreads.454--454.subreadset.xml  out.subreads.489--489.bam
out.barcoded.subreads.lima.summary    out.subreads.071--071.bam             out.subreads.107--107.bam.pbi         out.subreads.172--172.subreadset.xml  out.subreads.233--233.bam             out.subreads.259--259.bam.pbi         out.subreads.303--303.subreadset.xml  out.subreads.464--464.bam             out.subreads.489--489.bam.pbi
out.barcoded.subreads.subreadset.xml  out.subreads.071--071.bam.pbi         out.subreads.107--107.subreadset.xml  out.subreads.175--175.bam             out.subreads.233--233.bam.pbi         out.subreads.259--259.subreadset.xml  out.subreads.307--307.bam             out.subreads.464--464.bam.pbi         out.subreads.489--489.subreadset.xml
out.subreads.052--052.bam             out.subreads.071--071.subreadset.xml  out.subreads.112--112.bam             out.subreads.175--175.bam.pbi         out.subreads.233--233.subreadset.xml  out.subreads.261--261.bam             out.subreads.307--307.bam.pbi         out.subreads.464--464.subreadset.xml  out.subreads.494--494.bam
out.subreads.052--052.bam.pbi         out.subreads.082--082.bam             out.subreads.112--112.bam.pbi         out.subreads.175--175.subreadset.xml  out.subreads.235--235.bam             out.subreads.261--261.bam.pbi         out.subreads.307--307.subreadset.xml  out.subreads.468--468.bam             out.subreads.494--494.bam.pbi
out.subreads.052--052.subreadset.xml  out.subreads.082--082.bam.pbi         out.subreads.112--112.subreadset.xml  out.subreads.185--185.bam             out.subreads.235--235.bam.pbi         out.subreads.261--261.subreadset.xml  out.subreads.313--313.bam             out.subreads.468--468.bam.pbi         out.subreads.494--494.subreadset.xml
out.subreads.054--054.bam.pbi         out.subreads.082--082.subreadset.xml  out.subreads.113--113.bam             out.subreads.185--185.bam.pbi         out.subreads.235--235.subreadset.xml  out.subreads.264--264.bam             out.subreads.313--313.bam.pbi         out.subreads.468--468.subreadset.xml  out.subreads.bam
out.subreads.054--054.subreadset.xml  out.subreads.085--085.bam             out.subreads.113--113.bam.pbi         out.subreads.185--185.subreadset.xml  out.subreads.241--241.bam             out.subreads.264--264.bam.pbi         out.subreads.313--313.subreadset.xml  out.subreads.471--471.bam             out.subreads.bam.pbi
out.subreads.055--055.bam             out.subreads.085--085.bam.pbi         out.subreads.113--113.subreadset.xml  out.subreads.187--187.bam             out.subreads.241--241.bam.pbi         out.subreads.264--264.subreadset.xml  out.subreads.316--316.bam             out.subreads.471--471.bam.pbi         out.subreads.json
out.subreads.055--055.bam.pbi         out.subreads.085--085.subreadset.xml  out.subreads.125--125.bam             out.subreads.187--187.bam.pbi         out.subreads.241--241.subreadset.xml  out.subreads.265--265.bam             out.subreads.316--316.bam.pbi         out.subreads.471--471.subreadset.xml  out.subreads.lima.counts
out.subreads.055--055.subreadset.xml  out.subreads.088--088.bam             out.subreads.125--125.bam.pbi         out.subreads.187--187.subreadset.xml  out.subreads.245--245.bam             out.subreads.265--265.bam.pbi         out.subreads.316--316.subreadset.xml  out.subreads.473--473.bam             out.subreads.lima.guess
out.subreads.059--059.bam             out.subreads.088--088.bam.pbi         out.subreads.125--125.subreadset.xml  out.subreads.188--188.bam             out.subreads.245--245.bam.pbi         out.subreads.265--265.subreadset.xml  out.subreads.317--317.bam             out.subreads.473--473.bam.pbi         out.subreads.lima.report
out.subreads.059--059.bam.pbi         out.subreads.088--088.subreadset.xml  out.subreads.143--143.bam             out.subreads.188--188.bam.pbi         out.subreads.245--245.subreadset.xml  out.subreads.273--273.bam             out.subreads.317--317.bam.pbi         out.subreads.473--473.subreadset.xml  out.subreads.lima.summary
out.subreads.059--059.subreadset.xml  out.subreads.090--090.bam             out.subreads.143--143.bam.pbi         out.subreads.188--188.subreadset.xml  out.subreads.249--249.bam             out.subreads.273--273.bam.pbi         out.subreads.317--317.subreadset.xml  out.subreads.479--479.bam             out.subreads.subreadset.xml
out.subreads.060--060.bam             out.subreads.090--090.bam.pbi         out.subreads.143--143.subreadset.xml  out.subreads.192--192.bam             out.subreads.249--249.bam.pbi         out.subreads.273--273.subreadset.xml  out.subreads.321--321.bam             out.subreads.479--479.bam.pbi

So I want to send these files together with this variable: 059--059

out.subreads.059--059.bam
out.subreads.059--059.bam.pbi
out.subreads.059--059.subreadset.xml

The code currently in my workflow is:

process procA{
    input:
    file bc_fasta from bc_fasta_chan

    output:
    set file("$analysis_config.cell/bam/out.subreads.*"), val("$analysis_config.cell/bam/out.subreads.*") into lima_out

    script:
    """
    # run the script that generates the files listed above
    """
}

process procB{
    input:
    set file(bc_bam_file), val(bc_name) from lima_out.flatten()

    script:
    """
    ls
    echo ${bc_bam_file}
    """
}

1 Answer


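The trick is to extract a grouping variable from the file names in some way and then call groupTuple. I just used a simple regular expression to get this variable, but you could implement something more sophisticated if needed: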

lima_out = Channel.fromPath( './files/out.subreads.*', relative: true )

subreads_pattern = ~/^out\.subreads\.(\d{3}--\d{3})\..*/

lima_out
    .flatten()
    .filter { it.name =~ subreads_pattern }
    .map { tuple( (it.name =~ subreads_pattern)[0][1], it ) }
    .groupTuple(size: 3, sort: true)
    .view()

Result:

[489--489, [out.subreads.489--489.bam, out.subreads.489--489.bam.pbi, out.subreads.489--489.subreadset.xml]]
[316--316, [out.subreads.316--316.bam, out.subreads.316--316.bam.pbi, out.subreads.316--316.subreadset.xml]]
...
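
To make the grouping step easier to follow outside of Nextflow, here is a minimal plain-Python sketch of the same idea: pull the barcode pair out of each file name with the regular expression, then collect the names per barcode. The function name group_by_barcode and the sample file list are illustrative assumptions, not part of the workflow. The sketch also does not model the size: 3 option, so it keeps groups that have fewer than three files.

```python
import re
from collections import defaultdict

# Same pattern as in the Nextflow snippet: capture the barcode pair, e.g. "059--059".
SUBREADS_PATTERN = re.compile(r"^out\.subreads\.(\d{3}--\d{3})\..*")


def group_by_barcode(file_names):
    """Return {barcode: sorted file names}; names that do not match are skipped."""
    groups = defaultdict(list)
    for name in file_names:
        match = SUBREADS_PATTERN.match(name)
        if match:  # plays the role of the .filter { ... } step
            groups[match.group(1)].append(name)
    # Sorting each group mirrors groupTuple(sort: true).
    return {barcode: sorted(names) for barcode, names in groups.items()}


# Hypothetical subset of the directory listing from the question.
file_names = [
    "out.subreads.bam",
    "out.subreads.059--059.subreadset.xml",
    "out.subreads.059--059.bam",
    "out.subreads.060--060.bam",
    "out.subreads.059--059.bam.pbi",
    "out.subreads.lima.counts",
]

groups = group_by_barcode(file_names)
print(groups["059--059"])
```

Files without a barcode pair in their name, such as out.subreads.bam, never match the pattern and are therefore left out of every group.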

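Here is an example of how I would feed these values into a process. My preference for handling companion files (here, the files with the ".bam.pbi" extension) is to keep them together with the BAM file; I simply use a tuple for that. Because groupTuple is called with sort: true, each group is ordered as .bam, .bam.pbi, .subreadset.xml, so files[0..1] are the BAM and its index and files[2] is the subreadset XML. Calling first() on the BAM/index tuple then gives us the BAM. This is just my preference. You could instead have a separate file/path variable for the pbi companion file in the input tuple, but you probably won't need to reference it in the script block.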

lima_out = Channel.fromPath( './files/out.subreads.*', relative: true )

subreads_pattern = ~/^out\.subreads\.(\d{3}--\d{3})\..*/

lima_out
    .flatten()
    .filter { it.name =~ subreads_pattern }
    .map { tuple( (it.name =~ subreads_pattern)[0][1], it ) }
    .groupTuple(size: 3, sort: true)
    .map { group_name, files -> tuple( group_name, files[2], files[0..1] ) }
    .set { subreads_ch }

process next_process {

    input:
    tuple val(group), path(subreadset), path(indexed_subreads) from subreads_ch

    """
    echo "subreadset XML: ${subreadset}"
    echo "subreads BAM: ${indexed_subreads.first()}"
    """
}
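
The extra map step relies on the sorted order of each group, which the following plain-Python sketch spells out. It assumes a group that has already been sorted as in the result shown earlier; split_group is a hypothetical helper name used only for illustration.

```python
def split_group(barcode, sorted_files):
    """Mirror tuple(group_name, files[2], files[0..1]) from the Nextflow map step."""
    if len(sorted_files) != 3:
        raise ValueError(f"expected 3 files for {barcode}, got {len(sorted_files)}")
    # Groovy's files[0..1] is an inclusive range, i.e. Python's sorted_files[0:2].
    bam_and_index = sorted_files[0:2]
    subreadset = sorted_files[2]
    return barcode, subreadset, bam_and_index


group, subreadset, indexed_subreads = split_group(
    "059--059",
    [
        "out.subreads.059--059.bam",
        "out.subreads.059--059.bam.pbi",
        "out.subreads.059--059.subreadset.xml",
    ],
)
print(subreadset)           # the subreadset XML
print(indexed_subreads[0])  # the BAM, like indexed_subreads.first() in the process
```

If the file set ever changes (for example, an extra file per barcode), the positional indexing would silently pick the wrong files, so matching on the extension may be a more robust choice.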
Answered 2021-01-06T15:44:19.003