
I have a Nextflow process that emits multiple chunks per chromosome into a channel, say, imputation. The output looks like:

chr1.imputed.chunk1.gen.gz chr1.imputed.chunk2.gen.gz chr1.imputed.chunk3.gen.gz 
chr1.imputed.chunk1.stats chr1.imputed.chunk2.stats chr1.imputed.chunk3.stats
chr1.imputed.chunk1.bgen chr1.imputed.chunk2.bgen chr1.imputed.chunk3.bgen
.....

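There are many chunks per chromosome (22 chromosomes). For each file type, how can I efficiently merge the chunks into a single file per chromosome, to get: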

chr1.imputed.merged.gen.gz
chr1.imputed.merged.stats
chr1.imputed.merged.bgen

Once I have the merged output, I want to delete all the chunks. Any help?

The actual code that generates these chunks is:

process imputation {
    publishDir params.out, mode:'copy'

    input:
    tuple val(chrom),val(chunk_array),val(chunk_start),val(chunk_end),path(in_haps),path(refs),path(maps) from imp_ch

    output:
    tuple val("${chrom}"),path("${chrom}.*") into imputed

    script:
    def (haps,sample)=in_haps
    def (haplotype, legend, samples)=refs
    """
    impute4.1.2_r300.3 -g "${haps}" -h "${haplotype}" -l "${legend}" -m "${maps}" -o "${chrom}.step10.imputed.chunk${chunk_array}" -no_maf_align -o_gz -int "${chunk_start}" "${chunk_end}" -Ne 20000 -buffer 1000 -seed 54321

    if [[ \$(gunzip -c "${chrom}.step10.imputed.chunk${chunk_array}.gen.gz" | head -c1 | wc -c) == "0" ]]
    then
        echo "${chrom}.step10.imputed.chunk${chunk_array}.gen.gz" is empty
    else
        qctool_v2.0.8_rhel -g "${chrom}.step10.imputed.chunk${chunk_array}.gen.gz" -snp-stats -osnp "${chrom}.step10.imputed.chunk${chunk_array}.snp.stats"
        qctool_v2.0.8_rhel -g "${chrom}.step10.imputed.chunk${chunk_array}.gen.gz" -og "${chrom}.step10.imputed.chunk${chunk_array}.bgen" -os "${chrom}.step10.imputed.chunk${chunk_array}.sample"
    fi
    """
}

3 Answers


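Could you post the actual code that generates the snippet you showed?

Without seeing your code, I would suggest trying this pattern: http://nextflow-io.github.io/patterns/index.html#_process_per_file_range

Answered 2021-05-31T02:43:48.277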

You have this:

output:
tuple val("${chrom}"),path("${chrom}.*") into imputed

Given that output channel declaration, you will probably need to do something like this in a downstream process:

input:
tuple val(name), path(chr_files) from imputed

script:  
gen_files = chr_files.findAll { it.toString().endsWith('.gen.gz') }.sort()
stat_files = chr_files.findAll { it.toString().endsWith('.stats') }.sort()
"""
# try with echo first to see if you get what you want
echo ${gen_files.join(' ')} > ${name}_gen_fileList.txt
echo ${stat_files.join(' ')} > ${name}_stat_fileList.txt
"""

Once you have confirmed that the echo output above is what you expect, you can do the rest of the work in that same process block.
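As a rough illustration of what "the rest of the work" could look like for the .gen.gz files, here is a minimal shell sketch. It is not the original poster's code: the function name merge_gen_chunks is hypothetical, the file names follow the chrN.imputed.chunkM.gen.gz pattern from the question (the process shown there actually writes chrN.step10.imputed.chunkM.*, so the glob would need adjusting), and it assumes GNU sort for the -V option. It relies on the fact that concatenated gzip members form a valid gzip stream, so the chunks do not have to be decompressed first.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: concatenate one chromosome's gzip chunks in numeric order.
merge_gen_chunks() {
    local chrom="$1"                              # e.g. chr1
    local out="${chrom}.imputed.merged.gen.gz"
    # sort -V (GNU) puts chunk2 before chunk10; plain glob order would not.
    # The glob only matches chunk files, never the merged output itself.
    printf '%s\n' "${chrom}".imputed.chunk*.gen.gz | sort -V | xargs cat > "${out}"
}

# Small demo with made-up data in a scratch directory.
workdir="$(mktemp -d)"
cd "${workdir}"
for i in 1 2 10; do
    printf 'snp%s\n' "${i}" | gzip > "chr1.imputed.chunk${i}.gen.gz"
done

merge_gen_chunks chr1
gunzip -c chr1.imputed.merged.gen.gz
```

If this is pasted into a Nextflow script: block, shell variables and command substitutions need a backslash before the dollar sign (as in the \$(gunzip ...) line of the question), otherwise Nextflow tries to interpolate them.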

Answered 2021-06-05T21:52:19.427

Apparently, the following lines of code solved the problem.

imputed.into{impute_bgen;impute_gen;impute_sample;impute_stat}
bgens=impute_bgen.groupTuple().transpose().map{chrom,bfiles -> tuple(chrom,bfiles[0])}.groupTuple()
gens=impute_gen.groupTuple().transpose().map{chrom,bfiles -> tuple(chrom,bfiles[1])}.groupTuple()
samples=impute_sample.groupTuple().transpose().map{chrom,bfiles -> tuple(chrom,bfiles[2])}.groupTuple()
stats=impute_stat.groupTuple().transpose().map{chrom,bfiles -> tuple(chrom,bfiles[3])}.groupTuple()
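Once the files are grouped per chromosome and type, the text-based stats chunks still need their headers handled when they are merged. The shell sketch below is only an illustration under stated assumptions: the function name merge_stats_chunks is hypothetical, the file names follow the chrN.imputed.chunkM.stats pattern from the question, and the layout (optional '#' comment lines followed by one header row per file) is an assumption that should be checked against the real qctool output. GNU sort is assumed for -V.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: merge per-chunk stats tables for one chromosome,
# keeping a single header row and dropping '#' comment lines.
merge_stats_chunks() {
    local chrom="$1"                              # e.g. chr1
    local out="${chrom}.imputed.merged.stats"
    printf '%s\n' "${chrom}".imputed.chunk*.stats | sort -V | xargs awk '
        FNR == 1     { header_seen = 0 }          # starting a new input file
        /^#/         { next }                     # assumed comment lines: skip
        !header_seen { header_seen = 1            # first non-comment line = header
                       if (!printed++) print      # print it for the first file only
                       next }
                     { print }                    # data rows
    ' > "${out}"
}

# Small demo with made-up data in a scratch directory.
workdir="$(mktemp -d)"
cd "${workdir}"
printf '# made-up comment\nrsid maf\nrs1 0.1\n' > chr1.imputed.chunk1.stats
printf 'rsid maf\nrs2 0.2\n'  > chr1.imputed.chunk2.stats
printf 'rsid maf\nrs10 0.3\n' > chr1.imputed.chunk10.stats

merge_stats_chunks chr1
cat chr1.imputed.merged.stats
```

On deleting the chunks afterwards: by default Nextflow stages inputs into a task's work directory as symlinks, so removing them inside the merge process does not remove the original chunk files, and with publishDir in copy mode the published copies under params.out are separate files again. Removing the originals from the upstream work directories would also likely interfere with -resume.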
Answered 2021-07-06T13:39:13.387