I have a HoHoA (hash of hashes of arrays) set up as follows:
#!/usr/bin/perl
use warnings;
use strict;
my %experiment = (
    'gene1' => {
        'condition2' => ['XLOC_000347', '80', '0.5'],
        'condition3' => ['XLOC_000100', '50', '0.2']
    },
    'gene2' => {
        'condition1' => ['XLOC_025437', '100', '0.018'],
        'condition2' => ['XLOC_000322', '77', '0.22'],
        'condition3' => ['XLOC_001000', '43', '0.02']
    },
    'gene3' => {
        'condition1' => ['XLOC_025437', '100', '0.018'],
        'condition3' => ['XLOC_001045', '23', '0.0001']
    },
    'gene4' => {
        'condition3' => ['XLOC_091345', '93', '0.005']
    }
);
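I am trying to find all "genes" that overlap in at least 2 conditions and, for each gene, print the condition with the lowest value (e.g. q_value). I then want to sort by this value. Here is my code so far:
Loop through the first-level keys (genes) to find every gene that appears under at least 2 second-level keys (conditions):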
my (%overlap, %condition_name);
my ($xloc, $q_val, $percentage, %seen);
for my $gene (sort keys %experiment) {
    for my $condition (sort keys %{$experiment{$gene}}) {
        $condition_name{$condition} = 1;
        $seen{$gene}++; # Counts for each occurrence of gene
        $overlap{$gene} = 1 if $seen{$gene} > 1;
    }
}
For each overlapping gene (key1), print every condition (key2) in which it was found, along with the associated values:
my @cond_name = keys %condition_name;
foreach my $gene (keys %overlap) {
    foreach my $condition (@cond_name) {
        next unless exists $experiment{$gene}{$condition};
        ($xloc, $percentage, $q_val) = @{$experiment{$gene}{$condition}};
        print "$condition\t$gene\t$xloc\t$q_val\t$percentage\n";
    }
    print "\n";
}
Output:
condition3 gene3 XLOC_001045 0.0001 23
condition1 gene3 XLOC_025437 0.018 100
condition3 gene1 XLOC_000100 0.2 50
condition2 gene1 XLOC_000347 0.5 80
condition3 gene2 XLOC_001000 0.02 43
condition1 gene2 XLOC_025437 0.018 100
condition2 gene2 XLOC_000322 0.22 77
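I am trying to change the output in two ways:
- For each overlapping instance of key1, compare its second-level keys according to one of the values. For example, for gene1, I want to compare the q_value (the last element of each array) of condition3 and condition2 and keep only the lowest.
Desired output: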
condition3 gene3 XLOC_001045 0.0001 23
condition3 gene1 XLOC_000100 0.2 50
condition1 gene2 XLOC_025437 0.018 100
- Second, I want to sort the result by the same value I selected on (q_value), to give:
Desired final output (see the update below):
condition3 gene3 XLOC_001045 0.0001 23
condition1 gene2 XLOC_025437 0.018 100
condition3 gene1 XLOC_000100 0.2 50
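To make the two steps above concrete, here is a minimal sketch of one way they might be done. It assumes q_value is always the last (index 2) element of each array; the names @best and @rows are made up for illustration and are not part of my original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %experiment = (
    'gene1' => {
        'condition2' => ['XLOC_000347', '80', '0.5'],
        'condition3' => ['XLOC_000100', '50', '0.2']
    },
    'gene2' => {
        'condition1' => ['XLOC_025437', '100', '0.018'],
        'condition2' => ['XLOC_000322', '77', '0.22'],
        'condition3' => ['XLOC_001000', '43', '0.02']
    },
    'gene3' => {
        'condition1' => ['XLOC_025437', '100', '0.018'],
        'condition3' => ['XLOC_001045', '23', '0.0001']
    },
    'gene4' => {
        'condition3' => ['XLOC_091345', '93', '0.005']
    }
);

# Collect one row per overlapping gene: the condition with the lowest q_value.
# Row layout (assumed): [condition, gene, xloc, percentage, q_value].
my @best;
for my $gene (keys %experiment) {
    my $conds = $experiment{$gene};
    next if scalar(keys %$conds) < 2;    # keep genes seen in >= 2 conditions
    my ($min) = sort { $conds->{$a}[2] <=> $conds->{$b}[2] } keys %$conds;
    push @best, [ $min, $gene, @{ $conds->{$min} } ];
}

# Sort the rows by q_value and print q_value before percentage,
# matching the column order used in the outputs above.
my @rows;
for my $r (sort { $a->[4] <=> $b->[4] } @best) {
    push @rows, join("\t", @{$r}[0, 1, 2, 4, 3]);
}
print "$_\n" for @rows;
```

Sorting each gene's conditions just to take the first one is wasteful for large inputs, but it keeps the sketch short; a single pass tracking the minimum would do the same job.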
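Update: 16.9.13
I have started a bounty on this question because the answers (although good) do not quite achieve what I am after. Please let me know if the question needs any clarification...
My desired final output has also changed slightly: as above, I want to compare one value across the conditions and sort the genes by that value. Ideally, I would like to output every condition for each sorted gene (with the conditions also sorted internally by the same value):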
condition3 gene3 XLOC_001045 0.0001 23 # Lowest q_value
condition1 gene3 XLOC_025437 0.018 100 # Other condition(s) for the gene with lowest q_value...
condition1 gene2 XLOC_025437 0.018 100 # For each gene, rank by q_value
condition3 gene2 XLOC_001000 0.02 43
condition2 gene2 XLOC_000322 0.22 77
condition3 gene1 XLOC_000100 0.2 50
condition2 gene1 XLOC_000347 0.5 80
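For clarity, the ordering I am after can be sketched roughly as below. This is only an illustration under stated assumptions: q_value is taken to be the last (index 2) element of each array, ties are broken by name, and the helper names $Q, %ranked, @genes and @lines are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my %experiment = (
    'gene1' => {
        'condition2' => ['XLOC_000347', '80', '0.5'],
        'condition3' => ['XLOC_000100', '50', '0.2']
    },
    'gene2' => {
        'condition1' => ['XLOC_025437', '100', '0.018'],
        'condition2' => ['XLOC_000322', '77', '0.22'],
        'condition3' => ['XLOC_001000', '43', '0.02']
    },
    'gene3' => {
        'condition1' => ['XLOC_025437', '100', '0.018'],
        'condition3' => ['XLOC_001045', '23', '0.0001']
    },
    'gene4' => {
        'condition3' => ['XLOC_091345', '93', '0.005']
    }
);

my $Q = 2;    # assumed position of q_value in [xloc, percentage, q_value]

# For every gene seen in at least 2 conditions, rank its conditions
# by ascending q_value (ties broken by condition name).
my %ranked;
for my $gene (keys %experiment) {
    my $conds = $experiment{$gene};
    next if scalar(keys %$conds) < 2;
    $ranked{$gene} = [
        sort { $conds->{$a}[$Q] <=> $conds->{$b}[$Q] or $a cmp $b }
        keys %$conds
    ];
}

# Order the genes by their lowest q_value, i.e. that of the first ranked condition.
my @genes = sort {
    $experiment{$a}{ $ranked{$a}[0] }[$Q] <=> $experiment{$b}{ $ranked{$b}[0] }[$Q]
        or $a cmp $b
} keys %ranked;

# Emit every condition of every gene in that nested order.
my @lines;
for my $gene (@genes) {
    for my $condition (@{ $ranked{$gene} }) {
        my ($xloc, $percentage, $q_val) = @{ $experiment{$gene}{$condition} };
        push @lines, join("\t", $condition, $gene, $xloc, $q_val, $percentage);
    }
}
print "$_\n" for @lines;
```

The two-level sort is the point here: conditions are ranked within each gene first, and the genes are then ordered by the q_value of their top-ranked condition.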