I have the following data:
eya XLOC_000445_Change:10.3_q:0.003 atonal1
six XLOC_00099_Change:70.0_q:0.095 atonal1
six-eya XLOC_0234324_Change:19.8_q:0.05 atonal1
eya XLOC_00010_Change:6.5_q:0.22 c-myc
six XLOC_025437_Change:1.1_q:0.018 c-myc
six-eya XLOC_001045_Change:2.3_q:0.0001 c-myc
eya XLOC_000115_Change:7.3_q:0.03 ezrin
six XLOC_000001_Change:7.9_q:0.00006 ezrin
six-eya XLOC_0234322_Change:9.0_q:0.0225 ezrin
six-eya XLOC_091345_Change:9.3_q:0.005 slc12a2
eya XLOC_000445_Change:9.9_q:0.3 atonal1
six XLOC_00099_Change:7.0_q:0.95 atonal1
six-eya XLOC_0234324_Change:9.8_q:0.5 atonal1
and I am trying to build a HoHoA (hash of hashes of arrays) as follows:
#!/usr/bin/perl
use warnings;
use strict;
use Data::Dumper;   # needed for the Dumper calls further down
Method 1: push array values onto the HoH:
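# $input is assumed to be a filehandle already opened on the tab-separated data
my (%experiment, @xloc, @change_val, @q_value, @condition, @gene);
while (<$input>) {
    chomp;
    push @xloc, $1 if ($_ =~ /(XLOC_\d+)/);
    push @change_val, $1 if ($_ =~ /Change:(-?\d+\.\d+|-?inf)/);
    push @q_value, $1 if ($_ =~ /q:(\d+\.\d+)/);
    my @split = split('\t');
    push @condition, $split[0];
    push @gene, $split[2];
}
# Append one [xloc, change, q] record per input line under its gene and condition
push @{ $experiment{$gene[$_]}{$condition[$_]} }, [ $xloc[$_], $change_val[$_], $q_value[$_] ] for 0 .. $#change_val;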
Method 2: assign values to the HoHoA on the fly:
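# $input is assumed to be a filehandle already opened on the tab-separated data
my %experiment;
while (<$input>) {
    chomp;
    # List-context matches avoid the "my $x = ... if ..." idiom, whose behaviour is undefined
    my ($xloc)    = /(XLOC_\d+)/;
    my ($change)  = /Change:(-?\d+\.\d+|-?inf)/;
    my ($q_value) = /q:(\d+\.\d+)/;
    my @split = split('\t');
    my $condition = $split[0];
    my $gene = $split[2];
    # Plain assignment: a later line with the same gene and condition replaces the earlier record
    $experiment{$gene}{$condition} = [ $xloc, $change, $q_value ];
}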
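Both work fine, in that I get the data structure I expect. However, only the first method (push) ensures that genes that appear as replicates (atonal1 in this case) are represented twice in the HoHoA.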
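To make the difference between the two methods concrete, here is a minimal sketch using two hypothetical replicate rows for the same gene and condition. The names %assigned and %pushed are made up for illustration and are not part of my script; the point is only that assignment replaces the earlier record while push keeps both.

```perl
use strict;
use warnings;

# Two hypothetical replicate rows: gene, condition, xloc, change, q-value
my @rows = (
    [ 'atonal1', 'eya', 'XLOC_000445', '10.3', '0.003' ],
    [ 'atonal1', 'eya', 'XLOC_000445', '9.9',  '0.3'   ],
);

my (%assigned, %pushed);
for my $row (@rows) {
    my ($gene, $condition, @record) = @$row;

    # Method 2 style: the second row overwrites the first
    $assigned{$gene}{$condition} = [@record];

    # Method 1 style: push autovivifies an array and appends every row
    push @{ $pushed{$gene}{$condition} }, [@record];
}

print "assigned change value: $assigned{atonal1}{eya}[1]\n";
print "pushed record count: ", scalar @{ $pushed{atonal1}{eya} }, "\n";
```

So with assignment the value at index 1 is a change value, whereas with push the value at index 1 is a whole second record.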
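My downstream code was originally written to handle a HoHoA built the second way, and I can't for the life of me figure out why the two structures are handled differently by the following code: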
Downstream code:
my (%change, %seen, $xloc, $change_val, $q_value);
for my $gene (sort keys %experiment) {
    for my $condition (sort keys %{$experiment{$gene}}) {
        $seen{$gene}++; # Counts for each occurrence of gene
        if ( (not exists $change{$gene}) || (abs $change{$gene} < abs $experiment{$gene}{$condition}[1]) ) { # Has a larger change value
            $change{$gene} = $experiment{$gene}{$condition}[1];
        }
    }
}
print Dumper \%change;
When I run the code above on the structure built by either method, I get:
Output for Method 1:
$VAR1 = {
'atonal1' => [
'XLOC_0234324',
'9.8',
'0.5'
],
'c-myc' => undef,
'ezrin' => undef,
'slc12a2' => undef,
};
Output for Method 2:
$VAR1 = {
'atonal1' => '9.9', # i.e. the largest change value for each condition/gene
'c-myc' => '6.5',
'ezrin' => '9.0',
'slc12a2' => '9.3',
};
What I want is:
$VAR1 = {
'atonal1' => [
'9.9',
'70.0' # This is the difference, i.e. both values are added to the hash `%change`
],
'c-myc' => '6.5',
'ezrin' => '9.0',
'slc12a2' => '9.3',
};
I can't work out what is causing the difference.
Update
Here is the Dumper output of %experiment after pushing the values with Method 1:
$VAR1 = {
'atonal1' => {
'eya' => [
[
'XLOC_000445',
'10.3',
'0.003'
],
[
'XLOC_000445',
'9.9',
'0.3'
]
],
'six' => [
[
'XLOC_00099',
'70.0',
'0.095'
],
[
'XLOC_00099',
'7.0',
'0.95'
]
],
'six-eya' => [
[
'XLOC_0234324',
'19.8',
'0.05'
],
[
'XLOC_0234324',
'9.8',
'0.5'
]
]
},
'c-myc' => {
'eya' => [
[
'XLOC_00010',
'6.5',
'0.22'
]
],
'six' => [
[
'XLOC_025437',
'1.1',
'0.018'
]
],
'six-eya' => [
[
'XLOC_001045',
'2.3',
'0.0001'
]
]
},
'ezrin' => {
'eya' => [
[
'XLOC_000115',
'7.3',
'0.03'
]
],
'six' => [
[
'XLOC_000001',
'7.9',
'0.00006'
]
],
'six-eya' => [
[
'XLOC_0234322',
'9.0',
'0.0225'
]
]
},
'slc12a2' => {
'six-eya' => [
[
'XLOC_091345',
'9.3',
'0.005'
]
]
},
};
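Looking at this dump, each condition holds an array of records rather than a single record, so index 1 at the condition level would be the second record (or undef when there is only one) instead of a change value. The sketch below is my guess at what handling that extra level might look like, using a trimmed copy of the structure above. It keeps only the single largest absolute change per gene, which is an assumption and not the exact output I described wanting.

```perl
use strict;
use warnings;

# Trimmed copy of the Method 1 structure from the dump above
my %experiment = (
    atonal1 => {
        eya => [ [ 'XLOC_000445', '10.3', '0.003' ], [ 'XLOC_000445', '9.9', '0.3' ] ],
        six => [ [ 'XLOC_00099',  '70.0', '0.095' ], [ 'XLOC_00099',  '7.0', '0.95' ] ],
    },
    slc12a2 => {
        'six-eya' => [ [ 'XLOC_091345', '9.3', '0.005' ] ],
    },
);

# Index 1 under a condition is a record (array ref) or undef, not a number
print ref( $experiment{atonal1}{eya}[1] ), "\n";
print defined $experiment{slc12a2}{'six-eya'}[1] ? "defined\n" : "undef\n";

my %change;
for my $gene (sort keys %experiment) {
    for my $condition (sort keys %{ $experiment{$gene} }) {
        # Extra loop: visit every record stored for this gene and condition
        for my $record (@{ $experiment{$gene}{$condition} }) {
            my $value = $record->[1];
            $change{$gene} = $value
                if !exists $change{$gene} || abs($change{$gene}) < abs($value);
        }
    }
}

print "$_ => $change{$_}\n" for sort keys %change;
```

Collecting more than one value per gene would presumably mean pushing onto @{ $change{$gene} } instead of overwriting a scalar, but I have not worked out which values should qualify.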