My program takes this input:
miRNA127 dvex589433 131 154 - 24 87.5 atcgtaacgtatctcccacactta 32 55 98
miRNA32 dvex320240 61 83 - 23 86.9565217391304 cttctacaatggtactgtccatt 31 53 97
miRNA32 dvex623745 141 163 - 23 86.9565217391304 ggtttcttccacaatagtaattt 26 48 97
miRNA79 dvex468096 702 733 - 32 81.25 ttggttaaaaatttttttttttaattaaaaaa 6 37 55
miRNA79 dvex468096 717 743 + 27 81.4814814814815 aaaaaatttttaaccaaagaaaaaaat 13 39 55
miRNA79 dvex468096 694 718 - 25 84 tttttttaattaaaaaacaattttt 17 41 55
miRNA79 dvex468096 696 724 + 29 75.8620689655172 aaattgttttttaattaaaaaaaaaaatt 13 41 55
miRNA79 dvex219016 1103 1130 + 28 78.5714285714286 aaatttttgctaaaaaatacaaaaattt 14 41 55
miRNA79 dvex219016 3420 3446 + 27 77.7777777777778 aaaatattattaaataaataatgcaat 13 39 55
miRNA79 dvex219016 1384 1408 + 25 80 tttcgtgaaacaaaaaagtttggaa 21 45 55
miRNA79 dvex219016 4384 4424 + 25 80 tttcgtgaaacaaaaaagtttggaa 21 45 55
miRNA154 dvex573491 297 324 + 28 78.5714285714286 cagcttgattttaagcctatctgaaagc 23 50 76
miRNA154 dvex546562 232 259 + 28 78.5714285714286 cagcttgattttaagcctatttgaaagc 23 50 76
miRNA154 dvex648254 147 172 + 26 80.7692307692308 aagcctacggagtgcgaggcagagct 47 72 76
miRNA154 dvex648254 277 303 + 26 80.7692307692308 aagcctacggagtgcgaggcagagct 47 72 76
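I need to group the rows that share the same values in $1, $2 and $5 (columns 1, 2 and 5). So I decided to use a hash whose values are nested arrays: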
$VAR1 = {
'miRNA79 dvex219016 +' => [
[ '1103', '1130', '14', '41', '55' ],
[ '3420', '3446', '13', '39', '55' ],
[ '1384', '1408', '21', '45', '55' ],
[ '4384', '4424', '21', '45', '55' ]
],
'miRNA79 dvex468096 +' => [
[ '717', '743', '13', '39', '55' ],
[ '696', '724', '13', '41', '55' ]
],
'miRNA154 dvex546562 +' => [ [ '232', '259', '23', '50', '76' ] ],
'miRNA79 dvex468096 -' => [
[ '702', '733', '6', '37', '55' ],
[ '694', '718', '17', '41', '55' ]
],
'miRNA154 dvex648254 +' => [
[ '147', '172', '47', '72', '76' ],
[ '277', '303', '47', '72', '76' ]
],
'miRNA127 dvex589433 -' => [ [ '131', '154', '32', '55', '98' ] ],
'miRNA154 dvex573491 +' => [ [ '297', '324', '23', '50', '76' ] ],
'miRNA32 dvex320240 -' => [ [ '61', '83', '31', '53', '97' ] ],
'miRNA32 dvex623745 -' => [ [ '141', '163', '26', '48', '97' ] ]
};
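For reference, a minimal sketch of how a hash like the one above can be built, assuming whitespace-separated rows with at least 11 columns. The helper name `group_rows` is hypothetical and only used for illustration; the key is columns 1, 2 and 5 joined by spaces, and each stored row keeps columns 3, 4, 9, 10 and 11.

```perl
use strict;
use warnings;

# Hypothetical helper: group rows by columns 1, 2 and 5, keeping
# columns 3, 4, 9, 10 and 11 of every row as a nested array.
sub group_rows {
    my @lines = @_;
    my %groups;
    for my $line (@lines) {
        my @f = split ' ', $line;
        next unless @f >= 11;    # assumption: skip blank or short lines
        # An array slice inside a string is joined with single spaces.
        push @{ $groups{"@f[0,1,4]"} }, [ @f[ 2, 3, 8, 9, 10 ] ];
    }
    return \%groups;
}

my $groups = group_rows(
    'miRNA79 dvex468096 717 743 + 27 81.4814814814815 aaaaaatttttaaccaaagaaaaaaat 13 39 55',
    'miRNA79 dvex468096 696 724 + 29 75.8620689655172 aaattgttttttaattaaaaaaaaaaatt 13 41 55',
    'miRNA79 dvex468096 702 733 - 32 81.25 ttggttaaaaatttttttttttaattaaaaaa 6 37 55',
);

# Report how many rows ended up under each key.
for my $key ( sort keys %$groups ) {
    printf "%s: %d row(s)\n", $key, scalar @{ $groups->{$key} };
}
```

Because the strand is part of the key, the `+` and `-` hits on the same target land in separate groups.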
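After that, for each key of the hash I sort the nested arrays by their first value ([0]->[0]). If a key has only one nested array, I print it. If it has more than one, I need to merge them. Here is an example of the merging: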
'miRNA79 dvex468096 +' => [
[ '717', '743', '13', '39', '55' ],
[ '696', '724', '13', '41', '55' ]
],
After sorting:
$VAR1 = [ [ 696, '724', '13', '41', '55' ],
[ 717, '743', '13', '39', '55' ] ];
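The sorting step can be sketched on its own as follows, using the two rows of the `miRNA79 dvex468096 +` group. The comparison uses `<=>` so that the start positions are compared as numbers rather than as strings.

```perl
use strict;
use warnings;

# Rows of one group as stored in the hash (not yet ordered).
my @rows = (
    [ 717, 743, 13, 39, 55 ],
    [ 696, 724, 13, 41, 55 ],
);

# Numeric sort on the first element of each nested array (the start).
my @sorted = sort { $a->[0] <=> $b->[0] } @rows;

print join( "\t", @$_ ), "\n" for @sorted;
```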
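If the difference between [1][1] and [0][0] is less than or equal to [0][4] (here 743 - 696 = 47, which is <= 55), I need to merge the two rows into this new array: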
$VAR1 = [ [ 696, '743', '13', '39', '55' ], ];
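A minimal sketch of that test for a single pair of rows, assuming each row is laid out as [start, end, q_start, q_end, limit]. The helper name `merge_pair` is hypothetical. It uses `<=` as the description says, whereas the posted code tests `<`.

```perl
use strict;
use warnings;

# Hypothetical helper: merge two sorted rows when the end of the second
# minus the start of the first is within the limit of the first row.
# Returns the merged row, or nothing when the rows are too far apart.
sub merge_pair {
    my ( $first, $second ) = @_;
    my $span = $second->[1] - $first->[0];
    return if $span > $first->[4];
    # Start and q_start come from the first row, end and q_end from the second.
    return [ $first->[0], $second->[1], $first->[2], $second->[3], $first->[4] ];
}

my $merged = merge_pair( [ 696, 724, 13, 41, 55 ], [ 717, 743, 13, 39, 55 ] );
print $merged ? "@$merged\n" : "not merged\n";

my $apart = merge_pair( [ 1103, 1130, 14, 41, 55 ], [ 1384, 1408, 21, 45, 55 ] );
print $apart ? "@$apart\n" : "not merged\n";
```

The second call covers the case where the span (1408 - 1103 = 305) exceeds the limit of 55, so nothing is merged.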
That merged array is then printed. In this other case:
$VAR1 = [
[ 1103, '1130', '14', '41', '55' ],
[ 1384, '1408', '21', '45', '55' ],
[ 3420, '3446', '13', '39', '55' ],
[ 4384, '4424', '21', '45', '55' ]
];
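evaluating whether [1][1] - [0][0] is less than or equal to [0][4] gives FALSE (1408 - 1103 = 305, which is greater than 55), so I need to extract the first nested array and print it, then iterate again and evaluate the same condition on the rows that remain. Whenever the test is TRUE I need to merge; whenever it is FALSE I need to extract the first nested array and print it. Here is my code: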
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
use List::Util qw/ min max /;
use List::Util qw(sum);
use Math::MatrixReal;

my %data;
my $val;
my $num;
my $start;
my $end;
my $diff;
my $start_q;
my $end_q;
my @new_data;
my @extract;
my @extract2;
my $limit;

while (<>) {
    chomp;
    my @fields = split;
    push @{ $data{"@fields[0,1,4]"} }, [ @fields[ 2, 3, 8, 9, 10 ] ];
}

foreach my $key ( sort keys %data ) {
    $val = $data{$key};
    $num = scalar @$val;
    next if $num == 0;
    if ( $num == 1 ) {    # print directly if the key has a single nested array
        print
          "$key\t $data{$key}[0][0]\t $data{$key}[0][1]\t $data{$key}[0][2]\t $data{$key}[0][3]\t $data{$key}[0][4]\n";
    }
    else {
        foreach my $keys ( @$val[0] ) {
            my @sorted = sort { $a->[0] <=> $b->[0] }
              @$val;    # sort the nested arrays by start position
            $start   = $sorted[0][0];
            $end     = $sorted[1][1];
            $limit   = $sorted[0][4];
            $diff    = $end - $start;
            $start_q = $sorted[0][2];
            $end_q   = $sorted[1][3];
            if ( $diff < $limit ) {
                @new_data = ();
                push( @new_data, $start );
                push( @new_data, $end );
                push( @new_data, $start_q );
                push( @new_data, $end_q );
                push( @new_data, $limit );
                @extract  = splice( @{ $sorted[0] }, 0, 5, @new_data );
                @extract2 = splice( @{ $sorted[1] } );
            }
            else {
                my @toprint = splice( @{ $sorted[0] } );
                print
                  "$key\t$toprint[0]\t$toprint[1]\t$toprint[2]\t$toprint[3]\t$toprint[4]\n";
            }
        }
    }
}
Overall, I get this result:
miRNA127 dvex589433 - 131 154 32 55 98
miRNA154 dvex546562 + 232 259 23 50 76
miRNA154 dvex573491 + 297 324 23 50 76
miRNA154 dvex648254 + 147 172 47 72 76
miRNA32 dvex320240 - 61 83 31 53 97
miRNA32 dvex623745 - 141 163 26 48 97
miRNA79 dvex219016 + 1103 1130 14 41 55
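But some rows are missing from this output, because my code does not iterate again when the condition is TRUE: the merged rows are never printed, and the remaining rows of a group are never examined. Any suggestions?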
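One possible direction, offered as a sketch rather than a definitive fix: walk each sorted group once, keeping a "current" row that is either extended by the next row or emitted. The helper name `collapse_rows` is hypothetical. The sketch assumes that a merged row is compared against the following row using its original start and limit, and it uses `<=` as in the description.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse the rows of one group.
# Each row is [start, end, q_start, q_end, limit].
sub collapse_rows {
    return unless @_;
    my @sorted = sort { $a->[0] <=> $b->[0] } @_;
    my @out;
    my $current = [ @{ shift @sorted } ];    # copy, so the input stays intact
    for my $next (@sorted) {
        if ( $next->[1] - $current->[0] <= $current->[4] ) {
            # Close enough: extend the current row to the end of the next one.
            $current->[1] = $next->[1];
            $current->[3] = $next->[3];
        }
        else {
            # Too far apart: the current row is complete, start a new one.
            push @out, $current;
            $current = [@$next];
        }
    }
    push @out, $current;    # the last row is always still pending here
    return @out;
}

# Two groups taken from the data in the question.
my %data = (
    'miRNA79 dvex468096 +' => [ [ 717, 743, 13, 39, 55 ], [ 696, 724, 13, 41, 55 ] ],
    'miRNA79 dvex219016 +' => [
        [ 1103, 1130, 14, 41, 55 ], [ 3420, 3446, 13, 39, 55 ],
        [ 1384, 1408, 21, 45, 55 ], [ 4384, 4424, 21, 45, 55 ],
    ],
);

for my $key ( sort keys %data ) {
    print join( "\t", $key, @$_ ), "\n" for collapse_rows( @{ $data{$key} } );
}
```

With this shape a group that has a single row passes through unchanged, so the separate `$num == 1` branch would no longer be needed, and every row of a group is either merged or printed.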