I have several files like the one below, and I am trying to perform the numerical analysis described in the image.
>File Sample
attttttttttttttacgatgccgggggatgcggggaaatttccctctctctctcttcttctcgcgcgcg
aaaaaaaaaaaaaaagcgcggcggcgcggasasasasasasaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
I have to take every substring of length 2, map it to 33 values for different properties, and then sum those values over a window of size 5.
my %temp = (
    aCount => {
        aa => 2,
    },
    cCount => {
        aa => 0,
    },
);
My current implementation is as follows:
while (<FILE>) {
    my $line = $_;
    chomp $line;
    # Match the first two characters; modifying $line below resets the match position
    while ($line =~ /(.{2})/g) {
        my $subStr = $1;
        if (exists $temp{aCount}{$subStr}) {
            push @{$temp{aCount_array}}, $temp{aCount}{$subStr};
            if (scalar(@{$temp{aCount_array}}) == $WINDOW_SIZE) {
                my $sum = eval (join('+', @{$temp{aCount_array}}));
                shift @{$temp{aCount_array}};
                # A similar approach is used for the other 33 rules
            }
        }
        if (exists $temp{cCount}{$subStr}) {
            # similar approach
        }
        # Drop only the first character so the next substring overlaps by one
        $line =~ s/^.//;
    }
}
Is there any other way to speed up the whole process?
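For reference, here is a minimal sketch of one direction I am considering: read each pair with `substr` instead of a regex plus substitution, and keep a running total per property instead of calling `eval(join('+', ...))` for every window. The name `window_sums`, the table layout, and the `at => 1` value are placeholders of mine, not part of my real rules, and the sketch assumes the sequence lines have already been joined into one string.

```perl
use strict;
use warnings;

# Sliding-window sums for every property table, using a running total.
# $tables: { property => { pair => value } }; pairs missing from a table
# are skipped for that property, as in the loop above.
sub window_sums {
    my ($seq, $tables, $window) = @_;
    my (%buf, %total, %sums);
    for my $i (0 .. length($seq) - 2) {
        my $pair = substr($seq, $i, 2);        # overlapping pairs, step 1
        for my $prop (keys %$tables) {
            my $val = $tables->{$prop}{$pair};
            next unless defined $val;
            push @{ $buf{$prop} }, $val;
            $total{$prop} += $val;
            if (@{ $buf{$prop} } == $window) {
                push @{ $sums{$prop} }, $total{$prop};
                # Remove the oldest value so the total tracks the window
                $total{$prop} -= shift @{ $buf{$prop} };
            }
        }
    }
    return \%sums;
}

# Illustrative values only (hypothetical, not my real rule tables)
my %tables = (
    aCount => { aa => 2, at => 1 },
    cCount => { aa => 0 },
);
my $sums = window_sums('aaaaaaat', \%tables, 5);
print join(',', @{ $sums->{aCount} }), "\n";   # 10,10,9
```

Each window then costs one addition and one subtraction per property, rather than re-summing five values through `eval`. I have not benchmarked this against my current code.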