You can also use transliteration (the tr/// operator) for this task; it may be faster than iterating over each base.
#!/usr/bin/env perl
use strict;
use warnings;
my $seq = 'ATCGATGCAATTCCGGAAAAAATTTTCCCGGGGGGGAAACCCGGGAAATTT';
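# With an empty replacement list, tr/// leaves $seq unchanged and
# returns the number of matching characters (upper- or lowercase A).
my $count = ($seq =~ tr/Aa//);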
print "A is seen $count times.\n";
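The same tr/// approach extends to the full base composition. What follows is a minimal sketch, assuming you only care about A, C, G and T. Because tr/// builds its character table at compile time, its search list cannot come from a variable, so each base gets its own tr///. The %count hash and the output wording are illustrative choices, not part of the original answer.

```perl
#!/usr/bin/env perl
use strict;
use warnings;

my $seq = 'ATCGATGCAATTCCGGAAAAAATTTTCCCGGGGGGGAAACCCGGGAAATTT';

# tr/// cannot interpolate a variable into its search list (the table is
# built at compile time), so spell out one case-insensitive tr/// per base.
my %count = (
    A => ($seq =~ tr/Aa//),
    C => ($seq =~ tr/Cc//),
    G => ($seq =~ tr/Gg//),
    T => ($seq =~ tr/Tt//),
);

# Print in alphabetical order of base.
for my $base (sort keys %count) {
    print "Number of bases of type $base = $count{$base}\n";
}
```

For this sequence it should print A = 16, C = 10, G = 14 and T = 11. Bases other than these four (N, IUPAC codes) are simply not counted.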
Alternatively, you can use BioPerl's Bio::Tools::SeqStats module to get sequence statistics.
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::PrimarySeq;
use Bio::Tools::SeqStats;
# Wrap the raw string in a sequence object that SeqStats can analyse.
my $seqobj = Bio::PrimarySeq->new(
    -seq      => 'ATCGATGCAATTCCGGAAAAAATTTTCCCGGGGGGGAAACCCGGGAAATTT',
    -alphabet => 'dna',
    -id       => 'test',
);
my $seq_stats = Bio::Tools::SeqStats->new( -seq => $seqobj );
# count_monomers returns a hash reference mapping each base to its count.
my $hash_ref = $seq_stats->count_monomers();
for my $base ( sort keys %$hash_ref ) {
    print "Number of bases of type ", $base, " = ", $hash_ref->{$base}, "\n";
}
Output:
Number of bases of type A = 16
Number of bases of type C = 10
Number of bases of type G = 14
Number of bases of type T = 11
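If BioPerl is not installed, a core-Perl sketch can print the same per-base lines by iterating over each base and tallying into a hash. This is the approach that tr/// is suggested to beat. The script also uses cmpthese from the core Benchmark module to time that loop against a tr/// version on a longer, repeated copy of the sequence, so you can check the "may be faster" claim on your own perl. No timings are quoted because they depend on the machine. The subroutine names count_loop and count_tr are hypothetical, chosen for this example.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $seq = 'ATCGATGCAATTCCGGAAAAAATTTTCCCGGGGGGGAAACCCGGGAAATTT';

# Hypothetical helper: walk the string one character at a time and
# tally each upper-cased character in a hash.
sub count_loop {
    my ($s) = @_;
    my %count;
    $count{ uc $_ }++ for split //, $s;
    return \%count;
}

# Hypothetical helper: one compile-time tr/// per base (A, C, G, T only).
sub count_tr {
    my ($s) = @_;
    return {
        A => ($s =~ tr/Aa//),
        C => ($s =~ tr/Cc//),
        G => ($s =~ tr/Gg//),
        T => ($s =~ tr/Tt//),
    };
}

# Same output format as the BioPerl script.
my $loop = count_loop($seq);
for my $base (sort keys %$loop) {
    print "Number of bases of type $base = $loop->{$base}\n";
}

# A negative count tells cmpthese to run each sub for at least that
# many CPU seconds; results vary with machine and perl version.
my $long = $seq x 1000;
cmpthese(-1, {
    loop     => sub { count_loop($long) },
    translit => sub { count_tr($long) },
});
```

Unlike count_monomers, count_loop also tallies any non-ACGT characters it meets, while count_tr ignores them, so the two only agree on clean A/C/G/T input.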