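I have a Perl script, but it only calculates the molecular weight when a sequence is typed in. I want to calculate the molecular weight of the protein sequences in a FASTA file instead.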

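use strict;
use warnings;

print "Enter the amino acid sequence:\n";
my $seq = <STDIN>;    # no spaces inside <>, otherwise Perl treats it as a glob
chomp $seq;
my $len = length $seq;
print "Length of sequence is: $len";

# Average residue masses in Da (each amino acid minus one water).
my %data = (
    A=>71.09,  R=>156.19,  D=>115.09,  N=>114.10,
    C=>103.15,  E=>129.12,  Q=>128.14,  G=>57.05,
    H=>137.14,  I=>113.16,  L=>113.16,  K=>128.17,
    M=>131.19,  F=>147.18,  P=>97.12,  S=>87.08,
    T=>101.11,  W=>186.21,  Y=>163.18,  V=>99.14
);
my $sum = 0;
foreach my $res (split //, $seq) {
    $sum += $data{$res};
}
# Residue masses already exclude the water lost per peptide bond,
# so add one water (18.02) for the free N- and C-termini.
my $mw = $sum + 18.02;
print "\nThe molecular weight of the sequence is $mw";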

1 Answer

First, you should tell us what format your .fasta files are in. As far as I know, they look like this:

>seq_ID_1 descriptions etc 
ASDGDSAHSAHASDFRHGSDHSDGEWTSHSDHDSHFSDGSGASGADGHHAH
ASDSADGDASHDASHSAREWAWGDASHASGASGASGSDGASDGDSAHSHAS
SFASGDASGDSSDFDSFSDFSD

>seq_ID_2 descriptions etc
ASDGDSAHSAHASDFRHGSDHSDGEWTSHSDHDSHFSDGSGASGADGHHAH
ASDSADGDASHDASHSAREWAWGDASHASGASGASG

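Assuming your code works correctly and computes the molecular weight, all we need to do is read the FASTA files, parse them, and run each sequence through your code. It's easier than it sounds.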

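#!/usr/bin/perl

use strict;
use warnings;

for my $file (@ARGV) {
    open my $fh, '<:encoding(UTF-8)', $file
        or die "Cannot open $file: $!";
    my $input = join q{}, <$fh>;    # read the whole file into one string
    close $fh;
    # Each record is a '>' header line followed by everything up to the next '>'.
    while ( $input =~ /^(>.*?)$([^>]*)/smxg ) {
        my $name = $1;
        my $seq = $2;
        $seq =~ s/\s+//g;           # drop line breaks and stray whitespace
        my $mass = calc_mass($seq);
        print "$name has mass $mass\n";
    }
}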

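# Sum the average residue masses and add one water for the free termini.
sub calc_mass {
    my ($seq) = @_;
    return 0 if $seq eq q{};
    # Average residue masses in Da (each amino acid minus one water).
    my %data = (
        A=>71.09,  R=>156.19,  D=>115.09,  N=>114.10,
        C=>103.15,  E=>129.12,  Q=>128.14,  G=>57.05,
        H=>137.14,  I=>113.16,  L=>113.16,  K=>128.17,
        M=>131.19,  F=>147.18,  P=>97.12,  S=>87.08,
        T=>101.11,  W=>186.21,  Y=>163.18,  V=>99.14
    );
    my $sum = 0;
    for my $res ( split q{}, $seq ) {
        $sum += $data{$res};
    }
    # Residue masses already exclude the water lost per peptide bond,
    # so only one water (18.02) is added back for the chain ends.
    return $sum + 18.02;
}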
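
If the single regex over the whole file is hard to follow or extend, the same parsing can be done one line at a time. The following is only a minimal sketch, not part of the script above: `read_fasta` is a hypothetical helper name, and it assumes every record starts with a `>` header line, as in the layout shown earlier. Trailing whitespace, including Windows line endings, is stripped from each line.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: read FASTA records from an open filehandle and
# return a list of [header, sequence] pairs, processing one line at a time.
sub read_fasta {
    my ($fh) = @_;
    my @records;
    while ( my $line = <$fh> ) {
        $line =~ s/\s+\z//;              # strip newline and trailing whitespace
        next if $line eq q{};            # skip blank separator lines
        if ( $line =~ /^>(.*)/ ) {
            push @records, [ $1, q{} ];  # header line starts a new record
        }
        elsif (@records) {
            $records[-1][1] .= $line;    # append to the current sequence
        }
        # sequence lines that appear before the first header are ignored
    }
    return @records;
}

for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    for my $rec ( read_fasta($fh) ) {
        my ( $name, $seq ) = @{$rec};
        printf "%s: %d residues\n", $name, length $seq;
    }
    close $fh;
}
```

Each `$seq` obtained this way could be passed to `calc_mass` instead of just printing its length.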
answered 2013-07-10T09:22:45.847