1

I'm reasonably new to perl and very new to BioPerl, so my apologies if this seems like a trivial question. I'm using Bio::AlignIO and Bio::SimpleAlign to generate pairwise alignments of sequences of interest to a reference sequence - in this case the human mtDNA reference sequence. I'm able to generate the alignment in fasta format without issue, however I wish to record the differences between the reference and the sequences of interest in a tab delimited file for use with the webtool Haplogrep (http://haplogrep.uibk.ac.at/index.html). I can imagine doing this as loop whereby I go through each base of the reference sequence and say if the position in the reference does not match the query sequence, record the position and the nucleotide present in the query sequence. However, I can't figure out how to do this from the Bio::AlignIO object generated as part of the alignment. Here is the code I have so far:

#!/usr/bin/perl

use strict;
use warnings;
use Bio::AlignIO;
use Bio::SimpleAlign;

my $ref = "rCRS.fas";

my $comp = shift @ARGV or die "please provide comparison sequence: $!\n";

my $comp_id;

if ($comp =~ /(\S*)\.fas*/) {
    $comp_id = $1;
}

my $CAT = "cat $ref ".$comp." > ".$comp.".tmp";
try_cmd($CAT);

my $ALN = "muscle -in ".$comp.".tmp -out ".$comp.".aln";
try_cmd($ALN);

my $str = Bio::AlignIO->new(
     -file => $comp.".aln",
    -format => "fasta",
    );

my $aln = $str->next_aln();

####Subroutine####

sub try_cmd {
    my $cmd = shift;
    my $status;
    $status = system($cmd);
    if ( $status ) {
    print STDERR ( "problem running $cmd\nReturned $status\n" );
    }
    else {
    print STDERR ( "Successfully ran $cmd\n" );
    }
}

So to give a brief example, if I had generated an alignment such as this one:

>Ref_seq   AATTTGGGCTACT 
>Query_seq AAATTCGGCTACA 

would then like to generate an output such as:

3A 6C 13A

Can anyone help me figure out how to do so? General comments on better/easier ways to do this are also greatly appreciated.

4

0 回答 0