I am parsing a large EMBL file (>1 GB) and converting it to a GFF file. Some entries do not follow the standard EMBL format, which causes the BioPerl module to throw exceptions. The entries with errors are only a small portion of the total sequences, so I want the script to continue and simply ignore the exceptions for now. However, the Perl script is always stopped by the exceptions. How can I make it skip the bad entries and carry on?
I am on Linux with Perl 5.8.8.
My Perl script:
use strict;
use Bio::SeqIO;
use Bio::Tools::GFF;
use warnings;
use Try::Tiny;
open (E ,">","emblError.txt");
if (@ARGV != 1) { die "USAGE: embl2gff.pl > outputfile.\n"; }
my $in = Bio::SeqIO->new(-file=>$ARGV[0],-format=>'EMBL');
eval {
    while (my $seq = $in->next_seq) {
        for my $feat ($seq->top_SeqFeatures) {
            my $gffio = Bio::Tools::GFF->new(-gff_version => 3);
            print $feat->gff_string($gffio)."\n";
        }
    }
};
if ($@) {
    warn "Oh no! [$@]\n";
}
The error I got:
Name "main::E" used only once: possible typo at embl2GFF3.pl line 7.
--------------------- WARNING ---------------------
MSG: exception while parsing location line [join(9174..9343,14214..14303)complement(9268..9363),complement(9140..9198),complement(8965..9034),complement(8751..8884),complement(8419..8535),complement(8232..8337),complement(7952..8149),complement(7256..7332),complement(7051..7175),complement(6769..6877),complement(6601..6659),complement(4690..6530))] in reading EMBL/GenBank/SwissProt, ignoring feature mRNA (seqid=XcouVSXmac70forkSpecies.Scaffold1050.final):
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Bad operator 1: had multiple locations 2, should be SplitLocationI
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:472
STACK: Bio::Factory::FTLocationFactory::from_string /usr/lib/perl5/site_perl/5.8.8/Bio/Factory/FTLocationFactory.pm:210
STACK: Bio::Factory::FTLocationFactory::from_string /usr/lib/perl5/site_perl/5.8.8/Bio/Factory/FTLocationFactory.pm:204
STACK: Bio::SeqIO::FTHelper::_generic_seqfeature /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/FTHelper.pm:133
STACK: Bio::SeqIO::embl::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/embl.pm:403
STACK: embl2GFF3.pl:14
-----------------------------------------------------------
---------------------------------------------------
--------------------- WARNING ---------------------
MSG: exception while parsing location line [join(14219..14303,14368..14513)complement(9140..9198),complement(8965..9034),complement(8751..8884),complement(8419..8535),complement(8232..8337),complement(7952..8149),complement(7256..7332),complement(7051..7175),complement(6769..6877),complement(6601..6659),complement(6461..6530))] in reading EMBL/GenBank/SwissProt, ignoring feature CDS (seqid=XcouVSXmac70forkSpecies.Scaffold1050.final):
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Bad operator 1: had multiple locations 2, should be SplitLocationI
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:472
STACK: Bio::Factory::FTLocationFactory::from_string /usr/lib/perl5/site_perl/5.8.8/Bio/Factory/FTLocationFactory.pm:210
STACK: Bio::Factory::FTLocationFactory::from_string /usr/lib/perl5/site_perl/5.8.8/Bio/Factory/FTLocationFactory.pm:204
STACK: Bio::SeqIO::FTHelper::_generic_seqfeature /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/FTHelper.pm:133
STACK: Bio::SeqIO::embl::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/embl.pm:403
STACK: embl2GFF3.pl:14
-----------------------------------------------------------
---------------------------------------------------
Oh no! [Can't call method "isa" on an undefined value at /usr/lib/perl5/site_perl/5.8.8/Bio/Seq.pm line 1142, <GEN0> line 538764.
]
NOTE: I did not paste the exception twice. The log really looks like this, and only one exception seems to be caught.
Here is the block of the EMBL file that causes the problem. The mRNA entry causes the first exception and the CDS entry causes the second.
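Reading the log, the two WARNING blocks are printed by BioPerl itself, which reports that it is ignoring the feature. The only error that reaches the eval is the `Can't call method "isa"` one, and because the eval wraps the whole while loop, that single error ends the loop. One common pattern is to move the eval inside the loop so that a failure costs one record instead of the whole run. Below is a minimal sketch of that pattern. `process_records` and its arguments are hypothetical names, not part of BioPerl, and whether `next_seq` picks up cleanly at the next entry after it has died is an assumption that has not been checked here.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): pull records from $next_record
# until it returns undef and pass each one to $handle. A die from either
# callback is handed to $on_error and the loop moves on to the next record.
sub process_records {
    my ($next_record, $handle, $on_error, $max_consecutive) = @_;
    $max_consecutive = 100 unless defined $max_consecutive;
    my ($ok, $failed, $streak) = (0, 0, 0);

    while (1) {
        my $record;
        # The eval sits inside the loop, so one failure does not end the loop.
        my $survived = eval {
            $record = $next_record->();
            $handle->($record) if defined $record;
            1;
        };
        if (!$survived) {
            my $err = (defined $@ && length $@) ? $@ : "unknown error\n";
            $failed++;
            $on_error->($err);
            # Give up if the reader keeps dying without making progress.
            last if ++$streak >= $max_consecutive;
            next;
        }
        $streak = 0;
        last unless defined $record;   # undef means end of input
        $ok++;
    }
    return ($ok, $failed);
}

# Possible use with the objects from the script above (untested against a
# real EMBL file):
#
#   my ($ok, $failed) = process_records(
#       sub { $in->next_seq },
#       sub {
#           my $seq = shift;
#           my $gffio = Bio::Tools::GFF->new(-gff_version => 3);
#           print $_->gff_string($gffio), "\n" for $seq->top_SeqFeatures;
#       },
#       sub { print E $_[0] },   # log the message to emblError.txt
#   );
```

The consecutive-failure limit is only a guard against looping forever if the reader keeps throwing at the same position; the value 100 is arbitrary.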
FT   mRNA            join(9174..9343,14214..14303)
FT                   complement(9268..9363),complement(9140..9198),
FT                   complement(8965..9034),complement(8751..8884),
FT                   complement(8419..8535),complement(8232..8337),
FT                   complement(7952..8149),complement(7256..7332),
FT                   complement(7051..7175),complement(6769..6877),
FT                   complement(6601..6659),complement(4690..6530))
FT                   /gene="ENSXMAG00000014948"
FT                   /note="transcript_id=ENSXMAT00000015030"
FT   CDS             join(14219..14303,14368..14513)
FT                   complement(9140..9198),complement(8965..9034),
FT                   complement(8751..8884),complement(8419..8535),
FT                   complement(8232..8337),complement(7952..8149),
FT                   complement(7256..7332),complement(7051..7175),
FT                   complement(6769..6877),complement(6601..6659),
FT                   complement(6461..6530))
FT                   /gene="ENSXMAG00000014948"
FT                   /protein_id="ENSXMAP00000015010"
FT                   /note="transcript_id=ENSXMAT00000015030"
FT                   /db_xref="HGNC_transcript_name:ENO3-201"
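In both features the `join(` is closed right after its second range, the `complement(...)` list follows without a comma, and there is one extra `)` at the very end, so the parentheses do not pair up. A rough way to flag such location strings before they reach the parser is to check the bracket nesting. The following is only a sketch: `parens_balanced` is a hypothetical helper, not a BioPerl function, and a string that passes this check is not guaranteed to be a valid location.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if every ")" closes an earlier "(" and
# nothing is left open at the end of the string, otherwise 0.
sub parens_balanced {
    my ($location) = @_;
    my $depth = 0;
    for my $ch (split //, $location) {
        $depth++ if $ch eq '(';
        $depth-- if $ch eq ')';
        return 0 if $depth < 0;    # a ")" with no matching "("
    }
    return $depth == 0 ? 1 : 0;
}

# Shortened version of the mRNA location from the block above.
my $suspect = 'join(9174..9343,14214..14303)complement(9268..9363),complement(4690..6530))';
# A made-up, well-nested location for comparison.
my $nested  = 'join(1..10,complement(20..30))';

print parens_balanced($suspect), "\n";   # prints 0
print parens_balanced($nested), "\n";    # prints 1
```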