
I am parsing a large EMBL file (>1 GB) and converting it to a GFF file. Some entries do not follow the standard EMBL format, which causes the BioPerl modules to throw exceptions. The entries with errors are only a small portion of all the sequences, so for now I want the script to continue and simply ignore the exceptions. But the Perl script is always stopped by the exceptions.

I am on Linux with Perl 5.8.8.

My Perl script:

use strict;
use Bio::SeqIO;
use Bio::Tools::GFF;
use warnings;
use Try::Tiny;

open (E ,">","emblError.txt");

if (@ARGV != 1) {    die "USAGE: embl2gff.pl   > outputfile.\n"; }

my $in = Bio::SeqIO->new(-file=>$ARGV[0],-format=>'EMBL');
eval {
    while (my $seq = $in->next_seq) {
        for my $feat ($seq->top_SeqFeatures) {
            my $gffio = Bio::Tools::GFF->new(-gff_version => 3);
            print $feat->gff_string($gffio)."\n";
        }
    }
};
if ($@) {
    warn "Oh no! [$@]\n";
}

The errors I got:

Name "main::E" used only once: possible typo at embl2GFF3.pl line 7.

--------------------- WARNING ---------------------
MSG: exception while parsing location line [join(9174..9343,14214..14303)complement(9268..9363),complement(9140..9198),complement(8965..9034),complement(8751..8884),complement(8419..8535),complement(8232..8337),complement(7952..8149),complement(7256..7332),complement(7051..7175),complement(6769..6877),complement(6601..6659),complement(4690..6530))] in reading EMBL/GenBank/SwissProt, ignoring feature mRNA (seqid=XcouVSXmac70forkSpecies.Scaffold1050.final):
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Bad operator 1: had multiple locations 2, should be SplitLocationI
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:472
STACK: Bio::Factory::FTLocationFactory::from_string /usr/lib/perl5/site_perl/5.8.8/Bio/Factory/FTLocationFactory.pm:210
STACK: Bio::Factory::FTLocationFactory::from_string /usr/lib/perl5/site_perl/5.8.8/Bio/Factory/FTLocationFactory.pm:204
STACK: Bio::SeqIO::FTHelper::_generic_seqfeature /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/FTHelper.pm:133
STACK: Bio::SeqIO::embl::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/embl.pm:403
STACK: embl2GFF3.pl:14
-----------------------------------------------------------

---------------------------------------------------

--------------------- WARNING ---------------------
MSG: exception while parsing location line [join(14219..14303,14368..14513)complement(9140..9198),complement(8965..9034),complement(8751..8884),complement(8419..8535),complement(8232..8337),complement(7952..8149),complement(7256..7332),complement(7051..7175),complement(6769..6877),complement(6601..6659),complement(6461..6530))] in reading EMBL/GenBank/SwissProt, ignoring feature CDS (seqid=XcouVSXmac70forkSpecies.Scaffold1050.final):
------------- EXCEPTION: Bio::Root::Exception -------------
MSG: Bad operator 1: had multiple locations 2, should be SplitLocationI
STACK: Error::throw
STACK: Bio::Root::Root::throw /usr/lib/perl5/site_perl/5.8.8/Bio/Root/Root.pm:472
STACK: Bio::Factory::FTLocationFactory::from_string /usr/lib/perl5/site_perl/5.8.8/Bio/Factory/FTLocationFactory.pm:210
STACK: Bio::Factory::FTLocationFactory::from_string /usr/lib/perl5/site_perl/5.8.8/Bio/Factory/FTLocationFactory.pm:204
STACK: Bio::SeqIO::FTHelper::_generic_seqfeature /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/FTHelper.pm:133
STACK: Bio::SeqIO::embl::next_seq /usr/lib/perl5/site_perl/5.8.8/Bio/SeqIO/embl.pm:403
STACK: embl2GFF3.pl:14
-----------------------------------------------------------

---------------------------------------------------
Oh no! [Can't call method "isa" on an undefined value at /usr/lib/perl5/site_perl/5.8.8/Bio/Seq.pm line 1142, <GEN0> line 538764.
]

NOTE: I didn't post the exception twice; it really appears this way, and only one exception seems to be caught.

Here is the block of the EMBL file that causes the problem. The mRNA entry causes the first exception and the CDS entry causes the second.

FT   mRNA            join(9174..9343,14214..14303)
FT                   complement(9268..9363),complement(9140..9198),
FT                   complement(8965..9034),complement(8751..8884),
FT                   complement(8419..8535),complement(8232..8337),
FT                   complement(7952..8149),complement(7256..7332),
FT                   complement(7051..7175),complement(6769..6877),
FT                   complement(6601..6659),complement(4690..6530))
FT                   /gene="ENSXMAG00000014948"
FT                   /note="transcript_id=ENSXMAT00000015030"
FT   CDS             join(14219..14303,14368..14513)
FT                   complement(9140..9198),complement(8965..9034),
FT                   complement(8751..8884),complement(8419..8535),
FT                   complement(8232..8337),complement(7952..8149),
FT                   complement(7256..7332),complement(7051..7175),
FT                   complement(6769..6877),complement(6601..6659),
FT                   complement(6461..6530))
FT                   /gene="ENSXMAG00000014948"
FT                   /protein_id="ENSXMAP00000015010"
FT                   /note="transcript_id=ENSXMAT00000015030"
FT                   /db_xref="HGNC_transcript_name:ENO3-201"

1 Answer


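eval doesn't catch low-level failures; a call to exit, for example, ends the program no matter how many evals surround it. Also check for a $SIG{__DIE__} handler. A carelessly written die handler can simply kill the program. For example, if the handler doesn't check $EXCEPTIONS_BEING_CAUGHT (that is, $^S), it might exit from inside the die handler.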
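A minimal sketch of a die handler that cooperates with eval, assuming you install such a handler yourself (nothing in your output shows whether one is installed). When $^S is true the exception is about to be caught by an eval, so the handler hands it straight back; only outside any eval does it apply its own policy, and the log-and-exit policy here is just an example.

```perl
use strict;
use warnings;

# $^S (English name: $EXCEPTIONS_BEING_CAUGHT) is true while code runs
# inside an eval. Perl disables the hook while it runs, so calling die
# here re-throws the exception to the surrounding eval.
local $SIG{__DIE__} = sub {
    my ($err) = @_;
    die $err if $^S;    # inside an eval: let the eval handle it
    # Outside any eval: log and stop (an example policy only).
    print STDERR "Fatal error: $err";
    exit 1;
};

# Because the handler defers to eval, this exception is caught normally.
my $ok = eval { die "bad record\n"; 1 };
print "caught: $@" unless $ok;
```

A handler that called exit unconditionally would end the script even inside the eval, which is the failure mode described above.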

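But just looking at your output, if it printed this:

Oh no! [Can't call method "isa" on an undefined value at 
/usr/lib/perl5/site_perl/5.8.8/Bio/Seq.pm line 1142, <GEN0> line 538764. ]

then it isn't behaving the way you describe. Your eval is catching the error; otherwise "Oh no!" would not be printed in front of it. BioPerl also appears to dump its own stack traces (the WARNING blocks) for the problems it handles internally.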

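Finally, your program's state appears to be data-dependent, and some bad values in the file may put it into a bad state. For whatever reason, it fails to create a Bio::Seq object and passes the resulting undefined value to some function that checks whether its argument isa something else. It looks like the offending line in your input file is #538,764. But I could be wrong.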
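The trailing "<GEN0> line 538764" is Perl's standard suffix for a die message that does not end in a newline: it names the most recently read filehandle and its line counter $. at the moment of the error. GEN0 is presumably the handle BioPerl opened on the EMBL file (an assumption the output does not confirm), so 538764 would be the last line read when the failure happened, placing the bad record at or just before that line. A small demonstration of the suffix, using an in-memory file in place of the EMBL input:

```perl
use strict;
use warnings;

# An in-memory file stands in for the EMBL input.
my $data = "ID   first\nFT   second\nFT   third\nFT   fourth\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

# Read three lines, as a parser would before hitting a bad record.
for (1 .. 3) {
    my $line = <$fh>;
}

# A die message that does not end in "\n" gets Perl's location suffix,
# which includes the last-read filehandle and its line counter ($.).
my $msg = '';
eval { die "simulated parser failure"; 1 } or $msg = $@;

# Ends with something like ", <$fh> line 3." (the handle name shown
# depends on the Perl version and how the handle was created).
print $msg;
```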

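Note, to address your question in the comments: if BioPerl is handling the errors it finds and you just want to get through a series of records, then my suggestion is to put your eval inside the loop, either the while loop or the for loop. This is a fairly standard form for some multithreaded applications.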

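 while ( 1 ) {
     # $me stands for whatever object does the work; spin() and done()
     # are placeholders, not a real API. warn is used instead of say,
     # which needs Perl 5.10+ and "use feature 'say'".
     eval { $me->spin(); 1; } or warn "WARNING: $@";
     # unless we are officially done, just get ready to
     # handle somebody causing an exception in our thread.
     last if $me->done; 
 }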

Where possible, remember to put the eval at the point where you want processing to resume.
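Applied to the question's script, this means wrapping each next_seq call in its own eval instead of wrapping the whole while loop. The sketch below is self-contained, so the BioPerl parser is replaced by a hypothetical next_record() that dies on malformed records; in the real script, $in->next_seq would take its place. It assumes the failing call has already consumed the bad record. Whether Bio::SeqIO can resume cleanly after next_seq dies is not established here, so a consecutive-failure limit guards against retrying the same record forever.

```perl
use strict;
use warnings;

# Hypothetical stand-in for a record parser such as $in->next_seq:
# one record per call, undef at end of input, and a die (after the
# record has been consumed) when a record is malformed.
my @input = ('ok-1', 'BAD', 'ok-2', 'BAD', 'ok-3');

sub next_record {
    return unless @input;              # end of input: undef
    my $rec = shift @input;
    die "malformed record '$rec'\n" if $rec eq 'BAD';
    return $rec;
}

my @kept;
my $skipped         = 0;
my $failures_in_row = 0;
my $max_in_row      = 100;             # arbitrary safety limit

while (1) {
    my $rec;
    # The eval covers a single iteration, so a failure skips only
    # the current record instead of ending the whole loop.
    my $ok = eval { $rec = next_record(); 1 };
    if (!$ok) {
        warn "Skipping record: $@";
        $skipped++;
        die "Giving up after $max_in_row consecutive failures\n"
            if ++$failures_in_row >= $max_in_row;
        next;
    }
    $failures_in_row = 0;
    last unless defined $rec;          # normal end of input
    push @kept, $rec;                  # real script: print GFF lines here
}

print "kept: @kept; skipped: $skipped\n";
```

In the question's script the record would be $seq, and the top_SeqFeatures/gff_string loop would go where the push is; it can stay outside the eval or get an eval of its own if gff_string can also die.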

answered 2013-04-03T19:42:35.153