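I wrote a Perl program that takes an Excel sheet (converted to a text file by changing the extension from .xls to .txt) and a sequence file as its input. The Excel sheet contains the start and end points of the regions in the sequence file that need to be cut out and extracted into a third output file (along with 70 flanking values on either side of the matched region). There are about 300 values. The program reads the start and end point of each region to cut, but it keeps telling me the value is outside the length of the input file, when it clearly is not. I can't seem to solve this.
Here is the program: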
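To make the goal concrete, here is a minimal sketch of the extraction I am after for a single row. It assumes the start and end values are 1-based and inclusive with start <= end, and it clamps the flank at both ends of the sequence. The name extract_region, the toy sequence and the flank of 2 are made up for illustration and are not part of my script.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; not part of the original script.
# Assumes $start and $end are 1-based, inclusive, and $start <= $end.
sub extract_region {
    my ( $seq, $start, $end, $flank ) = @_;
    my $last = length($seq) - 1;          # 0-based index of the final base
    my $from = $start - 1 - $flank;       # convert to 0-based, then step back by the flank
    $from = 0 if $from < 0;               # clamp at the beginning of the sequence
    my $to = $end - 1 + $flank;           # 0-based index of the last wanted base
    $to = $last if $to > $last;           # clamp at the end of the sequence
    return substr( $seq, $from, $to - $from + 1 );
}

# Region 4..6 ("CCC") with 2 flanking bases on each side.
print extract_region( "AAACCCGGGTTT", 4, 6, 2 ), "\n";    # prints AACCCGG
```

With my real data the flank would be 70 and the start and end would come from the tab-separated result file.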
use strict;
use warnings;
my $blast;
my $i;
my $idline;
my $sequence;
print "Enter Your BLAST result file name:\t";
chomp( $blast = <STDIN> ); # BLAST result file name
print "\n";
my $database;
print "Enter Your Gene list file name:\t";
chomp( $database = <STDIN> ); # sequence file
print "\n";
open IN, "$blast" or die "Can not open file $blast: $!";
my @ids = ();
my @seq_start = ();
my @seq_end = ();
while (<IN>) {
    # split each line of the result file on tabs
    my @fields = split( "\t", $_ );
    push( @ids, $fields[0] );    # copy the name of the sequence
    # copy field index 6 of the result line: the start point of the region to cut
    push( @seq_start, $fields[6] );
    # copy field index 7 of the result line: the end point of the region to cut
    push( @seq_end, $fields[7] );
}
close IN;
open OUT, ">Result.fasta" or die "Can not open file Result.fasta: $!";
for ( $i = 0; $i <= $#ids; $i++ ) {
    ($sequence) = &block( $ids[$i] );
    ( $idline, $sequence ) = split( "\n", $sequence );
    # extract the sequence from the start point to the end point
    my $seqlen = $seq_end[$i] - $seq_start[$i] - 1;
    my $Nucleotides = substr( $sequence, $seq_start[$i], $seqlen );    # store the extracted substring in $Nucleotides
    $Nucleotides =~ s/(.{1,60})/$1\n/gs;
    print OUT "$idline\n";
    print OUT "$Nucleotides\n";
}
print "\nExtraction Completed...";
sub block {
    # look up the record for an id, which is the first tab-separated field in the BLAST output file
    my $id1 = shift;
    print "$id1\n";
    my $start = ();
    open IN3, "$database" or die "Can not open file $database: $!";
    my $blockseq = "";
    while (<IN3>) {
        if ( ( $_ =~ /^>/ ) && ($start) ) {
            last;
        }
        if ( ( $_ !~ /^>/ ) && ($start) ) {
            chomp;
            $blockseq .= $_;
        }
        if (/^>$id1/) {
            my $start = $. - 1;
            my $blockseq .= $_;
        }
    }
    close IN3;
    return ($blockseq);
}
BLAST result file: http://www.fileswap.com/dl/Ws7ehftejp/
Sequence file: http://www.fileswap.com/dl/lPwuGh2oKM/
Errors:
substr outside of string at Nucleotide_Extractor.pl line 39.
Use of uninitialized value $Nucleotides in substitution (s///) at Nucleotide_Extractor.pl line 41.
Use of uninitialized value $Nucleotides in concatenation (.) or string at Nucleotide_Extractor.pl line 44.
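One possible cause, looking at the block sub again: the two assignments inside the if (/^>$id1/) branch use my, which declares new variables that exist only inside that branch. The outer $start and $blockseq would then never be set, so the sub returns an empty string and substr has nothing to cut from. Below is a minimal sketch of the same lookup without the inner my. The name fetch_record and the in-memory list of lines are hypothetical stand-ins so the example runs on its own, and the stricter header match (the id must be followed by whitespace or the end of the line) is my own assumption, not something my script does.

```perl
use strict;
use warnings;

# Hypothetical helper for illustration only; it mirrors the logic of block()
# but reads from a list of lines instead of the sequence file.
sub fetch_record {
    my ( $id, @lines ) = @_;
    my $found  = 0;     # becomes true once the wanted header has been seen
    my $record = "";    # header line, a newline, then the joined sequence
    for my $line (@lines) {
        chomp( my $copy = $line );
        if ( $copy =~ /^>/ ) {
            last if $found;    # the next record starts here, so stop
            if ( $copy =~ /^>\Q$id\E(?!\S)/ ) {
                # plain assignment: no "my", so the outer variables change
                $found  = 1;
                $record = "$copy\n";
            }
            next;
        }
        $record .= $copy if $found;    # append sequence lines of the wanted record
    }
    return $record;
}

my @demo = ( ">seq1 demo\n", "ACGT\n", "TTGG\n", ">seq2\n", "CCCC\n" );
my ( $idline, $sequence ) = split( "\n", fetch_record( "seq1", @demo ) );
print "$idline\n$sequence\n";
```

If this is the cause, the returned string splits into a header and a single joined sequence line, which is what the main loop expects before calling substr.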
Any help is greatly appreciated, and questions are always welcome.