我想使用 substr 函数来恢复序列中的一些核苷酸。这里我有这些序列的 FASTA 格式:
>dvex28051
AAAACAAAAACATTCGCTAGAAAGTAATCAGCTGGTCATTTATTTGAAATGTTAATGATATATTTCATGTTGCTAATTTTTTATGAAAAAAATCATTGCTTATTTAATTACTCTTGGTTCTTGACCAACTATAAAAGCATTGTTTAGTATCAAGTGTCCAGGTATCAGCAGTTTTGTTTGAAAACAAACTTTTATTCATGCAGTCAGTGGCGGATCCAGGTAGAGTGCAGAGGCAGCACCCTCCGTCAGAAAACCAAAAAAAGAAGAAATGAAAAATTATAAAAAAAATTTCTAAACGTTGGTGCACTTAAGTGTAGCAAAAAATTCCTGTTTAGATATTCAGTGGGGAGCGACACCTTTTGGGGCCTATAGCTTCAAATCTTACTTGGTGACCTAAAATCGCTTTTTCGTTGGATCTGCGAAAGCTAGAATTTGGTTGCTGCAAATCGAATCGGTGCATCAACTGCATCAATATCAACGATGTGGTGACTGGTGGTATATTTTGGGTTCGTGCAATGCTACATTTATTTCAATCATATTTCAAGGCAGAAAGGGAAAGAAAACATCAGGTCAAGACAGTGGCGTAGCGAGGGAAGGGGGGCATACGTCCCCGGGCGCAACACGATGTCTTTTTTTTTAATCATCTGCGAAATTCAGACATTTTTTAGAGACTAAATGAAACTATGGAAAACCGGGCCCTTATAAAAGTTGAGACCAAGTGAAAAACTGGGGATAAAACATGAAAATCGGGCTCCAAAAGAATGAGAGTCCGCCCTTGGTCTGTACCAGCATGATTTGAGCGCAAATTTCATTAAGCCCCCGGGCGCAAGACACTCACGCTACGCCCCTGGGTAAAGACAAACAGAGTAGTTTTTCTTATAAACACAAGCATGCACAAACAACATAAAAACAAAACACAGTTTTTTTTAAGACGATGTGCTGCGTGCACCCGCTCAATGTTTTTTTTTTTTTTTTATAGAAAAGCAAAACTTTGAAAGGTTAACGTCAACTCATTTTACAACAATTTGTGGCAAATGGTATCAAGGTATCAAGCAATTAACTAAATGTCTTCCACTAGAACGCAGAACACCATTTTGCAATTATTTATTTGATGTAAACCAGTGTGTTAGATCAAAATCACTTCGACGCCGTTTTTTGACTCCGTGAAAATCTTGGTATTCTTCTCGCATTGCATAATGATGGTTTGTTGAAATAAAATTAAACGCTTAACGTTCTTAAAATGAGCGCGATACTACTTTTCTTTGTAGATTTTCTGCATGCGCTCCTTTTAAGTTGATCCCGAGCTACAAACTTCTTTATGAACGTTTTGGATTTCTCCAAAATAAAGCCTGCAAGCAGTTTTCTAAAAACACCGCACCCCCCATTAGGAATTTCTAGATCCGCCCCTGCATACAGTATTTGTTAATTATTAAAACCAACCAGCAGCAATTGTTTATTCAATGACTATTAAACCAACCTGGATAGTGCGTTTGGTCTTGATTGAAGCGATTGCTGCATTGACGTCTTTCGGAACCACATCACC
>dvex294195
GAATCAGTGGAAAAGTCACAACGCAGCTTGCCGAATTACTGCAGATTCTTTACACTTTTTTTTCTACATTATCACTGTTTTGCTTAATTTTCAATTATAGAAATCAAAATTAATAACTGGTATGTAGTTGGTCGGTGCTTCGAGAAAGTAGCCTACTCAATGATTTCTCAGAATGTTACAGTACTTCAAAAAAACAGACTACCCATTTCAAAAAATATAAACCTAGTA
我想将散列的每个键与该表的命中列 (dvex\d++) 进行比较:
#Query Hit sense start end star_q end_q lenght_q # this line is informative don't make part of the code.
miRNA1 dvex28051 + 205 232 11 38 51
miRNA1 dvex202016 - 75 106 17 48 51
miRNA1 dvex294195 + 55 85 11 48 51
如果存在,我想将其哈希值分配给一个变量(即:$sequence)以应用 substr 函数:
my $fragment = substr $sequence, $start, $length_sequence;
我用序列制作了一个数组,并尝试读取每个 2 个值并进行比较:
while (my $line1 = <$MYINPUTFILE>){ #Entry of the sequences Fasta file
chomp $line1;
push @array_lines, $line1;
}
while (my $line2 = <$IN>){ #Entry of the table
chomp $line2;
push @database_lines, $line2;
}
foreach my $database_line (@database_lines){ #each value of the table
my @entry = split /\s++/,$database_line;
$pattern = $entry[1];
$query = $entry[0];
$start = $entry[3];
$l_pattern = length $pattern;
$end = $entry[4];
$lng_sequence = ($end - $start) + 1;
$sense = $entry[2];
$l_query = $entry[7];
my $count = 2;
for (my $i = 0; $i <= $#array_lines; $i +=$count){
chomp $array_lines[$i-2];
chomp $array_lines[$i-1];
$seq = $array_lines[$i-1];
$header = $array_lines[$i-2];
if($new_header =~ /$pattern/ && $l_header == $l_pattern){
if(($end+$right_diff+$increment) > $l_query){
$clean_seq = substr $seq, $start, $l_query;
} else {;}
}
我的代码的问题是 Perl 将 $seq 识别为最后一个序列。并且总是在这个 $seq 上应用 substr 函数。我需要搜索 $pattern 并在这些序列中搜索,如果存在,将 $seq 分配给它的序列,然后应用 substr 函数。一些建议?