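I'm trying to parse a GBK (GenBank) file. Basically, I need to return the locus tag and product name of every gene whose product matches a pattern. So if I want to find all predicted gene products, the search term "predicted" should return: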
/product="predicted semialdehyde dehydrogenase"
/locus_tag="ECDH10B_2481"
I've been able to return the /product line, but I can't figure out how to parse "backwards" to get the /locus_tag.
Here's what I have so far:
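use strict;
use warnings;

my $gbk_file = 'example.txt';
open(my $input, '<', $gbk_file) or die "ERROR: can't read input GenBank file: $!";
while (my $line = <$input>) {
    # Prints only the matching /product line; the /locus_tag sits on an earlier line.
    print $line if $line =~ /predicted/;
}
close($input);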
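One way around the "backwards" problem is not to look back at all: read forward, remember the most recent /locus_tag, and report it when a matching /product line turns up. This is a minimal sketch assuming, as in example.txt, that /locus_tag comes before /product within each feature and that each /product value fits on one line; `find_products` is a hypothetical helper name, and the demo reads an in-memory copy of a few sample lines rather than the file itself.

```perl
use strict;
use warnings;

# Hypothetical helper: scan GenBank feature lines from $fh, remembering the
# most recent /locus_tag, and return [locus_tag, product] pairs for every
# /product value matching $pattern. Assumes /locus_tag precedes /product
# within a feature and that each /product value fits on a single line.
sub find_products {
    my ($fh, $pattern) = @_;
    my $locus_tag;    # last /locus_tag value seen so far
    my @hits;
    while (my $line = <$fh>) {
        if ($line =~ m{/locus_tag="([^"]+)"}) {
            $locus_tag = $1;
        }
        elsif ($line =~ m{/product="([^"]*)"}) {
            my $product = $1;    # copy before the next match resets $1
            push @hits, [ $locus_tag, $product ] if $product =~ $pattern;
        }
    }
    return @hits;
}

# Demo on an in-memory copy of part of example.txt; for the real file use
#   open(my $in, '<', 'example.txt') or die "ERROR: can't read example.txt: $!";
my $sample = <<'END';
/gene="usg"
/locus_tag="ECDH10B_2481"
/product="predicted semialdehyde dehydrogenase"
/gene="pdxB"
/locus_tag="ECDH10B_2482"
/product="erythronate-4-phosphate dehydrogenase"
END
open(my $in, '<', \$sample) or die "ERROR: can't open in-memory sample: $!";
for my $hit (find_products($in, qr/predicted/)) {
    my ($tag, $product) = @$hit;
    print qq{/product="$product"\n/locus_tag="$tag"\n};
}
close($in);
```

On the sample this prints the "predicted semialdehyde dehydrogenase" product followed by /locus_tag="ECDH10B_2481". Because the match is line-oriented, a /product value that wraps onto a second line would be missed.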
> example.txt
     gene            complement(2525423..2526436)
                     /gene="usg"
                     /locus_tag="ECDH10B_2481"
     CDS             complement(2525423..2526436)
                     /gene="usg"
                     /locus_tag="ECDH10B_2481"
                     /codon_start=1
                     /transl_table=11
                     /product="predicted semialdehyde dehydrogenase"
                     /protein_id="ACB03477.1"
                     /db_xref="GI:169889770"
                     /db_xref="ASAP:AEC-0002184"
                     /translation="MSEGWNIAVLGATGAVGEALLETLAERQFPVGEIYALARNESAG
                     EQL"
     gene            complement(2526502..2527638)
                     /gene="pdxB"
                     /locus_tag="ECDH10B_2482"
     CDS             complement(2526502..2527638)
                     /gene="pdxB"
                     /locus_tag="ECDH10B_2482"
                     /codon_start=1
                     /transl_table=11
                     /product="erythronate-4-phosphate dehydrogenase"
                     /protein_id="ACB03478.1"
                     /db_xref="GI:169889771"
                     /db_xref="ASAP:AEC-0002185"
                     /translation="MKILVDENMPYARDLFSRLGEVTAVPGRPIPVAQLADADALMVR
                     SVTKVNESLLAGKPIKFVGTATAGTDHVDEAWLKQAGIGFSAAP
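Real GenBank files often wrap long /product values across lines, so a sturdier sketch groups the lines into features first and then queries each feature's qualifiers. It assumes the standard GenBank column layout (feature keys indented five spaces, qualifiers and continuation lines indented 21); `parse_features` and `products_matching` are hypothetical helper names, not part of any library. For complete records, BioPerl's `Bio::SeqIO` with the `genbank` format is the more usual choice.

```perl
use strict;
use warnings;

# Hypothetical helper: group GenBank feature-table lines into features.
# Each feature is { key => 'CDS', location => '...', quals => { name => [values] } }.
# Assumes feature keys start in column 6 and qualifiers in column 22.
sub parse_features {
    my ($fh) = @_;
    my (@features, $cur, $qual);    # $qual = [name, raw value] being built
    my $flush_qual = sub {
        return unless $cur && $qual;
        my ($name, $value) = @$qual;
        $value =~ s/^"//;            # drop surrounding double quotes
        $value =~ s/"$//;
        push @{ $cur->{quals}{$name} }, $value;
        undef $qual;
    };
    while (my $line = <$fh>) {
        chomp $line;
        if ($line =~ /^ {5}(\S+)\s+(\S.*)$/) {            # new feature
            my ($key, $location) = ($1, $2);
            $flush_qual->();
            $cur = { key => $key, location => $location, quals => {} };
            push @features, $cur;
        }
        elsif ($line =~ m{^ {21}/(\w+)(?:=(.*))?$}) {     # new qualifier
            my ($name, $value) = ($1, defined $2 ? $2 : '');
            $flush_qual->();
            $qual = [ $name, $value ];
        }
        elsif ($qual && $line =~ /^ {21}(\S.*)$/) {       # continuation line
            # Sequences join without spaces; free text joins with one space.
            $qual->[1] .= ($qual->[0] eq 'translation' ? '' : ' ') . $1;
        }
    }
    $flush_qual->();
    return @features;
}

# Hypothetical helper: [locus_tag, product] pairs whose product matches $pattern.
sub products_matching {
    my ($pattern, @features) = @_;
    my @hits;
    for my $f (@features) {
        my $tags = $f->{quals}{locus_tag} or next;    # skip untagged features
        for my $product (@{ $f->{quals}{product} || [] }) {
            push @hits, [ $tags->[0], $product ] if $product =~ $pattern;
        }
    }
    return @hits;
}

# Demo on an in-memory copy of part of example.txt (column layout assumed).
my $sample = <<'END';
     gene            complement(2525423..2526436)
                     /gene="usg"
                     /locus_tag="ECDH10B_2481"
     CDS             complement(2525423..2526436)
                     /gene="usg"
                     /locus_tag="ECDH10B_2481"
                     /product="predicted semialdehyde dehydrogenase"
     CDS             complement(2526502..2527638)
                     /gene="pdxB"
                     /locus_tag="ECDH10B_2482"
                     /product="erythronate-4-phosphate dehydrogenase"
END
open(my $in, '<', \$sample) or die "ERROR: can't open in-memory sample: $!";
for my $hit (products_matching(qr/predicted/, parse_features($in))) {
    my ($tag, $product) = @$hit;
    print qq{/product="$product"\n/locus_tag="$tag"\n};
}
close($in);
```

Because each feature keeps all of its qualifiers, the same parsed list can also be filtered on /gene, /protein_id, or the feature key without rescanning the file.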