I have the following sentence:
zzzzzzz microRNA146a xxx (miR-146a, mir-33c) xxxx wwwwww Breast Cancer zzzz mir-33c kkk
What I want to do is tag the words/phrases in that sentence according to some predefined regex rules. In the end it should look like this:
zzzzzzz [microRNA146a]<MIR-0> xxx ([miR-146a]<MIR-1>, [mir-33c]<MIR-2>) xxxx wwwwww [Breast Cancer]<CANCER-0> zzzz [mir-33c]<MIR-2> kkk
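Note that in the output above, every word/phrase that satisfies a rule is indexed in order of its first occurrence, and a repeated term (here mir-33c) keeps the index it was given the first time.
I'm stuck with the following code. What is the right way to do this?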
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;
my $text = 'zzzzzzz microRNA146a xxx (miR-146a, mir-33c) xxxx wwwwww Breast Cancer zzzz mir-33c kkk';
# Rule 1 for miRNA definition (\w* keeps an optional letter suffix, e.g. the "a" in microRNA146a)
my @mirlist = ($text =~ /\b(mir-\d+\w*|microRNA\d+\w*)/gi);
# Rule 2 for special words/phrases
my @spec = ($text =~ /(Breast Cancer)/gi);
# These arrays already preserve the order of occurrence
print Dumper \@mirlist ;
print Dumper \@spec ;
# Not sure how to proceed from here
*Update:* Added a recurring miRNA and refined the desired answer.
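For reference, one possible direction is a single substitution pass over the text instead of collecting the matches into separate arrays. The following is only a sketch under a few assumptions: the rule table `@rules`, the helper `tag_text`, and the choice to treat terms case-insensitively (so "miR-33c" and "mir-33c" would share an index) are all hypothetical and not part of the original code. It relies on named capture groups and `//=`, so it needs Perl 5.10 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical rule table: [ label, pattern ]. Labels must be valid
# capture-group names (letters, digits, underscore).
my @rules = (
    [ MIR    => qr/\b(?:mir-|microRNA)\d+\w*/i ],
    [ CANCER => qr/\bBreast Cancer\b/i ],
);

# Combine the rules into one alternation with a named group per rule,
# i.e. (?<MIR>...)|(?<CANCER>...), so one left-to-right pass sees all rules.
my $alternation = join '|', map { sprintf '(?<%s>%s)', @$_ } @rules;
my $any_rule    = qr/$alternation/;

sub tag_text {
    my ($text) = @_;
    my (%next_index, %index_of);

    $text =~ s{$any_rule}{
        # %+ holds only the named group that took part in this match.
        my ($label) = keys %+;
        my $term    = $+{$label};
        # Assumption: terms are compared case-insensitively.
        my $key     = lc $term;
        # First sighting gets the next free index for this label;
        # later sightings reuse the stored index.
        $index_of{$label}{$key} //= $next_index{$label}++;
        sprintf '[%s]<%s-%d>', $term, $label, $index_of{$label}{$key};
    }ge;

    return $text;
}

print tag_text('zzzzzzz microRNA146a xxx (miR-146a, mir-33c) xxxx wwwwww Breast Cancer zzzz mir-33c kkk'), "\n";
```

Doing the replacement in one pass keeps the per-label counters in step with the order of occurrence in the text, and the `%index_of` lookup is what lets the second mir-33c reuse MIR-2 rather than receive a new index. If two rules could match at the same position, the one listed first in `@rules` wins.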