
I want to find all occurrences of ATG...TAG or ATG...TAA. I tried the following:

#!/usr/bin/perl
use warnings;
use strict; 

my $file = ('ATGCCCCCCCCCCCCCTAGATGAAAAAAAAAATAAATGAAAAATAGATGCCCCCCCCCCCCCCC');

while($file =~ /(?=(ATG\w+?TAG|ATG\w+?TAA))/g){
    print "$1\n";           
} 

This gives:

ATGCCCCCCCCCCCCCTAG
ATGAAAAAAAAAATAAATGAAAAATAG
ATGAAAAATAG

I want:

ATGCCCCCCCCCCCCCTAG
ATGAAAAAAAAAATAA
ATGAAAAATAG

What am I doing wrong?


3 Answers

1

/(ATG\w+?TA[AG])/ works and is more elegant than what FlyingFrog proposed ;-)

-bash-3.2$ perl
my $string = 'ATGCCCCCCCCCCCCCTAGATGAAAAAAAAAATAAATGAAAAATAGATGCCCCCCCCCCCCCCC';
my @matches = $string =~ /(ATG\w+?TA[AG])/g;
use Data::Dumper;
print Dumper \@matches;
$VAR1 = [
          'ATGCCCCCCCCCCCCCTAG',
          'ATGAAAAAAAAAATAA',
          'ATGAAAAATAG'
        ];
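
Why the character class succeeds where the alternation in the question did not can be seen on a shortened input. This is only a sketch: the variable names ($seq, $alt, $class) are made up for illustration, and the input is the middle part of the original string. Perl tries the alternatives left to right, so the TAG branch is allowed to run past a nearer TAA, whereas with TA[AG] the lazy \w+? stops at whichever stop codon it reaches first.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical shortened input: two sequences back to back.
my $seq = 'ATGAAAAAAAAAATAAATGAAAAATAG';

# Alternation: the TAG branch is tried first and runs past the nearer TAA.
my ($alt) = $seq =~ /(ATG\w+?TAG|ATG\w+?TAA)/;

# Character class: the lazy \w+? stops at the first TAA or TAG it reaches.
my ($class) = $seq =~ /(ATG\w+?TA[AG])/;

print "alternation: $alt\n";      # alternation: ATGAAAAAAAAAATAAATGAAAAATAG
print "class:       $class\n";    # class:       ATGAAAAAAAAAATAA
```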
answered 2013-09-03T13:49:42.077
1

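You are actually very close. Your pattern above spells out two alternatives, where I think you really want just one capture; I also don't think you need the lookahead.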

#!/usr/bin/perl
use warnings;
use strict;

my $file = ('ATGCCCCCCCCCCCCCTAGATGAAAAAAAAAATAAATGAAAAATAGATGCCCCCCCCCCCCCCC');

while($file =~ /(ATG\w+?TA[AG])/g){
    print "$1\n";
}

# output
# ATGCCCCCCCCCCCCCTAG
# ATGAAAAAAAAAATAA
# ATGAAAAATAG

Piece by piece:

ATG matches the literal ATG

\w+? lazily matches one or more word characters (as few as possible)

TA[AG] matches the literal TAA or TAG
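
The breakdown above can also be kept next to the pattern itself by using the /x modifier, which lets a regex carry whitespace and comments. The following is a minimal sketch rather than part of the original answer; the names $orf, $seq and @found are assumptions chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Same pattern as above, spread out with x so each piece can be commented.
my $orf = qr/
    (           # capture the whole sequence
      ATG       # literal start codon
      \w+?      # one or more word characters, as few as possible
      TA[AG]    # TAA or TAG, whichever is reached first
    )
/x;

my $seq = 'ATGCCCCCCCCCCCCCTAGATGAAAAAAAAAATAAATGAAAAATAGATGCCCCCCCCCCCCCCC';

# In list context with g, the match returns every captured sequence.
my @found = $seq =~ /$orf/g;
print "$_\n" for @found;
```

Note that \w also accepts digits and underscores; if the input is known to be plain DNA, a tighter class such as [ACGT] may be preferable, though that is a suggestion and not something the answer requires.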

answered 2013-09-03T13:50:49.323
0

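Your code finds sequences that start with ATG and end with TAG or TAA, but the TAG alternative is tried first at every position, so TAA only wins when no TAG follows. If you remove all TAGs from the sequence, you will find the ones ending in TAA. By making two capture groups (one for ATG...TAG and one for ATG...TAA), you capture both candidates at every ATG from which both can be matched.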

#!/usr/bin/perl
use warnings;
use strict; 

my $file = ('ATGCCCCCCCCCCCCCTAGATGAAAAAAAAAATAAATGAAAAATAGATGCCCCCCCCCCCCCCC');

while($file =~ /(?=(ATG\w+?TAG))(?=(ATG\w+?TAA))/g){ # makes two capture groups 
    print "$1\n";
    print "$2\n";           
} 

Output:

ATGCCCCCCCCCCCCCTAG
ATGCCCCCCCCCCCCCTAGATGAAAAAAAAAATAA
ATGAAAAAAAAAATAAATGAAAAATAG
ATGAAAAAAAAAATAA
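
Note that ATGAAAAATAG is absent from the output above: both lookaheads have to succeed at the same position, and no TAA follows the third ATG. If the two kinds of sequence should be collected independently, one possible sketch is to run one lookahead per stop codon. The %hits hash and the $stop loop variable are assumptions introduced here, not part of the original answer.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $file = 'ATGCCCCCCCCCCCCCTAGATGAAAAAAAAAATAAATGAAAAATAGATGCCCCCCCCCCCCCCC';

my %hits;    # stop codon => list of sequences ending in it
for my $stop (qw(TAG TAA)) {
    # The lookahead consumes nothing, so every ATG is tried as a start.
    while ($file =~ /(?=(ATG\w+?$stop))/g) {
        push @{ $hits{$stop} }, $1;
    }
}

for my $stop (qw(TAG TAA)) {
    print "$stop: $_\n" for @{ $hits{$stop} || [] };
}
# TAG: ATGCCCCCCCCCCCCCTAG
# TAG: ATGAAAAAAAAAATAAATGAAAAATAG
# TAG: ATGAAAAATAG
# TAA: ATGCCCCCCCCCCCCCTAGATGAAAAAAAAAATAA
# TAA: ATGAAAAAAAAAATAA
```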

- - OR: - -

#!/usr/bin/perl
use warnings;
use strict; 

my $file = ('ATGCCCCCCCCCCCCCTAGATGAAAAAAAAAATAAATGAAAAATAGATGCCCCCCCCCCCCCCC');

while($file =~ /(?=(ATG\w+?TA[AG]))/g){ 
    print "$1\n";
} 

Output:

ATGCCCCCCCCCCCCCTAG
ATGAAAAAAAAAATAA
ATGAAAAATAG
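
On the input above, the lookahead version and a plain consuming match produce the same three lines. They differ only when one ATG sits inside another candidate. The sketch below uses a made-up input ($seq) to show the difference; the array names are likewise illustrative.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical input with a second ATG nested inside the first candidate.
my $seq = 'ATGAAATGCCTAG';

# Consuming match: the search resumes after the end of each match,
# so the inner ATG is swallowed by the first hit.
my @consuming = $seq =~ /(ATG\w+?TA[AG])/g;

# Lookahead: nothing is consumed, so every ATG is tried as a start.
my @lookahead = $seq =~ /(?=(ATG\w+?TA[AG]))/g;

print "consuming: @consuming\n";    # consuming: ATGAAATGCCTAG
print "lookahead: @lookahead\n";    # lookahead: ATGAAATGCCTAG ATGCCTAG
```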

It depends on what you are after...

answered 2013-09-03T13:42:59.747