Does anyone know how to simplify or generalize this code? It gives the correct answer, but it only works for the current situation. My code is as follows:

sub longestRepeat {
                                 # list of arguments @_ is: (sequence, nucleotide)
  my $someSequence = shift(@_);  # shift off the first  argument from the list
  my $whatBP       = shift(@_);  # shift off the second argument from the list
  my $match = 0;

  if ($whatBP eq "AT") {
      # \2 repeats the captured "AT" five more times: six copies in total
      if ($someSequence =~ m/(([A][T])\2\2\2\2\2)/g) {
          $match = $1;
          return $match;
      }
  }

  if ($whatBP eq "TAGA") {
      # three copies of "TAGA"
      if ($someSequence =~ m/(([T][A][G][A])\2\2)/g) {
          $match = $1;
          return $match;
      }
  }

  if ($whatBP eq "C") {
      # three copies of "C"
      if ($someSequence =~ m/(([C])\2\2)/g) {
          $match = $1;
          return $match;
      }
  }
}

My question is about the second (inner) if statement: it hard-codes how many times the pattern is repeated, which only fits the string we were given. Is there a way to use a while loop to keep matching the \2 (pattern repeat)? In other words, can this: if ($someSequence =~ m/(([A][T])\2\2\2\2\2)/g) be simplified and generalized with a while loop?


1 Answer




sub longest_repeat {

    my ( $sequence, $what ) = @_;

    my @matches = $sequence =~ /((?:$what)+)/g ;  # Store all matches

    my $longest;
    foreach my $match ( @matches ) {  # Could also avoid temp variable :
                                      # for my $match ( $sequence =~ /((?:$what)+)/g )

        $longest //= $match ;         # Initialize
                                      #  (could also do `$longest = $match
                                      #                    unless defined $longest`)

        $longest = $match if length( $longest ) < length( $match );
    }

    return $longest;  # Note this also handles the case of no matches
}
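
Since the question asks specifically about a while loop, here is a minimal sketch of the same idea written that way. The name longest_repeat_loop and the sample sequence are made up for illustration, and the \Q...\E quoting is an added assumption (it treats $what as literal text rather than as a regex); neither is part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper, shown only to illustrate the while-loop form.
sub longest_repeat_loop {
    my ( $sequence, $what ) = @_;

    my $longest = '';
    # In scalar context, /g resumes after the previous match, so each
    # pass of the loop sees the next run of one or more copies of $what.
    while ( $sequence =~ /((?:\Q$what\E)+)/g ) {
        $longest = $1 if length($1) > length($longest);
    }

    return $longest;    # empty string when $what never occurs
}

# Made-up sample: a run of 3 x "AT", then a run of 5 x "AT".
my $seq = 'GG' . ( 'AT' x 3 ) . 'CC' . ( 'AT' x 5 ) . 'GC';

my $run = longest_repeat_loop( $seq, 'AT' );
print "$run\n";                              # ATATATATAT
print length($run) / length('AT'), "\n";     # 5
```

The + quantifier replaces the hard-coded \2\2\2..., so the same sub works for "AT", "TAGA", or "C" without a separate branch for each.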

If you can follow it, the following version does essentially the same thing using a Schwartzian transform:

sub longest_repeat {

    my ( $sequence, $what ) = @_;                          # Example:
                                                           # --------------------
    my ( $longest ) = map { $_->[0] }                      # 'ATAT' ...
                        sort { $b->[1] <=> $a->[1] }       # ['ATAT',4], ['AT',2]
                          map { [ $_, length($_) ] }       # ['AT',2], ['ATAT',4]
                            $sequence =~ /((?:$what)+)/g ; # ... 'AT', 'ATAT'

    return $longest ;
}
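
If sorting every match only to keep the first one feels heavy, a single pass with reduce from the core List::Util module is one possible alternative. This is a swapped-in technique, not the Schwartzian transform above; the name longest_repeat_reduce, the sample sequence, and the \Q...\E quoting are assumptions added for illustration.

```perl
use strict;
use warnings;
use List::Util qw(reduce);

# Hypothetical variant: keep the longer of each pair of matches.
sub longest_repeat_reduce {
    my ( $sequence, $what ) = @_;

    # reduce returns undef for an empty list, so "no match" gives undef.
    # On a tie, >= keeps the earlier match.
    return reduce { length($a) >= length($b) ? $a : $b }
           $sequence =~ /((?:\Q$what\E)+)/g;
}

# Made-up sample: a run of 2 x "TAGA", then a run of 3 x "TAGA".
my $seq = ( 'TAGA' x 2 ) . 'CC' . ( 'TAGA' x 3 );
print longest_repeat_reduce( $seq, 'TAGA' ), "\n";    # TAGATAGATAGA
```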


answered 2013-11-14T04:52:38.607