0

I was wondering if anyone knows how to simplify, or generalize this code. It gives the correct answer, however it is only applicable to the current situation. My code is as follows:

sub longestRepeat{
                                # list of argument @_ is: (sequence, nucleotide)
  my $someSequence = shift(@_);  # shift off the first  argument from the list
  my $whatBP       = shift(@_);  # shift off the second argument from the list
  my $match = 0;



        if ($whatBP eq "AT"){
            if ($someSequence =~ m/(([A][T])\2\2\2\2\2)/g) {

            $match = $1
            }
            return $match;

        }
        if ($whatBP eq "TAGA"){
            if ($someSequence =~ m/(([T][A][G][A])\2\2)/g) {

            $match = $1
            }
            return $match;
        }

        if ($whatBP eq "C"){
            if ($someSequence =~ m/(([C])\2\2)/g) {

            $match = $1
            }
            return $match;
        }
}   

My question is, in the second if statement, I have it set to a set amount of that pattern being repeated (applicable for the string we were given). However, is there a way to keep doing a while loop to search through the \2 (pattern repeat)? What I mean is can this: if ($someSequence =~ m/(([A][T])\2\2\2\2\2)/g) be simplified and generalized with a while loop

4

1 回答 1

0

根据您的子程序的名称,我假设您想在您的序列中找到最长的重复序列。

如果是这样,以下情况如何:

sub longest_repeat {

    my ( $sequence, $what ) = @_;

    my @matches = $sequence =~ /((?:$what)+)/g ;  # Store all matches

    my $longest;
    foreach my $match ( @matches ) {  # Could also avoid temp variable :
                                      # for my $match ( $sequence =~ /((?:$what)+)/g )

        $longest //= $match ;         # Initialize
                                      #  (could also do `$longest = $match
                                      #                    unless defined $match`)

        $longest = $match if length( $longest ) < length( $match );
    }

    return $longest;  # Note this also handles the case of no matches
}

如果您可以理解,以下版本通过 Schwartzian 变换实现了基本相同的功能:

sub longest_repeat {

    my ( $sequence, $what ) = @_;                          # Example:
                                                           # --------------------
    my ( $longest ) = map { $_->[0] }                      # 'ATAT' ...
                        sort { $b->[1] <=> $a->[1] }       # ['ATAT',4], ['AT',2]
                          map { [ $_, length($_) ] }       # ['AT',2], ['ATAT',4]
                            $sequence =~ /((?:$what)+)/g ; # ... 'AT', 'ATAT'

    return $longest ;
}

有些人可能会争辩说这是浪费,sort因为它O(n.log(n))不是,O(n)但你有多种选择。

于 2013-11-14T04:52:38.607 回答