string - 最长公共回文子序列

Question

是否有任何有效的算法来计算两个给定字符串的最长公共回文子序列的长度？

例如：

字符串 1。 afbcdfca

字符串 2。 bcadfcgyfka

LCPS 为 5，LCPS 字符串为afcfa.

score 5 · Accepted Answer

是的。

只需对两个以上序列使用 LCS 算法即可。

如果我没记错的话

 LCS( afbcdfca, acfdcbfa, bcadfcgyfka, akfygcfdacb) = afcfa

由您决定字符串#2 和#4。

更新：不，这是一个反例：LCS(aba, aba, bab, bab) = ab, ba。LCS 不能确保子序列是回文，您可能需要添加此约束。无论如何，LCS 的递归方程是一个很好的起点。

如果您以生成器样式实现 LCS，以便它生成长度为 n、n-1、n-2 等的所有 LCS，那么您应该能够相当有效地计算 LCS-gen(string1,反向字符串1），LCS-gen（字符串2，反向字符串2）。但是我还没有检查是否有高效的 LCS-gen。

score 0 · Accepted Answer

和这个问题一样： http ://code.google.com/codejam/contest/1781488/dashboard#s=p2

http://code.google.com/codejam/contest/1781488/dashboard#s=a&a=2

在下面的代码中，我为您提供了一个 cd(s) 方法（它返回一个 char dict，它告诉您字符串中的下一个或前一个 char 与该 char 相等的位置）。使用它来实现一个动态编程解决方案，我还为您提供了示例代码。

使用动态编程，只有 len(s1)*len(s1)/2 个状态，因此 order(N^2) 是可能的。

这个想法是要么从 s1 获取一个字符，要么不获取它。如果从 s1 上取下 char，则必须从（s1 和 s2 的）前面和后面取下它。如果你不接受它，你继续前进 1.（这意味着 s2 的状态与 s1 的状态保持同步，因为你总是从两者的外部贪婪地获取——所以你只担心 s1 可以有多少状态）。

这段代码让你大部分时间都在那里。cd1 (char dict 1) 帮助您找到 s1 中的下一个字符等于您所在的字符（向前和向后）。

在递归 solve() 方法中，您需要决定 start1、end1.. 等应该是什么。每次选择一个角色时，总数加 2（除非 start1 == end1 - 然后加 1）

s1 = "afbcdfca"
s2 = "bcadfcgyfka"

def cd(s):
    """returns dictionary d where d[i] = j where j is the next occurrence of character i"""
    char_dict = {}
    last_pos = {}
    for i, char in enumerate(s):
        if char in char_dict:
            _, forward, backward = char_dict[char]
            pos = last_pos[char]
            forward[pos] = i
            backward[i] = pos
            last_pos[char] = i
        else:
            first, forward, backward = i, {}, {}
            char_dict[char] = (first, forward, backward)
            last_pos[char] = i
    return char_dict

print cd(s1)
"""{'a': ({0: 7}, {7: 0}), 'c': ({3: 6}, {6: 3}), 'b': ({}, {}), 'd': ({}, {}), 'f': ({1: 5}, {5: 1})}"""

cd1, cd2 = cd(s1), cd(s2)

cache = {}
def solve(start1, end1, start2, end2):
    state = (start1, end1)
    answer = cache.get(state, None)
    if answer:
        return answer
    
    if start1 < end1:
        return 0
    c1s, c1e = s1[start1], s1[end1]
    c2s, c2e = s2[start2], s2[end2]
    
    #if any of c1s, c1e, c2s, c2e are equal and you don't take, you must
    #skip over those characters too:
    dont_take_end1 = solve(start1, end1 - 1, start2, end2)
    
    do_take_end1 = 2
    if do_take_end1:
        end1_char = s1[end1]
        #start1 = next character along from start1 that equals end1_char
        #end1 = next char before end1 that equals end1_char
        #end2 = next char before end2 that..
        #start2 = next char after .. that ..
        do_take_end1 += solve(start1, end1, start2, end2)
    
    
    answer = cache[state] = max(dont_take_end1, do_take_end1)
    return answer
    
print solve(0, len(s1), 0, len(s2))

score 0 · Accepted Answer

这是我逐行进行的万无一失的演练，因为这很常见，而且大多数时候人们解释了动态编程部分 70% 并停留在血淋淋的细节上。

最优子结构：令X[0..n-1]为长度为 n 的输入序列，L(0, n-1)为最长回文子序列的长度X[0..n-1].

如果 X 的最后一个字符和第一个字符相同，则L(0, n-1) = L(1, n-2) + 2. 等等为什么，如果第二个和倒数第二个字符不一样怎么办，最后一个和第一个 bing 不一样是没用的。不，这个“子序列”不必是连续的。

/* Driver program to test above functions */
int main()
{
    char seq[] = "panamamanap";
    int n = strlen(seq);
    printf ("The lnegth of the LPS is %d", lps(seq, 0, n-1));
    getchar();
    return 0;
}

int lps(char *seq, int i, int j)
{   
    // Base Case 1: If there is only 1 character   
    if (i == j)     
        return 1;    
    
    // Base Case 2: If there are only 2 characters and both are same   
    if (seq[i] == seq[j] && i + 1 == j)     
        return 2;    
    
    // If the first and last characters match   
    if (seq[i] == seq[j])      
        return lps (seq, i+1, j-1) + 2;    
    
    // If the first and last characters do not match   
    else return max( lps(seq, i, j-1), lps(seq, i+1, j) );
}

考虑到上面的实现，下面是一个长度为 6 的序列的部分递归树，其中包含所有不同的字符。

               L(0, 5)
             /        \ 
            /          \  
        L(1,5)          L(0,4)
       /    \            /    \
      /      \          /      \
  L(2,5)    L(1,4)  L(1,4)  L(0,3)

在上面的部分递归树中，L(1, 4)被求解了两次。如果我们绘制完整的递归树，那么我们可以看到有很多子问题被一次又一次地解决。L[][]与其他典型的动态规划（DP）问题一样，通过以自下而上的方式构造临时数组，可以避免重复计算相同的子问题。

// Returns the length of the longest palindromic subsequence in seq
int lps(char *str)
{
   int n = strlen(str);
   int i, j, cl;
   int L[n][n];  // Create a table to store results of subproblems
 
 
   // Strings of length 1 are palindrome of length 1
   for (i = 0; i < n; i++)
      L[i][i] = 1;
 
    for (cl=2; cl<=n; cl++)                              //again this is the length of chain we are considering
    {
        for (i=0; i<n-cl+1; i++)                         //start at i
        {
            j = i+cl-1;                                  //end at j
            if (str[i] == str[j] && cl == 2)             //if only 2 characters and they are the same then set L[i][j] = 2
               L[i][j] = 2;
            else if (str[i] == str[j])                   //if greater than length 2 and first and last characters match, add 2 to the calculated value of the center stripped of both ends
               L[i][j] = L[i+1][j-1] + 2;
            else
               L[i][j] = max(L[i][j-1], L[i+1][j]);      //if not match, then take max of 2 possibilities
        }
    }
 
    return L[0][n-1];
}

所以这就像逻辑上的非动态编程，只是在这里我们将结果保存在一个数组中，所以我们不会一遍又一遍地计算相同的东西

string - 最长公共回文子序列

3 回答 3

Related

Reference