python - Indices of matched residues using pairwise2 in Biopython

Question

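I am interested in finding the indices of the residues that are matched when aligning strings with pairwise2 in Python.

For example, I have two strings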

A:' EEEEE      HHH     HHH             EEEEE'

and

B: 'EEE       EEEE       HHH'

Using the following code:

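from Bio import pairwise2
from Bio.pairwise2 import format_alignment

# matrix, gap_function_1 and gap_function_2 are defined elsewhere (not shown here)
# localdc returns a list of alignments, not a single one
alignment = pairwise2.align.localdc(A, B, matrix, gap_function_1, gap_function_2)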

One of the alignments I get is:

EEE-------EE---      HHH     HHH             EEEEE
|||       ||   |||||||||
EEE       EEEE       HHH--------------------------
  Score=29.6

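I would like to get the indices of the matches, i.e. the original positions of all the E's, H's and ' ' of seq A that matched with seq B.

How do I do this?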

score 0 · Accepted Answer

I assume the leading space in A is a typo? Otherwise the alignment would look different.
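
That assumption can be checked against the printed alignment: removing the gap characters from its top row should give back exactly the sequence that was aligned. A minimal sketch, with the strings built from repeat counts so the whitespace is unambiguous (the variable names are only illustrative):

```python
# Top row of the alignment printed in the question.
top_row = ("EEE" + "-" * 7 + "EE" + "-" * 3 + " " * 6 + "HHH"
           + " " * 5 + "HHH" + " " * 13 + "EEEEE")

# A without the leading space, and A exactly as typed in the question.
A_no_space = "EEEEE" + " " * 6 + "HHH" + " " * 5 + "HHH" + " " * 13 + "EEEEE"
A_as_typed = " " + A_no_space

# Dropping the gap characters recovers the sequence that was really aligned.
ungapped = top_row.replace("-", "")
print(ungapped == A_no_space)
print(ungapped == A_as_typed)
```

The first comparison holds and the second does not, which is why the rest of this answer uses A without the leading space.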

So, assuming:

A = 'EEEEE      HHH     HHH             EEEEE'
B = 'EEE       EEEE       HHH'

alignment = """EEE-------EE---      HHH     HHH             EEEEE
|||       ||   |||||||||
EEE       EEEE       HHH--------------------------
  Score=29.6"""

we can write a function compare():

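def compare(align, matches, original):
    """Return the indices into `original` of the characters marked '|' in `matches`."""
    result = []
    index = -1  # position in the ungapped original sequence
    # zip() stops at the shorter string, so a match line without trailing padding is fine
    for char, match in zip(align, matches):
        if char != '-':  # a gap does not consume a character of the original
            index += 1
        if match == '|':
            assert original[index] == char
            result.append(index)
    return result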

Then

align_A, matches, align_B, score = alignment.splitlines()
print(compare(align_A, matches, A))

gives [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13]. A quick visual check confirms this: the first 14 characters of A are matched (5 E's, 6 spaces and 3 H's). Likewise,

print(compare(align_B, matches, B))

gives [0, 1, 2, 10, 11, 15, 16, 17, 18, 19, 20, 21, 22, 23].
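
If the two gapped rows are available as strings, the matched positions in both sequences can be collected in one pass without parsing the formatted text. Each pairwise2 alignment is a tuple (seqA, seqB, score, begin, end), so the rows are its first two items. A minimal sketch: matched_index_pairs is a hypothetical helper, not part of Biopython, and it assumes '-' never occurs in the sequences themselves. The rows are built from repeat counts to keep the whitespace unambiguous:

```python
def matched_index_pairs(row_a, row_b, begin=0, end=None, gap="-"):
    """Hypothetical helper: (index_in_a, index_in_b) for each identical aligned pair.

    row_a and row_b are the gapped rows of one alignment; begin and end
    optionally restrict the scan to the aligned region (in alignment columns).
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    if end is None:
        end = len(row_a)
    pairs = []
    i = j = 0  # current positions in the ungapped sequences
    for col, (a, b) in enumerate(zip(row_a, row_b)):
        if begin <= col < end and a != gap and b != gap and a == b:
            pairs.append((i, j))
        # Only non-gap characters advance the position in the original sequence.
        if a != gap:
            i += 1
        if b != gap:
            j += 1
    return pairs


# The two rows of the alignment shown in the question.
row_a = ("EEE" + "-" * 7 + "EE" + "-" * 3 + " " * 6 + "HHH"
         + " " * 5 + "HHH" + " " * 13 + "EEEEE")
row_b = "EEE" + " " * 7 + "EEEE" + " " * 7 + "HHH" + "-" * 26

pairs = matched_index_pairs(row_a, row_b)
print([i for i, _ in pairs])  # matched positions in A
print([j for _, j in pairs])  # matched positions in B
```

For this alignment the two printed lists are the same as the results of compare() above, and each pair also tells you which position of B a given position of A was matched to.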
