python - How to retrieve the frequency of unique occurrences for every possible pair of columns from a numpy matrix in Python

Question

I have a matrix stored as a numpy array:

>>> print matrix
[['L' 'G' 'T' 'G' 'A' 'P' 'V' 'I']
 ['A' 'A' 'S' 'G' 'P' 'S' 'S' 'G']
 ['A' 'A' 'S' 'G' 'P' 'S' 'S' 'G']
 ['G' 'L' 'T' 'G' 'A' 'P' 'V' 'I']]

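For every possible pair of columns, I want to retrieve the frequency of each unique pair of letters that appears across the rows of those two columns.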

For example, for the first pair of columns, that is:

[['L' 'G']
 ['A' 'A']
 ['A' 'A']
 ['G' 'L']]

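I want to retrieve the frequency of each pair of letters in these columns (note: the order of the letters matters):

Frequency of ['L' 'G'] = 1/4

Frequency of ['A' 'A'] = 2/4

Frequency of ['G' 'L'] = 1/4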

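Once these frequencies have been computed for the first pair of columns, I want to do the same for every other possible combination of column pairs.

I think something from itertools would help here, but I don't know how... Any help would be greatly appreciated.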

score 6 · Accepted Answer

I'd use itertools.combinations and collections.Counter:

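import collections
import itertools

# s is the numpy array from the question (called matrix there);
# Python 2 print syntax, as in the question
for i, j in itertools.combinations(range(len(s.T)), 2):
    c = s[:, [i,j]]                              # the two selected columns
    counts = collections.Counter(map(tuple,c))   # count each (letter, letter) row
    print 'columns {} and {}'.format(i,j)
    for k in sorted(counts):
        print 'Frequency of {} = {}/{}'.format(k, counts[k], len(c))
    print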

which produces

columns 0 and 1
Frequency of ('A', 'A') = 2/4
Frequency of ('G', 'L') = 1/4
Frequency of ('L', 'G') = 1/4

columns 0 and 2
Frequency of ('A', 'S') = 2/4
Frequency of ('G', 'T') = 1/4
Frequency of ('L', 'T') = 1/4

[...]

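(If you want both orders, it's trivial to modify this to do both columns 0 1 and columns 1 0. I'm also assuming that by "every possible pair of columns" you didn't mean "every pair of adjacent columns".)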
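As a minimal sketch of that modification, assuming Python 3 and a recent NumPy: the helper name `pair_frequencies` and its `ordered` flag are hypothetical, not part of the answer above. Swapping `itertools.combinations` for `itertools.permutations` visits both (0, 1) and (1, 0), and `fractions.Fraction` keeps the frequencies exact (note that it reduces 2/4 to 1/2).

```python
import itertools
from collections import Counter
from fractions import Fraction

import numpy as np

# The matrix from the question, rebuilt row by row.
matrix = np.array([
    list("LGTGAPVI"),
    list("AASGPSSG"),
    list("AASGPSSG"),
    list("GLTGAPVI"),
])


def pair_frequencies(arr, ordered=False):
    """Map each column pair (i, j) to {(letter_i, letter_j): frequency}.

    Hypothetical helper: with ordered=True both (i, j) and (j, i) are visited.
    """
    n_rows, n_cols = arr.shape
    pairs = itertools.permutations if ordered else itertools.combinations
    result = {}
    for i, j in pairs(range(n_cols), 2):
        # .tolist() converts NumPy strings to plain Python str before counting
        counts = Counter(map(tuple, arr[:, [i, j]].tolist()))
        result[(i, j)] = {k: Fraction(v, n_rows) for k, v in counts.items()}
    return result


freqs = pair_frequencies(matrix, ordered=True)
for pair, freq in sorted(freqs[(0, 1)].items()):
    print("Frequency of {} = {}".format(pair, freq))
```

With `ordered=False` the function visits the same column pairs as the loop in the answer.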

score 0 · Accepted Answer

If you have memory to spare, then for some array sizes (I'd guess few columns and many rows) a more memory-intensive, pure numpy solution may pay off:

>>> rows, cols = matrix.shape
>>> matches = np.empty((rows, cols, cols, 2), dtype=str)
>>> matches[..., 0] = matrix[:, None, :]
>>> matches[..., 1] = matrix[:, :, None]
>>> matches = matches.view('S2')
>>> matches = matches.reshape((rows, cols, cols))

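Now matches[:, i, j] holds the letter pairs for columns i and j (as constructed above, the letter from column j comes first), and you can then do: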

>>> unique, idx = np.unique(matches[:, 0, 1], return_inverse=True)
>>> counts = np.bincount(idx)
>>> unique
array(['AA', 'GL', 'LG'], 
      dtype='|S2')
>>> counts
array([2, 1, 1])
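The snippet above relies on byte strings ('S1' viewed as 'S2'), which matches Python 2. As a rough sketch of the same idea for Python 3, where NumPy string arrays are unicode, one option is to build the two-letter strings with np.char.add and let np.unique count them via return_counts=True. This is a substitute technique, not the answer's own code, and the name `joined` is only illustrative.

```python
import numpy as np

# The matrix from the question, rebuilt row by row.
matrix = np.array([
    list("LGTGAPVI"),
    list("AASGPSSG"),
    list("AASGPSSG"),
    list("GLTGAPVI"),
])
rows, cols = matrix.shape

# joined[r, i, j] is the string matrix[r, i] + matrix[r, j];
# broadcasting builds every ordered column pair at once
# (memory grows with rows * cols**2).
joined = np.char.add(matrix[:, :, None], matrix[:, None, :])

# Unique pairs and their counts for columns 0 and 1.
unique, counts = np.unique(joined[:, 0, 1], return_counts=True)
frequencies = counts / rows

for pair, freq in zip(unique.tolist(), frequencies.tolist()):
    print(pair, freq)
```

Here the first letter of each string comes from column i, so `joined[:, 1, 0]` gives the reversed order of `joined[:, 0, 1]`.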
