
Here is the most basic way I know of to count transitions in a Markov chain and use them to populate a transition matrix:

def increment_counts_in_matrix_from_chain(markov_chain, transition_counts_matrix):
    for i in xrange(1, len(markov_chain)):
        old_state = markov_chain[i - 1]
        new_state = markov_chain[i]
        transition_counts_matrix[old_state, new_state] += 1

I've tried speeding it up in 3 different ways:

1) Using a sparse-matrix one-liner based on this Matlab code:

transition_matrix = full(sparse(markov_chain(1:end-1), markov_chain(2:end), 1))

In Numpy/SciPy, it looks like this:

from scipy.sparse import coo_matrix

def get_sparse_counts_matrix(markov_chain, number_of_states):
    return coo_matrix(([1]*(len(markov_chain) - 1), (markov_chain[0:-1], markov_chain[1:])), shape=(number_of_states, number_of_states)) 
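For reference (my addition): the coo_matrix above stays sparse; calling toarray() on it sums the duplicate (row, col) entries and gives a dense array, playing the role of Matlab's full(). A tiny usage sketch with a made-up chain:

markov_chain = [0, 1, 1, 2, 0]   # fictitious example chain over 3 states
dense_counts = get_sparse_counts_matrix(markov_chain, 3).toarray()   # duplicates summed, like Matlab's full()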

2) And I tried more Pythonic tweaks, such as using zip():

for old_state, new_state in zip(markov_chain[0:-1], markov_chain[1:]):
    transition_counts_matrix[old_state, new_state] += 1 

3) And a Queue:

from Queue import Queue   # Python 2 module name; in Python 3 it is queue.Queue

old_and_new_states_holder = Queue(maxsize=2)
old_and_new_states_holder.put(markov_chain[0])
for new_state in markov_chain[1:]:
    old_and_new_states_holder.put(new_state)
    old_state = old_and_new_states_holder.get()
    transition_counts_matrix[old_state, new_state] += 1

But none of these three approaches sped things up. In fact, everything except the zip() solution ended up at least 10x slower than my original solution.

Are there any other solutions worth looking into?



Revised solution for building a transition matrix from a large number of chains
Credit for the best answer to the question above goes to DSM. However, for anyone who wants to populate a transition matrix from a list of millions of Markov chains, the fastest way is this:

import itertools
import numpy

def fast_increment_transition_counts_from_chain(markov_chain, transition_counts_matrix):
    flat_coords = numpy.ravel_multi_index((markov_chain[:-1], markov_chain[1:]), transition_counts_matrix.shape)
    transition_counts_matrix.flat += numpy.bincount(flat_coords, minlength=transition_counts_matrix.size)

def get_fake_transitions(markov_chains):
    # Collect the spurious (end of one chain, start of the next chain) pairs that
    # appear when the chains are concatenated, so they can be subtracted later.
    fake_transitions = []
    for i in xrange(1, len(markov_chains)):
        old_chain = markov_chains[i - 1]
        new_chain = markov_chains[i]
        end_of_old = old_chain[-1]
        beginning_of_new = new_chain[0]
        fake_transitions.append((end_of_old, beginning_of_new))
    return fake_transitions

def decrement_fake_transitions(fake_transitions, counts_matrix):
    # Remove the counts contributed by the artificial chain-to-chain joins.
    for old_state, new_state in fake_transitions:
        counts_matrix[old_state, new_state] -= 1

def fast_get_transition_counts_matrix(markov_chains, number_of_states):
    """50% faster than original, but must store 2 additional slice copies of all markov chains in memory at once.
    You might need to break up the chains into manageable chunks that don't exceed your memory.
    """
    transition_counts_matrix = numpy.zeros([number_of_states, number_of_states])
    fake_transitions = get_fake_transitions(markov_chains)
    markov_chains = list(itertools.chain(*markov_chains))
    fast_increment_transition_counts_from_chain(markov_chains, transition_counts_matrix)
    decrement_fake_transitions(fake_transitions, transition_counts_matrix)
    return transition_counts_matrix
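A minimal usage sketch (my addition, with made-up chains over 3 states):

chains = [[0, 1, 2, 1], [2, 2, 0], [1, 0, 0, 2]]   # fictitious list of Markov chains
counts = fast_get_transition_counts_matrix(chains, number_of_states=3)
print(counts)   # 3x3 matrix of transition counts, chain-boundary transitions excluded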

4 Answers


Just for fun, and because I've been meaning to try it out, I applied Numba to your problem. In code this amounts to adding a single decorator (although I've made direct calls here so I could test both jit variants that numba provides):

import numpy as np
import numba

def increment_counts_in_matrix_from_chain(markov_chain, transition_counts_matrix):
    for i in xrange(1, len(markov_chain)):
        old_state = markov_chain[i - 1]
        new_state = markov_chain[i]
        transition_counts_matrix[old_state, new_state] += 1

autojit_func = numba.autojit()(increment_counts_in_matrix_from_chain)
jit_func = numba.jit(argtypes=[numba.int64[:,::1],numba.double[:,::1]])(increment_counts_in_matrix_from_chain)

t = np.random.randint(0,50, 500)
m1 = np.zeros((50,50))
m2 = np.zeros((50,50))
m3 = np.zeros((50,50))

And then the timings:

In [10]: %timeit increment_counts_in_matrix_from_chain(t,m1)
100 loops, best of 3: 2.38 ms per loop

In [11]: %timeit autojit_func(t,m2)
10000 loops, best of 3: 67.5 us per loop

In [12]: %timeit jit_func(t,m3)
100000 loops, best of 3: 4.93 us per loop

The autojit version does some guessing based on the runtime inputs, while the jit version has the types specified. You do have to be a little careful, because numba at this early stage doesn't communicate that there is an error if you pass jit the wrong types for the inputs; it will just spit out an incorrect answer.

Still, getting a 35x and a 485x speed-up without any code changes, just by adding a call to numba (which could also be applied as a decorator), is pretty impressive in my book. You could probably get similar results with cython, but it would require a bit more boilerplate and writing a setup.py file.

I also like this solution because the code stays readable, and you can write it the way you originally thought about implementing the algorithm.
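In more recent Numba releases autojit has been removed, and the usual spelling is the njit decorator. A minimal sketch of the decorator form, assuming a recent Numba version:

import numba
import numpy as np

@numba.njit   # compiles lazily; argument types are inferred on the first call
def increment_counts_numba(markov_chain, transition_counts_matrix):
    for i in range(1, len(markov_chain)):
        transition_counts_matrix[markov_chain[i - 1], markov_chain[i]] += 1

t = np.random.randint(0, 50, 500)
m = np.zeros((50, 50))
increment_counts_numba(t, m)   # first call triggers compilation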

Answered 2012-11-04T18:41:51.093

How about something like this, taking advantage of np.bincount? Not super-robust, but it does the job. [Thanks to @Warren Weckesser for the setup.]

import numpy as np
from collections import Counter

def increment_counts_in_matrix_from_chain(markov_chain, transition_counts_matrix):
    for i in xrange(1, len(markov_chain)):
        old_state = markov_chain[i - 1]
        new_state = markov_chain[i]
        transition_counts_matrix[old_state, new_state] += 1

def using_counter(chain, counts_matrix):
    counts = Counter(zip(chain[:-1], chain[1:]))
    from_, to = zip(*counts.keys())
    counts_matrix[from_, to] = counts.values()

def using_bincount(chain, counts_matrix):
    flat_coords = np.ravel_multi_index((chain[:-1], chain[1:]), counts_matrix.shape)
    counts_matrix.flat = np.bincount(flat_coords, minlength=counts_matrix.size)

def using_bincount_reshape(chain, counts_matrix):
    flat_coords = np.ravel_multi_index((chain[:-1], chain[1:]), counts_matrix.shape)
    return np.bincount(flat_coords, minlength=counts_matrix.size).reshape(counts_matrix.shape)

This gives:

In [373]: t = np.random.randint(0,50, 500)
In [374]: m1 = np.zeros((50,50))
In [375]: m2 = m1.copy()
In [376]: m3 = m1.copy()

In [377]: timeit increment_counts_in_matrix_from_chain(t, m1)
100 loops, best of 3: 2.79 ms per loop

In [378]: timeit using_counter(t, m2)
1000 loops, best of 3: 924 us per loop

In [379]: timeit using_bincount(t, m3)
10000 loops, best of 3: 57.1 us per loop

[Edit]

Avoiding flat (at the cost of no longer working in place) saves some time for small matrices:

In [80]: timeit using_bincount_reshape(t, m3)
10000 loops, best of 3: 22.3 us per loop
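One caveat (my addition, not part of the original answer): using_bincount assigns to counts_matrix.flat, so it overwrites any existing counts rather than accumulating them. To keep adding into an existing matrix, as the question's loop does, use += on the flat view:

def using_bincount_accumulate(chain, counts_matrix):
    # same as using_bincount, but adds to the existing counts in place
    flat_coords = np.ravel_multi_index((chain[:-1], chain[1:]), counts_matrix.shape)
    counts_matrix.flat += np.bincount(flat_coords, minlength=counts_matrix.size)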
Answered 2012-11-04T15:55:24.973

Here's a faster way. The idea is to count the occurrences of each transition, and then use those counts in a vectorized update of the matrix. (I'm assuming the same transition can occur multiple times in markov_chain.) The Counter class from the collections library is used to count the occurrences of each transition.

from collections import Counter

def update_matrix(chain, counts_matrix):
    counts = Counter(zip(chain[:-1], chain[1:]))
    from_, to = zip(*counts.keys())
    counts_matrix[from_, to] += counts.values()

Timing example, in ipython:

In [64]: t = np.random.randint(0,50, 500)

In [65]: m1 = zeros((50,50))

In [66]: m2 = zeros((50,50))

In [67]: %timeit increment_counts_in_matrix_from_chain(t, m1)
1000 loops, best of 3: 895 us per loop

In [68]: %timeit update_matrix(t, m2)
1000 loops, best of 3: 504 us per loop

It's faster, but not orders of magnitude faster. For a real speed-up, you might consider implementing it in Cython.
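Before reaching for Cython, a pure-numpy alternative worth trying (my sketch, not from the original answer) is np.add.at, which performs unbuffered in-place addition and therefore counts repeated transitions correctly:

import numpy as np

def update_matrix_add_at(chain, counts_matrix):
    # unbuffered add: each repeated (old, new) pair contributes its own increment
    np.add.at(counts_matrix, (chain[:-1], chain[1:]), 1)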

Answered 2012-11-04T14:51:06.677

OK, here are a few ideas to tinker with, for a slight improvement (at the cost of human readability).

Let's start with a random vector of integers between 0 and 9, of length 3000:

import numpy as np

L = 3000
N = 10
states = np.random.randint(N, size=L)
transitions = np.zeros((N, N))

On my machine, the timeit performance of your method is 11.4 ms.

The first small improvement is to avoid reading the data twice, by storing it in a temporary variable:

old = states[0]
for i in range(1,len(states)):
    new = states[i]
    transitions[new,old]+=1
    old=new

That gives you about a 10% improvement and brings the time down to 10.9 ms.

A more involved approach uses strides:

def rolling(a, window):
    shape = (a.size - window + 1, window)
    strides = (a.itemsize, a.itemsize)
    return np.lib.stride_tricks.as_strided(a, shape=shape, strides=strides)

state_2 = rolling(states, 2)
for i in range(len(state_2)):
    l,m = state_2[i,0],state_2[i,1]
    transitions[m,l]+=1
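To make the stride trick concrete, here is what rolling produces on a tiny input (my example):

x = np.array([3, 1, 4, 1, 5])
rolling(x, 2)
# array([[3, 1],
#        [1, 4],
#        [4, 1],
#        [1, 5]])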

Strides let you read consecutive numbers of the array while tricking it into thinking the rows start in different places (OK, that's not a great description, but if you spend some time reading about strides you'll get it). This approach actually loses performance, going up to 12.2 ms, but it paves the way for tricking the system even further. Flattening both the transition matrix and the strided array into one-dimensional arrays squeezes out a bit more performance:

transitions = np.zeros(N*N)
state_2 = rolling(states, 2)
state_flat = np.sum(state_2 * np.array([1, 10]), axis=1)   # flat index new*N + old (N == 10 here)
for i in state_flat:
    transitions[i] += 1
transitions = transitions.reshape((N, N))

That brings it down to 7.75 ms. Not an order of magnitude, but still 30% better anyway :)
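For completeness (my addition), that final Python loop can itself be replaced with the bincount trick from the answers above, applied directly to state_flat:

transitions = np.bincount(state_flat, minlength=N * N).astype(float).reshape((N, N))   # same result as the loop + reshape above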

Answered 2012-11-04T14:59:11.047