python - Python: Memory-efficient matrix creation for sets of 1's, -1's, and 0's to be optimized by scipy least squares

Question

I am looping over a list of strings and converting them to arrays of 1's, -1's, and 0's. For example, I might have the following list:

A,B,-C
A,-D
B,C,-D

This becomes a "big list" equal to:

[
 [1  1 -1  0],
 [1  0  0 -1],
 [0  1  1 -1]
]

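Currently I just loop over each line of strings, assign each unique string a value of 1 or -1, and zero out the strings that are absent (for example, D does not appear in the first line, so it is 0). My naive way of doing this is basically: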

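# all_strings: list of token lists, e.g. [["A", "B", "-C"], ...]
# columns: dict mapping each unique string to its column index (assumed built beforehand)
biglist = []
for line_of_strings in all_strings:
    entry = [0] * len(columns)  # strings absent from this line stay 0
    for the_string in line_of_strings:
        sign = -1 if the_string.startswith("-") else 1
        entry[columns[the_string.lstrip("-")]] = sign
    biglist.append(entry)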

In the end, I have a nice list of lists that I can run through:

scipy.optimize.nnls(biglist, b)  # b: right-hand-side vector (not shown in the question)

This works, but it ends up using a lot of memory and time. Is there a more efficient way to do this? Perhaps using numpy or scipy arrays/matrices?

score 1 · Accepted Answer

Using numpy arrays instead of lists seems to make a big difference in time, at least in a simple example:

$ python -mtimeit -s"from scipy.optimize import nnls; m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]; b=[1, 2, 3]" "nnls(m, b)"
10000 loops, best of 3: 38.5 usec per loop

$ python -mtimeit -s"import numpy as np; from scipy.optimize import nnls; m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]); b=[1, 2, 3]" "nnls(m, b)"
100000 loops, best of 3: 20 usec per loop

$ python -mtimeit -s"import numpy as np; from scipy.optimize import nnls; m = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]); b=np.array([1, 2, 3])" "nnls(m, b)"
100000 loops, best of 3: 11.4 usec per loop
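
To get a numpy array in the first place without building a list of lists, one option is to parse the lines once, then fill a preallocated array. The following is a minimal sketch, not a definitive implementation: the helper name build_matrix, the int8 dtype, and the right-hand side b are assumptions made for illustration, and the input format is taken from the example in the question.

```python
import numpy as np
from scipy.optimize import nnls


def build_matrix(lines):
    """Build a dense int8 matrix of 1, -1 and 0 from comma-separated lines."""
    columns = {}  # unique string -> column index, in order of first appearance
    parsed = []   # per line: list of (column, sign) pairs
    for line in lines:
        row = []
        for token in line.split(","):
            token = token.strip()
            sign = -1 if token.startswith("-") else 1
            col = columns.setdefault(token.lstrip("-"), len(columns))
            row.append((col, sign))
        parsed.append(row)

    # Allocate once; every string absent from a line stays 0.
    matrix = np.zeros((len(parsed), len(columns)), dtype=np.int8)
    for i, row in enumerate(parsed):
        for col, sign in row:
            matrix[i, col] = sign
    return matrix, columns


lines = ["A,B,-C", "A,-D", "B,C,-D"]
A, columns = build_matrix(lines)

b = np.array([1.0, 2.0, 3.0])  # hypothetical right-hand side, not from the question
x, rnorm = nnls(A.astype(float), b)  # the solver works on floats, so convert at call time
```

The int8 array uses one byte per entry while it is being built and stored. The conversion to float for nnls creates a temporary array that is eight times larger, so the saving applies to construction and storage rather than to the solve itself.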

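I would expect the memory footprint to be smaller with numpy arrays as well. If your input is fairly sparse and performance is still unsatisfactory, it may be worth investigating whether nnls accepts sparse matrices.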
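
On the sparse question: as far as I know, scipy.optimize.nnls expects a dense array. A different solver, scipy.optimize.lsq_linear, is documented to accept sparse matrices and bound constraints, so a non-negative fit can be sketched by setting the lower bound to 0. This swaps in a different algorithm (trust-region reflective rather than the one behind nnls), and the right-hand side b below is again a made-up example.

```python
import numpy as np
from scipy.optimize import lsq_linear
from scipy.sparse import coo_matrix

lines = ["A,B,-C", "A,-D", "B,C,-D"]

columns = {}  # unique string -> column index
rows, cols, vals = [], [], []
for i, line in enumerate(lines):
    for token in line.split(","):
        token = token.strip()
        rows.append(i)
        cols.append(columns.setdefault(token.lstrip("-"), len(columns)))
        vals.append(-1.0 if token.startswith("-") else 1.0)

# Only the nonzero entries are stored; zeros are implicit.
A_sparse = coo_matrix(
    (vals, (rows, cols)), shape=(len(lines), len(columns))
).tocsr()

b = np.array([1.0, 2.0, 3.0])  # hypothetical right-hand side, not from the question

# bounds=(0, inf) constrains every component of the solution to be non-negative.
result = lsq_linear(A_sparse, b, bounds=(0, np.inf))
x = result.x
```

Whether this is faster than the dense route depends on how sparse the real data is, so it is worth timing both on a representative input.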
