I am getting unexpected behavior when I set values in the .data array of a csr_matrix to zero. Here is an example:

from scipy import sparse
a = sparse.csr_matrix([[0,0,2,0], [1,1,0,0],[0,3,0,0]])

Output:

>>> a.A
array([[0, 0, 2, 0],
       [1, 1, 0, 0],
       [0, 3, 0, 0]])
>>> a.data
array([2, 1, 1, 3])
>>> a.data[3] = 0   # setting one element to zero
>>> a.A
array([[0, 0, 2, 0],
       [1, 1, 0, 0],
       [0, 0, 0, 0]])
>>> a.data
array([2, 1, 1, 0]) # however, this zero is still considered part of data
                    # what I would like to see is:
                    # array([2, 1, 1])

>>> a.nnz           # also, `nnz` tells me that there are 4 non-zero elements,
                    # which is incorrect; I would like 3 as the output
4

>>> a.nonzero()     # nonzero method does follow the behavior I expected
(array([0, 1, 1], dtype=int32), array([2, 0, 1], dtype=int32))

What is the best practice in the above situation? Should setting elements of .data to zero be avoided? Is .nnz an unreliable way to find the number of non-zero elements?

1 Answer

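Sparse matrices in scipy (at least CSC and CSR) have an .eliminate_zeros() method to handle this situation. Run

a.eliminate_zeros()

every time you modify a.data directly, and it should take care of it.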
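As a minimal sketch using the matrix from the question: `.nnz` counts stored entries, so it still includes the explicit zero until `eliminate_zeros()` removes it in place. The `count_nonzero()` call is an addition that is not part of the original answer; it assumes a SciPy version that provides this method, and it reports the number of truly non-zero values without modifying the matrix.

```python
from scipy import sparse

a = sparse.csr_matrix([[0, 0, 2, 0], [1, 1, 0, 0], [0, 3, 0, 0]])

a.data[3] = 0                      # leaves an explicitly stored zero behind
nnz_before = a.nnz                 # number of stored entries, explicit zero included
true_nonzeros = a.count_nonzero()  # number of entries whose value is actually non-zero

a.eliminate_zeros()                # drops explicitly stored zeros, in place
nnz_after = a.nnz

print(nnz_before, true_nonzeros, nnz_after)  # 4 3 3
print(a.data)                                # [2 1 1]
```

If you only need the count and do not want to change the stored structure, `count_nonzero()` alone may be enough; `eliminate_zeros()` is the one to call when `.data`, `.indices` and `.nnz` should all reflect the cleanup.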

answered 2013-10-01T17:28:31.403