I am getting unexpected behavior when I set values in the .data array of a csr_matrix to zero. Here is an example:
from scipy import sparse
a = sparse.csr_matrix([[0,0,2,0], [1,1,0,0],[0,3,0,0]])
Output:
>>> a.A
array([[0, 0, 2, 0],
       [1, 1, 0, 0],
       [0, 3, 0, 0]])
>>> a.data
array([2, 1, 1, 3])
>>> a.data[3] = 0 # setting one element to zero
>>> a.A
array([[0, 0, 2, 0],
       [1, 1, 0, 0],
       [0, 0, 0, 0]])
>>> a.data
array([2, 1, 1, 0]) # however, this zero is still considered part of data
# what I would like to see is:
# array([2, 1, 1])
>>> a.nnz # also, `nnz` tells me that there are 4 non-zero elements,
# which is incorrect; I would like 3 as the output
4
>>> a.nonzero() # nonzero method does follow the behavior I expected
(array([0, 1, 1], dtype=int32), array([2, 0, 1], dtype=int32))
What is the best practice in the above situation? Should setting elements of .data to zero be avoided? Is .nnz an unreliable way to find the number of non-zero elements?
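
For context, here is a minimal sketch of what I am comparing against. It assumes a SciPy version in which csr_matrix provides count_nonzero() and eliminate_zeros(); the variable names (stored_before, true_nonzeros, and so on) are only illustrative. As far as I can tell, .nnz counts stored entries, including explicitly stored zeros, while count_nonzero() counts entries whose value is actually non-zero, and eliminate_zeros() drops the stored zeros in place. I am not sure whether this is the recommended approach.

```python
from scipy import sparse

a = sparse.csr_matrix([[0, 0, 2, 0], [1, 1, 0, 0], [0, 3, 0, 0]])

# Overwriting a stored value leaves an explicitly stored zero behind.
a.data[3] = 0

stored_before = a.nnz              # number of stored entries (explicit zero included)
true_nonzeros = a.count_nonzero()  # number of entries whose value is non-zero

# Remove explicitly stored zeros in place, shrinking data/indices/indptr.
a.eliminate_zeros()

stored_after = a.nnz
data_after = a.data.tolist()

print(stored_before, true_nonzeros, stored_after, data_after)
```

If this is right, calling eliminate_zeros() after editing .data would make .nnz and .data match what I expected above.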