I would like to find a way to manipulate a scipy.sparse.csr_matrix in order to obtain the sum of its elements for a given column. For example, if I have this:

  (2, 883)  0.0194935608679
  (10, 883) 0.193169152693
  (11, 883) 0.1099280996
  (18, 883) 0.231353403277
  (11, 884) 0.151292618076
  (12, 885) 0.0897609047606
  (15, 885) 0.105370721749
  (10, 886) 0.116845834609
  (18, 886) 0.069971527852
  (0, 947)  0.111838970767
  (1, 947)  0.0694444065422
  (2, 947)  0.0440324424809
  (4, 947)  0.0233598916271
  (5, 947)  0.301621257244
  (6, 947)  0.0546866477512
  (7, 947)  0.162040885384
  (9, 947)  0.0786245669428
  (10, 947) 0.130900295682
  (11, 947) 0.0496615549666
  (12, 947) 0.100557533892
  (13, 947) 0.114494053085
  (14, 947) 0.0535641315858
  (15, 947) 0.0393483107586
  (16, 947) 0.0207896459813
  (17, 947) 0.0538302241537
  : :

then the sum of column 883 would be 0.5539442164.

1 Answer

You can just do:

mymatrix[:,883].sum()
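
As a minimal sketch of the same idea on a tiny matrix (the 3x3 data below is made up for illustration and is not the matrix from the question), slicing one column and summing it looks like this. If you need the sums of every column at once, sum(axis=0) should also work:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical small matrix standing in for "mymatrix"
dense = np.array([[0.0, 1.5, 0.0],
                  [2.0, 0.0, 0.5],
                  [0.0, 2.5, 0.0]])
mymatrix = csr_matrix(dense)

# Sum of a single column: slice the column, then sum its entries
col = 1
single = mymatrix[:, col].sum()

# Sums of all columns at once, flattened to a plain 1-D array
all_cols = np.asarray(mymatrix.sum(axis=0)).ravel()

print(single)
print(all_cols)
```

Here single is the sum for column 1 only, while all_cols[j] holds the sum for column j.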

It is worth noting that if you plan to do column-wise operations, the csc_matrix type is much faster. For example:

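import numpy as np
from scipy.sparse import csr_matrix, csc_matrix

r = np.random.random((1000,1000))

a = csr_matrix(r)  # row-oriented storage
b = csc_matrix(r)  # column-oriented storage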

In [20]: timeit a[:,88].sum()
1000 loops, best of 3: 1.88 ms per loop

In [21]: timeit b[:,88].sum()
10000 loops, best of 3: 129 us per loop

For row-wise operations you should stick to the csr_matrix type.
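
If the matrix already exists as a csr_matrix, one option is to convert it once with tocsc() before doing column-wise work. The sketch below assumes a randomly generated matrix (the names dense, a and b are illustrative) and only checks that both formats give the same result. It makes no timing claims:

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical sparse data: roughly 10% of the entries are non-zero
rng = np.random.default_rng(0)
dense = rng.random((200, 200))
dense[dense < 0.9] = 0.0

a = csr_matrix(dense)  # good for row slicing
b = a.tocsc()          # one-time conversion for column slicing

col_sum_csr = a[:, 88].sum()  # column sum on the CSR matrix
col_sum_csc = b[:, 88].sum()  # same column sum on the CSC copy
row_sum = a[5, :].sum()       # row-wise work stays on the CSR matrix

print(col_sum_csr, col_sum_csc, row_sum)
```

The conversion itself has a cost, so it tends to pay off only when you perform many column operations on the same matrix.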

answered 2013-08-07T13:40:25.270