
I have this scipy csr_matrix:

  (0, 12114) 0.272571581001
  (0, 12001) 0.0598986479579
  (0, 11998) 0.137415042369
  (0, 11132) 0.0681428952502
  (0, 10412) 0.0681428952502
  (1, 10096) 0.0990242494495
  (1, 10085) 0.216197045661
  (1, 9105) 0.1362857905
  (1, 8925) 0.042670696769
  (1, 8660) 0.0598986479579
  (2, 6577) 0.119797295916
  (2, 6491) 0.0985172979468
  (3, 6178) 0.1362857905
  (3, 5286) 0.119797295916
  (3, 5147) 0.270246307076
  (3, 4466) 0.0540492614153
  (4, 3810) 0.0540492614153
  (4, 3773) 0.0495121247248

and I would like to find a way to create (in this case 4) dictionaries, where each dictionary contains the 2 largest values of its row.

So for example, for row 0 my dictionary would be:

dict0 = {12114: '0.272571581001', 11998: '0.137415042369'}

and for row 1:

dict1 = {10085: '0.216197045661', 9105: '0.1362857905'}

1 Answer


Since csr_matrix has no sort() method, it is convenient to first convert the row you need to a dense array:

a = m[i,:].toarray().flatten()

To get the column positions in sorted (ascending) order of value:

argsa = a.argsort()

The largest values are at the end of argsa, so the columns of the two largest values are:

argsa[-2:]

To get the (column, value) pairs:

argsa[-2:], a[ argsa[-2:] ]

These can be converted to a dictionary:

dict( zip( argsa[-2:], a[ argsa[-2:] ] ) )
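To see the steps above end to end, here is a minimal sketch on a small matrix. The 2x6 matrix and the names top2 and row0 are made up for illustration; they are not from the question.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Hypothetical 2x6 matrix, invented for this example only.
m = csr_matrix(np.array([
    [0.0, 0.3, 0.0, 0.1, 0.2, 0.0],
    [0.5, 0.0, 0.0, 0.0, 0.0, 0.4],
]))

a = m[0, :].toarray().flatten()   # dense 1-D copy of row 0
argsa = a.argsort()               # column indices, smallest value first
top2 = argsa[-2:]                 # columns holding the two largest values

# Cast to plain Python types so the dict prints without NumPy wrappers.
row0 = {int(c): float(a[c]) for c in top2}
print(row0)
```

Note that the slice argsa[-2:] is in ascending order of value, so the largest entry comes last; use argsa[-2:][::-1] if you want it first.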

Your final function could be:

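def get_from_m(m, i, numc=2):
    # Convert row i to a dense 1-D array.
    a = m[i,:].toarray().flatten()
    # argsort is ascending, so the numc largest values come last.
    argsa = a.argsort()
    return dict( zip( argsa[-numc:], a[ argsa[-numc:] ] ) )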
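If the matrix is wide (the question has 12000+ columns), densifying every row may be wasteful. A possible alternative, sketched below, reads each row's stored entries straight from the CSR arrays indptr, indices and data. The function name top_n_per_row is my own, and the sketch assumes the stored values are non-negative (as in the question), since implicit zeros are never considered.

```python
import numpy as np
from scipy.sparse import csr_matrix

def top_n_per_row(m, n=2):
    """Return one {column: value} dict per row, holding the n largest stored values."""
    m = m.tocsr()  # make sure indptr/indices/data follow the CSR layout
    result = []
    for start, end in zip(m.indptr[:-1], m.indptr[1:]):
        cols = m.indices[start:end]   # column indices stored for this row
        vals = m.data[start:end]      # matching values
        order = np.argsort(vals)[-n:][::-1]   # positions of the n largest, largest first
        result.append({int(c): float(v) for c, v in zip(cols[order], vals[order])})
    return result

# A few entries copied from rows 0 and 1 of the question's matrix.
rows = [0, 0, 0, 1, 1, 1]
cols = [12114, 12001, 11998, 10096, 10085, 9105]
vals = [0.272571581001, 0.0598986479579, 0.137415042369,
        0.0990242494495, 0.216197045661, 0.1362857905]
m = csr_matrix((vals, (rows, cols)), shape=(2, 12115))

tops = top_n_per_row(m)
print(tops[0])
print(tops[1])
```

A row with fewer than n stored entries simply yields a shorter dict, and an empty row yields an empty one.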
answered 2013-08-07T17:23:07.120