
While processing some text data, I'm trying to join an np array (taken from a pandas Series) to a csr matrix.

This is what I've done so far.

#create a compatible sparse matrix from my np.array.
#sparse.csr_matrix(X['link'].values) returns array size (1,7395)
#transpose that array for (7395,1)

X = sparse.csr_matrix(X['link'].values.transpose)


#bodies is a sparse.csr_matrix with shape (7395, 20000)

bodies = sparse.hstack((bodies,X))  

However, this line gives the error: no supported conversion for types: (dtype('O'),). I'm not sure what that means. How can I fix it?

Thanks.

2 Answers

This is Saullo Castro's comment, posted as an answer:

x = np.arange(12).reshape(1,12)  # ndarray
sparse.csr_matrix(x)
Out[14]: <1x12 sparse matrix of type '<type 'numpy.int32'>'
with 11 stored elements in Compressed Sparse Row format>

x.transpose   # function, not ndarray
Out[15]: <function transpose>  

X = sparse.csr_matrix(x.transpose)
TypeError: no supported conversion for types: (dtype('O'),)

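The error occurs before hstack is ever called, when trying to build a sparse matrix from a function rather than from an ndarray. The mistake is the missing ().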

# x.transpose() == x.T   # ndarray

sparse.csr_matrix(x.transpose())
Out[17]: <12x1 sparse matrix of type '<type 'numpy.int32'>'
with 11 stored elements in Compressed Sparse Row format>

sparse.csr_matrix(x.T)
Out[18]: <12x1 sparse matrix of type '<type 'numpy.int32'>'
with 11 stored elements in Compressed Sparse Row format>


bodies = sparse.rand(12,3,format='csr',density=.1)
sparse.hstack((bodies,X))
Out[32]: <12x4 sparse matrix of type '<type 'numpy.float64'>'
with 14 stored elements in COOrdinate format>

csr_matrix works fine when it is given the transposed array.
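
As a rough Python 3 sketch of the point above: without the parentheses, x.transpose is a bound method, and NumPy can only wrap that in an array of dtype object, which is where the 'O' in the error message comes from. The bodies matrix here is a random stand-in, not the asker's data, and format='csr' is an optional argument to hstack that keeps the result in CSR instead of the default COO.

```python
import numpy as np
from scipy import sparse

x = np.arange(12).reshape(1, 12)

# Without parentheses, x.transpose is the bound method itself, not an array.
assert callable(x.transpose)

# NumPy can only wrap such an object in an array of dtype object ('O').
method_dtype = np.asarray(x.transpose).dtype

# Calling the method (or using .T) yields a real (12, 1) ndarray.
X = sparse.csr_matrix(x.T)

# Random stand-in for `bodies` (assumption); format='csr' keeps CSR output.
bodies = sparse.rand(12, 3, format='csr', density=0.1)
combined = sparse.hstack((bodies, X), format='csr')

print(method_dtype, X.shape, combined.shape, combined.format)
```

Checking type(...) or .dtype on whatever is passed to csr_matrix is usually the quickest way to spot this kind of slip.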

Answered 2013-09-16T06:31:45.700

import numpy as np
import pandas as pd
from scipy import sparse

d = {
    "a": 30,
    "b": 20,
    "c": 10
}

s = pd.Series(d, index=["c", "b", "a"])
print s

--output:--
c    10
b    20
a    30
dtype: int64

my_ndarray = s.values
print my_ndarray

--output:--
[10 20 30]

X = sparse.csr_matrix(my_ndarray).transpose()
print X.todense()

--output:--
[[10]
 [20]
 [30]]


bodies = sparse.csr_matrix([
    [0, 1],
    [1, 0],
    [0, 0]
])
print bodies.todense()

--output:--
[[0 1]
 [1 0]
 [0 0]]

result = sparse.hstack((bodies,X))  
print result.todense()

--output:--
[[ 0  1 10]
 [ 1  0 20]
 [ 0  0 30]]

And writing this instead:

X = sparse.csr_matrix(my_ndarray.transpose())

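produces an error at the hstack call, because transposing a 1-D ndarray is a no-op and X ends up with shape (1, 3) rather than (3, 1):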

Traceback (most recent call last):
  File "1.py", line 33, in <module>
    result = sparse.hstack((bodies,X))  
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/construct.py", line 417, in hstack
    return bmat([blocks], format=format, dtype=dtype)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/construct.py", line 515, in bmat
    raise ValueError('blocks[%d,:] has incompatible row dimensions' % i)
ValueError: blocks[0,:] has incompatible row dimensions
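
The shape mismatch behind that ValueError can be checked directly. The following is a minimal Python 3 sketch using the same small numbers as above. The reshape(-1, 1) variant is an alternative not used in the original code, shown only as one possible way to get a column.

```python
import numpy as np
from scipy import sparse

my_ndarray = np.array([10, 20, 30])      # 1-D, shape (3,)

# Transposing a 1-D array changes nothing, so this is still shape (3,).
same_shape = my_ndarray.transpose().shape

# csr_matrix treats a 1-D array as a single row: shape (1, 3).
row = sparse.csr_matrix(my_ndarray.transpose())

# Option 1: build the sparse row first, then transpose the sparse matrix.
col_a = sparse.csr_matrix(my_ndarray).transpose()
# Option 2: make the array an explicit column before converting.
col_b = sparse.csr_matrix(my_ndarray.reshape(-1, 1))

bodies = sparse.csr_matrix([[0, 1], [1, 0], [0, 0]])
result = sparse.hstack((bodies, col_b))
print(result.toarray())
```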

Compare that with:

import numpy as np
import pandas as pd
from scipy import sparse

d = {
    "a": "hello",
    "b": "world",
    "c": "goodbye"
}

s = pd.Series(d, index=["c", "b", "a"])
print s

--output:--
c    goodbye
b      world
a      hello

my_ndarray = s.values
print my_ndarray

--output:--
[goodbye world hello]

X = sparse.csr_matrix(s.values).transpose()

--output:--
Traceback (most recent call last):
  File "1.py", line 19, in <module>
    X = sparse.csr_matrix(s.values).transpose()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/compressed.py", line 66, in __init__
    self._set_self( self.__class__(coo_matrix(arg1, dtype=dtype)) )
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/compressed.py", line 30, in __init__
    arg1 = arg1.asformat(self.format)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/base.py", line 203, in asformat
    return getattr(self,'to' + format)()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/coo.py", line 312, in tocsr
    data    = np.empty(self.nnz, dtype=upcast(self.dtype))
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/scipy/sparse/sputils.py", line 53, in upcast
    raise TypeError('no supported conversion for types: %r' % (args,))
TypeError: no supported conversion for types: (dtype('object'),)
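
One way to catch this earlier is to check the Series dtype before handing it to csr_matrix. The sketch below is only an illustration in Python 3: series_to_sparse_column is a hypothetical helper name, and it assumes the column is supposed to hold numbers. Genuine text such as "hello" has no numeric value, so the helper fails on it, and turning text into features is a separate decision that is not covered here.

```python
import pandas as pd
from scipy import sparse


def series_to_sparse_column(series):
    """Hypothetical helper: turn a numeric Series into an (n, 1) CSR column."""
    if not pd.api.types.is_numeric_dtype(series.dtype):
        # Raises ValueError if any entry (e.g. "hello") is not a number.
        series = pd.to_numeric(series)
    return sparse.csr_matrix(series.to_numpy().reshape(-1, 1))


numbers = pd.Series([10, 20, 30], dtype=object)   # numbers stored as objects
words = pd.Series(["goodbye", "world", "hello"])  # genuine text

column = series_to_sparse_column(numbers)

try:
    series_to_sparse_column(words)
    words_converted = True
except (TypeError, ValueError):
    words_converted = False

print(column.shape, words_converted)
```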

The fact that you didn't provide an example like this yourself suggests you haven't put enough work into debugging the problem.

Answered 2013-09-14T21:36:28.947