5

I have the following string that I put together:

v1fColor = '2,4,14,5,0,0,0,0,0,0,0,0,0,0,12,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15,6,0,0,0,0,1,0,0,0,0,0,0,0,0,0,20,9,0,0,0,2,2,0,0,0,0,0,0,0,0,0,13,6,0,0,0,1,0,0,0,0,0,0,0,0,0,0,10,8,0,0,0,1,2,0,0,0,0,0,0,0,0,0,17,17,0,0,0,3,6,0,0,0,0,0,0,0,0,0,7,5,0,0,0,2,0,0,0,0,0,0,0,0,0,0,4,3,0,0,0,1,1,0,0,0,0,0,0,0,0,0,6,6,0,0,0,2,3'

I am treating it as a vector. Long story short, it is the foreground color of an image histogram.

I have the following lambda function to calculate the cosine similarity of two images, so I tried to convert this string to a numpy.array, but I failed.

Here is my lambda function:

import numpy as NP
import numpy.linalg as LA
cx = lambda a, b : round(NP.inner(a, b)/(LA.norm(a)*LA.norm(b)), 3)

So I tried the following to convert this string into a numpy array:

v1fColor = NP.array([float(v1fColor)], dtype=NP.uint8)

But I end up with the following error:

    v1fColor = NP.array([float(v1fColor)], dtype=NP.uint8)
ValueError: invalid literal for float(): 2,4,14,5,0,0,0,0,0,0,0,0,0,0,12,4,0,0,0,0,0,0,0,0,0,0,0,0,0,0,15,6,0,0,0,0,1,0,0,0,0,0,0,0,0,0,20,9,0,0,0,2,2,0,0,0,0,0,0,0,0,0,13,6,0,0,0,1,0,0,0,0,0,0,0,0,0,0,10,8,0,0,0,1,2,0,0,0,0,0,0,0,0,0,17,17,
4

4 Answers

10

You have to split the string on the commas first:

NP.array(v1fColor.split(","), dtype=NP.uint8)
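For context, float() parses a single numeric literal, so passing it the whole comma-separated string raises the ValueError shown in the question; each token has to be converted on its own. A minimal sketch of that idea, assuming a shortened illustrative string (`sample`) and a hypothetical helper name (`parse_vector`), neither of which comes from the original post:

```python
import numpy as np

def parse_vector(text, dtype=np.int64):
    """Split a comma-separated string and convert each token to an integer."""
    return np.array([int(tok) for tok in text.split(',')], dtype=dtype)

sample = '2,4,14,5,0,0'  # shortened, illustrative histogram string

try:
    float(sample)  # the whole string is not one number, so this raises
except ValueError as exc:
    print('float() failed:', exc)

vec = parse_vector(sample)
print(vec, vec.dtype)
```

The helper defaults to a wide integer dtype here as a design choice; a narrow dtype such as uint8 stores the values fine but can overflow in later arithmetic.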
Answered 2012-07-31T19:03:40.530
6

You can do this without using Python string methods. Try numpy.fromstring:

>>> numpy.fromstring(v1fColor, dtype='uint8', sep=',')
array([ 2,  4, 14,  5,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 12,  4,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 15,  6,  0,  0,
        0,  0,  1,  0,  0,  0,  0,  0,  0,  0,  0,  0, 20,  9,  0,  0,  0,
        2,  2,  0,  0,  0,  0,  0,  0,  0,  0,  0, 13,  6,  0,  0,  0,  1,
        0,  0,  0,  0,  0,  0,  0,  0,  0,  0, 10,  8,  0,  0,  0,  1,  2,
        0,  0,  0,  0,  0,  0,  0,  0,  0, 17, 17,  0,  0,  0,  3,  6,  0,
        0,  0,  0,  0,  0,  0,  0,  0,  7,  5,  0,  0,  0,  2,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  0,  4,  3,  0,  0,  0,  1,  1,  0,  0,  0,
        0,  0,  0,  0,  0,  0,  6,  6,  0,  0,  0,  2,  3], dtype=uint8)
Answered 2012-07-31T19:37:52.580
6

You can do it like this:

lst = v1fColor.split(',')  #create a list of strings, splitting on the commas.
v1fColor = NP.array( lst, dtype=NP.uint8 ) #numpy converts the strings.  Nifty!

Or, more concisely:

v1fColor = NP.array( v1fColor.split(','), dtype=NP.uint8 )

Note that it is more idiomatic to write:

import numpy as np

rather than import numpy as NP.

Edit

Just today I learned about numpy.fromstring, a function that can also be used to solve this problem:

NP.fromstring( "1,2,3" , sep="," , dtype=NP.uint8 )
Answered 2012-07-31T19:03:57.703
0

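I am writing this answer for future reference. I am not sure what the correct solution is in this case, but I think what @David Robinson initially posted is the right answer, for one reason: a cosine similarity value cannot be greater than one, and when I use the NP.array(v1fColor.split(","), dtype=NP.uint8) option, I get strange cosine similarity values above 1.0 between two vectors.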

So I wrote a simple piece of sample code to try it out:

import numpy as np
import numpy.linalg as LA

def testFunction():
    value1 = '2,3,0,80,125,15,5,0,0,0,0,0,0,0,0,0,0,0,0,0,2,4,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,4,3,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0'
    value2 = '2,137,0,4,96,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0'
    cx = lambda a, b : round(np.inner(a, b)/(LA.norm(a)*LA.norm(b)), 3)
    # Alternative parsing: convert to Python ints first (list() keeps this working on Python 3).
    #v1fColor = np.array(list(map(int, value1.split(','))))
    #v2fColor = np.array(list(map(int, value2.split(','))))
    v1fColor = np.array( value1.split(','), dtype=np.uint8 )
    v2fColor = np.array( value2.split(','), dtype=np.uint8 )
    print(v1fColor)
    print(v2fColor)
    cosineValue = cx(v1fColor, v2fColor)
    print(cosineValue)

if __name__ == '__main__':
    testFunction()

If you run this code, you should get the following output: [screenshot: program output]

Now uncomment the two lines and run the code with David's initial solution:

v1fColor = np.array(list(map(int, value1.split(','))))
v2fColor = np.array(list(map(int, value2.split(','))))

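Keep in mind that, as you saw above, the cosine similarity value was higher than 1.0. When we use the map function with the int cast instead, we get the following value, which is the correct one:

[screenshot: program output]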

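Luckily, I was plotting the values I initially got, and some of the cosine values were above 1.0. I took the output of those vectors, typed it manually into the Python console, passed it through my lambda function, and got the correct answer, so I was very confused. Then I wrote the test script to see what was going on, and I am glad I caught this issue. I am not a Python expert and cannot explain exactly why the two methods give two different answers, so I leave that to @David Robinson or @mgilson.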
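One plausible explanation for the discrepancy, offered as a sketch and not a confirmed diagnosis: uint8 can only hold 0 to 255, and NumPy integer arithmetic on arrays wraps around silently, so np.inner on two uint8 vectors can return the dot product modulo 256. Values parsed with int end up in a wider integer type, where histograms of this size do not overflow. Exactly which wrong value comes out may depend on the NumPy version. The function name `cosine` and the two-element vectors below are illustrative assumptions, not taken from the post.

```python
import numpy as np
import numpy.linalg as LA

def cosine(a, b):
    """Cosine similarity computed in float64 to avoid integer wraparound."""
    a = np.asarray(a, dtype=np.float64)
    b = np.asarray(b, dtype=np.float64)
    return float(np.inner(a, b) / (LA.norm(a) * LA.norm(b)))

# Small illustrative vectors stored as uint8, as in the failing approach.
a8 = np.array([200, 100], dtype=np.uint8)
b8 = np.array([200, 100], dtype=np.uint8)

# The uint8 result cannot represent the true dot product, so it wraps.
wrapped = int(np.inner(a8, b8))

# Widening the dtype first gives the exact dot product.
exact = int(np.inner(a8.astype(np.int64), b8.astype(np.int64)))

print('uint8 inner product:', wrapped)
print('int64 inner product:', exact)
print('cosine in float64:  ', round(cosine(a8, b8), 3))
```

Converting to float64 (or parsing with a wider integer dtype) before calling np.inner should keep the similarity within [-1, 1], up to rounding.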

Answered 2012-08-01T18:03:32.843