I have a NumPy array of Python objects. I want to compare the array against a Python object, but I don't want the comparison done with the == operator; a reference (identity) comparison is enough for my requirements.

import numpy as np
a = np.array(["abc", "def"], dtype="object")
a == "abc"

I am sure that a reference comparison is enough for my array. Let's say all the strings in my array are interned.

This is primarily to improve performance when comparing zillions of values. Python object comparisons are really slow.

a is "abc" won't do what I want, because:

In [1]: import numpy as np

In [2]: a = np.array(["abc", "def"], dtype="object")

In [3]: a == "abc"
Out[3]: array([ True, False], dtype=bool)

In [4]: a is "abc"
Out[4]: False

I want the result of a == "abc", but I don't want Python's __eq__ method to be used for it, just the is operator.

2 Answers

"just a reference comparison is enough for my requirements"

To compare object identity, use is instead of ==:

if a is b:
   ...

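From the docs:

The operators is and is not test for object identity: x is y is true if and only if x and y are the same object. x is not y yields the inverse truth value.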
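
To make the quoted rule concrete, here is a minimal sketch of the difference between == and is on strings. It assumes CPython, and it uses sys.intern (which the question does not name) to obtain interned strings of the kind the asker describes. The variable names are made up for illustration.

```python
import sys

# Build "abc" at runtime so it is a distinct object from the literal below.
runtime_str = "".join(["ab", "c"])
literal_str = "abc"

# Value comparison calls str.__eq__ and looks at the contents.
same_value = runtime_str == literal_str

# Identity comparison only checks whether both names refer to one object.
# In CPython this is typically False here, but that is an implementation detail.
same_object = runtime_str is literal_str

# sys.intern returns one canonical object per distinct string value,
# so two interned strings with equal contents are the same object.
interned_a = sys.intern(runtime_str)
interned_b = sys.intern(literal_str)
same_after_interning = interned_a is interned_b

print(same_value, same_object, same_after_interning)
```

The identity test is only a safe substitute for == when every string involved has been interned, as the question assumes.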

Edit: to apply is to each element of the array, you could use:

In [6]: map(lambda x:x is "abc", a)
Out[6]: [True, False]

Or simply:

In [9]: [x is "abc" for x in a]
Out[9]: [True, False]
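
The session above looks like Python 2, where map returns a list. On Python 3, map is lazy and must be wrapped in list(), and recent versions also emit a SyntaxWarning for is against a string literal. A minimal sketch of both variants follows, assuming the strings are interned via sys.intern; the name target is hypothetical.

```python
import sys
import numpy as np

# Assumption: every string goes through sys.intern, as the question describes.
target = sys.intern("abc")
a = np.array([target, sys.intern("def")], dtype="object")

# On Python 3, map() is lazy, so materialize it with list().
mask_from_map = list(map(lambda x: x is target, a))

# The list comprehension behaves the same on Python 2 and Python 3.
mask_from_comprehension = [x is target for x in a]

print(mask_from_map)
print(mask_from_comprehension)
```

Comparing against a name rather than a literal also keeps the identity test explicit about which object is meant.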
answered 2012-05-02T16:18:15.047

How about np.vectorize:

vector_is = np.vectorize(lambda x, y: x is y, otypes=[bool])

Then you have

>>> a = np.array(["abc", "def"], dtype="object")

>>> vector_is(a, "abc")
array([ True, False], dtype=bool)

Unfortunately, I don't know whether you can use operator.is_ here, because I get

ValueError: failed to determine the number of arguments for <built-in function is_>
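
One possible workaround, offered as a sketch and not something measured in this answer: np.frompyfunc takes the number of inputs and outputs explicitly, so it does not need to inspect operator.is_ to count its arguments. It returns an object array, hence the astype(bool). The target is placed in a 0-d object array first, on the assumption that passing a bare str could let NumPy convert it to a string dtype and lose the original object.

```python
import operator
import sys
import numpy as np

# frompyfunc(func, nin, nout): the argument counts are given explicitly.
ufunc_is = np.frompyfunc(operator.is_, 2, 1)

target = sys.intern("abc")
a = np.array([target, sys.intern("def")], dtype="object")

# Store the target by reference in a 0-d object array so the ufunc
# receives the very same Python object, not a converted copy.
target_arr = np.empty((), dtype=object)
target_arr[()] = target

# The result has dtype=object, so convert it to a boolean mask.
mask = ufunc_is(a, target_arr).astype(bool)
print(mask)
```

This still calls a Python-level function once per element, so it should not be expected to beat == on speed.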

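This seems slightly slower than the list comprehension (probably because of the lambda call), though it has the advantage of being more flexible in the arguments it accepts: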

python -mtimeit -s 'import numpy as np' -s 'import random, string' -s 'a = np.array(["".join(random.choice(string.ascii_lowercase) for x in range(4)) for e in range(100000)])' -s 'vector_is = np.vectorize(lambda x,y: x is y, otypes=[bool])' 'vector_is(a, "abcd")'
10 loops, best of 3: 28.3 msec per loop

python -mtimeit -s 'import numpy as np' -s 'import random, string' -s 'a = np.array(["".join(random.choice(string.ascii_lowercase) for x in range(4)) for e in range(100000)])' -s 'vector_is = np.vectorize(lambda x,y: x is y, otypes=[bool])' '[x is "abcd" for x in a]'
100 loops, best of 3: 20 msec per loop

python -mtimeit -s 'import numpy as np' -s 'import random, string' -s 'a = np.array(["".join(random.choice(string.ascii_lowercase) for x in range(4)) for e in range(100000)])' -s 'vector_is = np.vectorize(lambda x,y: x is y, otypes=[bool])' 'np.fromiter((x is "abcd" for x in a), bool, len(a))'
10 loops, best of 3: 23.8 msec per loop

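The last approach, np.fromiter((x is "abcd" for x in a), bool, len(a)), is one way to get a numpy array out of the list comprehension approach.

Unfortunately, all of these are much slower than just using ==: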

python -mtimeit -s 'import numpy as np' -s 'import random, string' -s 'a = np.array(["".join(random.choice(string.ascii_lowercase) for x in range(4)) for e in range(100000)])' -s 'vector_is = np.vectorize(lambda x,y: x is y, otypes=[bool])' 'a == "abcd"'                                        
1000 loops, best of 3: 1.42 msec per loop
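
Note that the arrays in these timings are created without dtype="object", so NumPy stores them as fixed-width strings and a == "abcd" runs as a native string comparison instead of calling __eq__ per element. For a true object array that is compared many times, one idea (an assumption added here, not something benchmarked above) is to pay for a single Python-level pass that records id() of each element, after which every identity test is a plain integer comparison. The ids stay valid only while the objects are alive, which holds as long as the array keeps its references.

```python
import sys
import numpy as np

target = sys.intern("abc")
a = np.array([target, sys.intern("def"), target], dtype="object")

# One Python-level pass: record each element's id() (its address in CPython).
ids = np.fromiter((id(x) for x in a), dtype=np.uintp, count=len(a))

# Later identity checks are integer comparisons carried out in C.
mask = ids == id(target)
print(mask)
```

Whether this pays off depends on how often the same array is reused; no timing is claimed for it.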
answered 2012-05-02T16:51:20.893