python - Pandas isin() function does not correctly identify numeric matches

Question

isin() is giving me strange results. I create the following DataFrame:

import pandas as pd
import numpy as np

test=pd.DataFrame({'1': np.linspace(0.0, 1.0, 11)})

>>> test['1']
0     0.0
1     0.1
2     0.2
3     0.3
4     0.4
5     0.5
6     0.6
7     0.7
8     0.8
9     0.9
10    1.0
Name: 1, dtype: float64

Using what is (apparently) the same array, isin() now gives me something strange.

>>> test['1'].isin([0. , 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1. ])
0      True
1      True
2      True
3     False
4      True
5      True
6     False
7     False
8      True
9      True
10     True
Name: 1, dtype: bool

I suspect a numerical issue or something related to the data type. Can someone explain this and tell me how to prevent it?

score 1 · Accepted Answer

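isin compares exact values, so using it on floating-point values is almost never a good idea. There may be floating-point errors that are not visible in the printed output. For example,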

for x in np.linspace(0.0,1.0,11): print(x)

gives you:

0.0
0.1
0.2
0.30000000000000004
0.4
0.5
0.6000000000000001
0.7000000000000001
0.8
0.9
1.0

That is, the 0.3 you see in test is not really 0.3.
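
To make the point above concrete, here is a minimal sketch that inspects a single element of the same np.linspace array. The variable name values is only for illustration.

```python
import numpy as np

values = np.linspace(0.0, 1.0, 11)

# repr() of a Python float shows the shortest decimal string that
# round-trips to the stored double, so the hidden error becomes visible.
print(repr(float(values[3])))

# Exact comparison against the literal 0.3 fails ...
print(values[3] == 0.3)

# ... while a tolerance-based comparison succeeds.
print(np.isclose(values[3], 0.3))
```

The default display of a Series rounds to a few significant digits, which is why the DataFrame appears to contain a clean 0.3.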

score 1 · Accepted Answer

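No, it is actually identifying them correctly. This has more to do with how floating-point numbers are represented at a low level inside the CPU (see here), so you need to be careful with these things: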

print(test["1"].array)
<PandasArray>
[                0.0,                 0.1,                 0.2,
 0.30000000000000004,                 0.4,                 0.5,
  0.6000000000000001,  0.7000000000000001,                 0.8,
                 0.9,                 1.0]
Length: 11, dtype: float64

However, passing the same np.linspace array to isin matches every row:

print(test['1'].isin(np.linspace(0.0,1.0,11)))
0     True
1     True
2     True
3     True
4     True
5     True
6     True
7     True
8     True
9     True
10    True
Name: 1, dtype: bool
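
As a rough illustration of the difference between the two calls, the sketch below lists the stored values that fail to match the hand-typed literals. The names literals and mismatch are assumptions made for this example.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'1': np.linspace(0.0, 1.0, 11)})
literals = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

# Rows whose stored value is not bit-for-bit equal to any literal.
mismatch = test.loc[~test['1'].isin(literals), '1']
print(mismatch.index.tolist())
print(mismatch.tolist())

# The same linspace call reproduces the same doubles, so every row matches.
print(test['1'].isin(np.linspace(0.0, 1.0, 11)).all())
```

So isin itself behaves consistently. The mismatch comes from comparing computed values against decimal literals that round to slightly different doubles.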

score 1 · Accepted Answer

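Use np.isclose when you want to do an "equality" check on floats. Use broadcasting to perform all the comparisons, then np.logical_or.reduce to combine the results into a single mask that indicates whether each value "equals" any of the elements.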

import numpy as np
import pandas as pd

test = pd.DataFrame({'1': np.linspace(0.0, 1.1, 12)})
l = [0., 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.]
arr = np.array(l)  # So we can broadcast

test['in_l_close'] = np.logical_or.reduce(np.isclose(test['1'].to_numpy()[None, :], arr[:, None]))
test['in_l_naive'] = test['1'].isin(l)  # For comparison, to show the flaw.

print(test)

      1  in_l_close  in_l_naive
0   0.0        True        True
1   0.1        True        True
2   0.2        True        True
3   0.3        True       False
4   0.4        True        True
5   0.5        True        True
6   0.6        True       False
7   0.7        True       False
8   0.8        True        True
9   0.9        True        True
10  1.0        True        True
11  1.1       False       False
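
The broadcasting approach above could be wrapped in a small reusable helper. This is only a sketch: isin_close is a hypothetical name, not a pandas or NumPy function, and the rtol/atol defaults simply mirror those of np.isclose.

```python
import numpy as np
import pandas as pd

def isin_close(series, candidates, rtol=1e-05, atol=1e-08):
    """Tolerance-based analogue of Series.isin (hypothetical helper)."""
    values = series.to_numpy()[:, None]                     # shape (n, 1)
    cands = np.asarray(candidates, dtype=float)[None, :]    # shape (1, m)
    # Compare every value with every candidate, then collapse per row.
    mask = np.isclose(values, cands, rtol=rtol, atol=atol).any(axis=1)
    return pd.Series(mask, index=series.index, name=series.name)

test = pd.DataFrame({'1': np.linspace(0.0, 1.1, 12)})
l = [0., 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.]
print(isin_close(test['1'], l))
```

Note that this builds an n-by-m boolean array, so it may use a lot of memory when both the Series and the candidate list are large.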

score 1 · Accepted Answer

It works if you first format the values to one decimal place and convert them back to float:

test['1'] = test['1'].map(lambda x: '%.1f' % x)
# np.float was removed in NumPy 1.24; the built-in float does the same job.
print(test['1'].astype(float).isin([0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]))

0     True
1     True
2     True
3     True
4     True
5     True
6     True
7     True
8     True
9     True
10    True
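
A possible alternative to the string round trip is to round numerically with Series.round before calling isin. This is a sketch that assumes the data really is meant to lie on a one-decimal grid. Rounding to other precisions or magnitudes is not guaranteed to reproduce the literals exactly.

```python
import numpy as np
import pandas as pd

test = pd.DataFrame({'1': np.linspace(0.0, 1.0, 11)})
literals = [0.0, 0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0]

# Snap each value to one decimal place, then do the exact membership test.
rounded = test['1'].round(1)
print(rounded.isin(literals))
```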
