python - Why does testing `NaN == NaN` not work for dropping from a pandas DataFrame?

Question

Please explain how NaNs are treated in pandas, because the following logic seems "broken" to me. I tried various ways (shown below) to drop the empty values.

My dataframe, which I load from a CSV file using read_csv, has a column named comments, which is empty most of the time.

The column marked_results.comments looks like this. Everything after the first few rows is NaN, so pandas loads empty entries as NaN. So far so good:

0       VP
1       VP
2       VP
3     TEST
4      NaN
5      NaN
....

Now I try to drop those entries. Only this works:

marked_results.comments.isnull()

None of these work:

marked_results.comments.dropna() only gives back the same column; nothing gets dropped, which is confusing.
marked_results.comments == NaN only gives a Series of all False, as if nothing were NaN, which is also confusing.
Likewise marked_results.comments == nan.

I also tried:

comments_values = marked_results.comments.unique()

array(['VP', 'TEST', nan], dtype=object)

# Ah, gotcha! So now I've tried:
marked_results.comments == comments_values[2]
# but still all the results are False!

score 15 · Accepted Answer

You should use isnull and notnull to test for NaN (these are more robust with pandas dtypes than the numpy equivalents); see "values considered missing" in the docs.
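
A minimal sketch of that test, assuming a small hand-built frame named df that stands in for the question's marked_results (it is not loaded from the CSV). It contrasts the equality comparison from the question with isnull and notnull:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's marked_results frame.
df = pd.DataFrame({"comments": ["VP", "VP", "VP", "TEST", np.nan, np.nan]})

# Equality against NaN is False in every row, because NaN never equals anything.
eq_mask = df["comments"] == np.nan

# isnull() flags the missing entries; notnull() is its complement.
null_mask = df["comments"].isnull()
kept = df[df["comments"].notnull()]

print(eq_mask.any())    # False
print(null_mask.sum())  # 2
print(kept)
```

Indexing with the notnull mask keeps the four labelled rows, which is the filtering the question was trying to achieve with ==.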

Using the Series method dropna on a column won't affect the original dataframe, but it does do what you want:

In [11]: df
Out[11]:
  comments
0       VP
1       VP
2       VP
3     TEST
4      NaN
5      NaN

In [12]: df.comments.dropna()
Out[12]:
0      VP
1      VP
2      VP
3    TEST
Name: comments, dtype: object

The DataFrame method dropna has a subset argument (to drop rows that have NaN in specific columns):

In [13]: df.dropna(subset=['comments'])
Out[13]:
  comments
0       VP
1       VP
2       VP
3     TEST

In [14]: df = df.dropna(subset=['comments'])
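
The point that dropna returns a new object rather than changing the original can be checked directly. This is a sketch under the assumption of a hand-built frame; the score column is invented purely to show that whole rows are dropped:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's frame; "score" is an invented extra column.
df = pd.DataFrame({
    "comments": ["VP", "VP", "VP", "TEST", np.nan, np.nan],
    "score": [1, 2, 3, 4, 5, 6],
})

# dropna returns a new DataFrame; df itself still has all six rows.
cleaned = df.dropna(subset=["comments"])
rows_in_original = len(df)
rows_in_cleaned = len(cleaned)

# The same holds for a single column: the result is a new Series, and df is untouched.
comments_only = df["comments"].dropna()

# To keep the result, bind it to a name (here, back to df).
df = cleaned

print(rows_in_original, rows_in_cleaned, len(comments_only), len(df))
```

Without the reassignment on the last step, inspecting df again would still show the NaN rows.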

score 7 · Accepted Answer

You need to test for NaN with the math.isnan() function (or numpy.isnan). NaN cannot be checked with the equality operator.

>>> from math import isnan
>>> a = float('NaN')
>>> a
nan
>>> a == 'NaN'
False
>>> isnan(a)
True
>>> a == float('NaN')
False

Help on the function:

isnan(...)
    isnan(x) -> bool

    Check if float x is not a number (NaN).
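
The reason equality fails is that NaN compares unequal to every value, including itself, so even the nan taken from unique() in the question cannot match. A short sketch, assuming an object array shaped like that unique() result. Note that math.isnan accepts only numbers, so on a mixed column of strings and NaN the pandas function isnull is the safer test:

```python
import math
import numpy as np
import pandas as pd

nan = float("nan")

# NaN is never equal to anything, itself included.
nan_equals_itself = nan == nan        # False
nan_differs_from_itself = nan != nan  # True
is_nan = math.isnan(nan)              # True

# Assumed to mirror the array returned by unique() in the question.
values = np.array(["VP", "TEST", np.nan], dtype=object)

# math.isnan rejects non-numeric input such as the string "VP".
try:
    math.isnan(values[0])
    accepts_strings = True
except TypeError:
    accepts_strings = False

# pd.isnull handles the mixed object array element by element.
mask = pd.isnull(values)

print(nan_equals_itself, nan_differs_from_itself, is_nan)
print(accepts_strings)
print(mask)
```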
