
Please explain how NaNs are treated in pandas, because the following logic seems "broken" to me. I tried various ways (shown below) to drop the empty values.

My dataframe, which I load from a CSV file using read_csv, has a column comments, which is empty most of the time.

The column marked_results.comments looks like this. The rest of the column is all NaN, so pandas loads empty entries as NaN. So far so good:

0       VP
1       VP
2       VP
3     TEST
4      NaN
5      NaN
....

Now I try to drop those entries. Only this works:

  • marked_results.comments.isnull()

None of these work:

  • marked_results.comments.dropna() just returns the same column; nothing gets dropped, which is confusing.
  • marked_results.comments == NaN only gives a Series of all False values, as if nothing were NaN, which is also confusing.
  • likewise marked_results.comments == nan

I also tried:

comments_values = marked_results.comments.unique()

array(['VP', 'TEST', nan], dtype=object)

# Ah, gotcha! So now I've tried:
marked_results.comments == comments_values[2]
# but all the results are still False!

2 Answers


You should use isnull and notnull to test for NaN (these are more robust with pandas dtypes than the numpy equivalents); see "values considered missing" in the docs.
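As a rough illustration of the point above, here is a minimal sketch that rebuilds a column like the one in the question and compares isnull/notnull with an equality test. The Series is constructed by hand here (an assumption, since the original CSV is not available), and the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

# Hand-built stand-in for marked_results.comments (the real data comes from a CSV).
comments = pd.Series(["VP", "VP", "VP", "TEST", np.nan, np.nan], name="comments")

# isnull / notnull return boolean masks marking missing / present entries.
is_missing = comments.isnull()
is_present = comments.notnull()

# Comparing with == never matches NaN, because NaN is not equal to anything,
# including itself.
eq_nan = comments == np.nan

missing_count = int(is_missing.sum())
present_values = comments[is_present].tolist()

print(missing_count)
print(present_values)
print(bool(eq_nan.any()))
```

The boolean mask from notnull can be used directly for indexing, which is another way to keep only the rows that have a comment.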

Using the Series method dropna on a column won't affect the original dataframe, but it does do what you want:

In [11]: df
Out[11]:
  comments
0       VP
1       VP
2       VP
3     TEST
4      NaN
5      NaN

In [12]: df.comments.dropna()
Out[12]:
0      VP
1      VP
2      VP
3    TEST
Name: comments, dtype: object

The DataFrame dropna method has a subset argument (to drop rows that have NaN in specific columns):

In [13]: df.dropna(subset=['comments'])
Out[13]:
  comments
0       VP
1       VP
2       VP
3     TEST

In [14]: df = df.dropna(subset=['comments'])
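The reassignment in the last step matters because dropna returns a new object rather than modifying the original. A small sketch of that behaviour, assuming a hand-built frame with an extra hypothetical score column (not part of the question's data) to show that whole rows are dropped:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "comments" mirrors the question, "score" is made up.
df = pd.DataFrame({
    "comments": ["VP", "VP", "VP", "TEST", np.nan, np.nan],
    "score": [1, 2, 3, 4, 5, 6],
})

# dropna returns a new DataFrame; df itself is left untouched.
cleaned = df.dropna(subset=["comments"])

rows_before = len(df)
rows_after = len(cleaned)
still_missing_in_df = int(df["comments"].isnull().sum())

print(rows_before, rows_after, still_missing_in_df)
```

If the result is not assigned to a name, it is simply discarded, which can look as if nothing was dropped.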
answered 2013-07-31T12:18:21.453

You need to test for NaN with the math.isnan() function (or numpy.isnan). NaN cannot be checked with the equality operator.

>>> from math import isnan
>>> a = float('NaN')
>>> a
nan
>>> a == 'NaN'
False
>>> isnan(a)
True
>>> a == float('NaN')
False

Help for the function:

isnan(...)
    isnan(x) -> bool

    Check if float x is not a number (NaN).
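One caveat worth sketching: math.isnan only accepts numbers, so on an object column that mixes strings and NaN (like the one in the question) it raises a TypeError for the string entries. The snippet below is a minimal illustration under that assumption; the sample values are copied from the question's unique() output, and pd.isnull is shown as an alternative that tolerates mixed types.

```python
import math
import pandas as pd

nan = float("nan")

# NaN never compares equal, not even to itself, so == cannot find it.
nan_equals_itself = (nan == nan)
nan_detected = math.isnan(nan)

# math.isnan rejects non-numeric input such as the strings in the column.
try:
    math.isnan("VP")
    string_accepted = True
except TypeError:
    string_accepted = False

# pd.isnull handles an object Series with mixed strings and NaN.
mask = pd.isnull(pd.Series(["VP", "TEST", nan], dtype=object)).tolist()

print(nan_equals_itself, nan_detected, string_accepted, mask)
```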
answered 2013-07-31T12:04:38.390