
I am new to Python and to pandas.

I want to query a DataFrame and keep only the rows where one of the columns is not NaN.

I tried:

a=dictionarydf.label.isnull()

But a is just filled with True or False values. I then tried this:

dictionarydf.query(dictionarydf.label.isnull())

but, as I expected, it raised an error.

Sample data:

      reference_word         all_matching_words  label review
0           account             fees - account    NaN      N
1           account           mobile - account    NaN      N
2           account          monthly - account    NaN      N
3    administration  delivery - administration    NaN      N
4    administration      fund - administration    NaN      N
5           advisor             fees - advisor    NaN      N
6           advisor          optimum - advisor    NaN      N
7           advisor              sub - advisor    NaN      N
8             aichi           delivery - aichi    NaN      N
9             aichi               pref - aichi    NaN      N
10          airport              biz - airport    travel      N
11          airport              cfo - airport    travel      N
12          airport           cfomtg - airport    travel      N
13          airport          meeting - airport    travel      N
14          airport           summit - airport    travel      N
15          airport             taxi - airport    travel      N
16          airport            train - airport    travel      N
17          airport         transfer - airport    travel      N
18          airport             trip - airport    travel      N
19              ais                admin - ais    NaN      N
20              ais               alpine - ais    NaN      N
21              ais                 fund - ais    NaN      N
22       allegiance       custody - allegiance    NaN      N
23       allegiance          fees - allegiance    NaN      N
24            alpha               late - alpha    NaN      N
25            alpha               meal - alpha    NaN      N
26            alpha               taxi - alpha    NaN      N
27           alpine             admin - alpine    NaN      N
28           alpine               ais - alpine    NaN      N
29           alpine              fund - alpine    NaN      N

I want to keep only the rows where label is not NaN.

Expected output:

     reference_word         all_matching_words   label    review
0          airport              biz - airport    travel      N
1          airport              cfo - airport    travel      N
2          airport           cfomtg - airport    travel      N
3          airport          meeting - airport    travel      N
4          airport           summit - airport    travel      N
5          airport             taxi - airport    travel      N
6          airport            train - airport    travel      N
7          airport         transfer - airport    travel      N
8          airport             trip - airport    travel      N

1 Answer


You can use dropna:

df = df.dropna(subset=['label'])

print (df)
   reference_word  all_matching_words   label review
10        airport       biz - airport  travel      N
11        airport       cfo - airport  travel      N
12        airport    cfomtg - airport  travel      N
13        airport   meeting - airport  travel      N
14        airport    summit - airport  travel      N
15        airport      taxi - airport  travel      N
16        airport     train - airport  travel      N
17        airport  transfer - airport  travel      N
18        airport      trip - airport  travel      N
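
Note that the output above keeps the original index (10 to 18), while the expected output in the question is numbered from 0. A minimal, self-contained sketch of one way to get that numbering follows. The four-row DataFrame is an illustrative stand-in for the question's data, and the name labelled is made up for this example:

```python
import numpy as np
import pandas as pd

# Small stand-in for the question's data (rows chosen for illustration only).
df = pd.DataFrame({
    "reference_word": ["account", "airport", "airport", "ais"],
    "all_matching_words": ["fees - account", "biz - airport",
                           "cfo - airport", "admin - ais"],
    "label": [np.nan, "travel", "travel", np.nan],
    "review": ["N", "N", "N", "N"],
})

# Drop the rows whose label is NaN, then renumber the index from 0.
labelled = df.dropna(subset=["label"]).reset_index(drop=True)

print(labelled)
```

Passing drop=True stops reset_index from keeping the old index as an extra column. Leave that step out if the original row numbers are still needed.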

Another solution is boolean indexing with notnull:

df = df[df.label.notnull()]

print (df)
   reference_word  all_matching_words   label review
10        airport       biz - airport  travel      N
11        airport       cfo - airport  travel      N
12        airport    cfomtg - airport  travel      N
13        airport   meeting - airport  travel      N
14        airport    summit - airport  travel      N
15        airport      taxi - airport  travel      N
16        airport     train - airport  travel      N
17        airport  transfer - airport  travel      N
18        airport      trip - airport  travel      N
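
On why the attempts in the question did not work: isnull() returns a boolean Series (a mask) rather than the filtered rows, and query expects an expression string rather than a Series. The sketch below shows a few equivalent ways to apply the mask, using a small illustrative DataFrame in place of the question's data. The variable names are invented for this example, and engine="python" is passed to query as a conservative choice so that the method call inside the expression is evaluated by Python:

```python
import numpy as np
import pandas as pd

# Illustrative stand-in for the question's DataFrame.
df = pd.DataFrame({
    "reference_word": ["account", "airport", "airport", "ais"],
    "label": [np.nan, "travel", "travel", np.nan],
})

# isnull() yields one True/False per row; it does not filter by itself.
is_missing = df["label"].isnull()

# Invert the mask with ~ and use it to index the DataFrame.
by_mask = df[~is_missing]

# Same result with notnull(), as in the answer above.
by_notnull = df[df["label"].notnull()]

# query takes a string expression, not a boolean Series.
by_query = df.query("label.notnull()", engine="python")

print(by_mask)
```

All three results contain the same rows and keep the original index labels.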
answered 2016-09-26T06:00:49.337