
I have a pandas dataframe with a column of integers, which contains some nans. I want to convert them from integer to string, and replace the nans with a description like 'not available'.

The main reason is that I need to run groupbys on that column and, unless I convert the nans, the groupby will drop them! Why that even happens, and how the whole pandas community has not risen up in arms, is a totally separate discussion (when I first learnt about it I couldn't believe it...).
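For readers on newer pandas: since version 1.1, groupby accepts dropna=False, which keeps NaN keys as a group of their own. That option did not exist when this question was asked. A minimal sketch, with a small made-up frame standing in for the real data:

```python
import numpy as np
import pandas as pd

# Small illustrative frame: column 'a' is the grouping key and contains NaNs.
df = pd.DataFrame({"a": [1.0, np.nan, 1.0, np.nan], "b": [10, 20, 30, 40]})

# Default behaviour: rows whose key is NaN are left out of every group.
default = df.groupby("a")["b"].sum()
print(default)

# pandas >= 1.1: dropna=False keeps NaN as a group of its own.
kept = df.groupby("a", dropna=False)["b"].sum()
print(kept)
```

With dropna=False you may not need to replace the NaNs at all just to keep them in the groupby.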

I have tried the code below, but it doesn't work. Note that I have tried both astype(str) and astype('str'). In both cases the column gets converted to object, not to string; maybe because pandas assumes (wrongly, since they all have the same length in my dataframe) that the length of the strings varies? But, most importantly, the fillna() doesn't work, and the nans stay nans! Why?

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(1, 10, (10000, 5)), columns=['a', 'b', 'c', 'd', 'e'])
df.iloc[0, 0] = np.nan
df['a'] = df['a'].astype(str)
df['a'] = df['a'].fillna('not available')
print(df.dtypes)
print(df.head())
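On the object point: string length has nothing to do with it. In the pandas versions discussed here, astype(str) stores ordinary Python str values, and pandas reports such a column as object even though every element is a real string (pandas 1.0+ also offers an opt-in dedicated dtype via astype("string")). A small sketch, with a made-up Series:

```python
import pandas as pd

# Integers whose string forms have different lengths: '5', '42', '7'.
s = pd.Series([5, 42, 7])
converted = s.astype(str)

# The reported dtype is object in the pandas versions discussed here,
# but every element is an ordinary Python str.
print(converted.dtype)
print([type(v).__name__ for v in converted])  # ['str', 'str', 'str']
print(converted.tolist())                     # ['5', '42', '7']
```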

2 Answers


fillna does not work after you cast the values to str: the column no longer contains np.nan, but the string 'nan', which pandas does not treat as a missing value:

df = pd.DataFrame(np.random.randint(1, 10, (10000, 5)), columns=['a', 'b', 'c', 'd', 'e'])
df.iloc[0, 0] = np.nan
# df['a'] = df['a'].astype(str)  <-- You don't need this line.
df['a'] = df['a'].fillna('not available')
print(df.dtypes)
print(df.head())

Output:

a    object
b     int32
c     int32
d     int32
e     int32
dtype: object
               a  b  c  d  e
0  not available  6  3  9  7
1              5  4  5  5  3
2              4  2  5  3  2
3              4  9  2  8  3
4              2  6  5  9  1
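One caveat: with this approach the remaining values in a are still numbers, not strings. Assigning np.nan into the integer column upcast it to float64, so fillna leaves 5.0, 4.0 and so on as floats. If you want integer-looking strings, as the question asks, here is a sketch that is not part of the original answer and assumes pandas 1.0 or later (for the nullable "Int64" and "string" dtypes); the small frame is made up for illustration:

```python
import numpy as np
import pandas as pd

# Stand-in for column 'a' after df.iloc[0, 0] = np.nan: the NaN upcast it to float64.
df = pd.DataFrame({"a": [np.nan, 5.0, 4.0], "b": [6, 4, 2]})

# fillna alone: the missing value is replaced, but the other entries stay floats.
plain = df["a"].fillna("not available")
print(plain.tolist())

# Nullable Int64 drops the '.0'; the "string" dtype keeps the missing value as <NA>,
# so fillna can still find and replace it afterwards.
df["a"] = df["a"].astype("Int64").astype("string").fillna("not available")
print(df["a"].tolist())  # ['not available', '5', '4']

# The former NaN row now forms its own group.
print(df.groupby("a")["b"].sum())
```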
answered 2017-10-31T16:06:31.420
df = pd.DataFrame(np.random.randint(1, 10, (10, 5)), columns=['a', 'b', 'c', 'd', 'e'])
df.iloc[0, 0] = np.nan

df.isnull()
Out[329]: 
       a      b      c      d      e
0   True  False  False  False  False
1  False  False  False  False  False
2  False  False  False  False  False
3  False  False  False  False  False
4  False  False  False  False  False
5  False  False  False  False  False
6  False  False  False  False  False
7  False  False  False  False  False
8  False  False  False  False  False
9  False  False  False  False  False

After you change the column to str:

df['a'] = df['a'].astype(str)

df.isnull()
Out[332]: 
       a      b      c      d      e
0  False  False  False  False  False
1  False  False  False  False  False
2  False  False  False  False  False
3  False  False  False  False  False
4  False  False  False  False  False
5  False  False  False  False  False
6  False  False  False  False  False
7  False  False  False  False  False
8  False  False  False  False  False
9  False  False  False  False  False

The cast turns the null value np.nan into the string 'nan':

df.iloc[0,0]
Out[334]: 'nan'
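So the order matters: call fillna while the value is still a real np.nan, and only then cast to str. A minimal sketch; the Series is a made-up stand-in for column a after the NaN assignment, which made it float64:

```python
import numpy as np
import pandas as pd

# Stand-in for column 'a': assigning np.nan turned the integers into floats.
s = pd.Series([np.nan, 5.0, 4.0])

# Replace the real NaN first, then convert everything to str.
result = s.fillna("not available").astype(str)
print(result.tolist())  # ['not available', '5.0', '4.0']
```

Because the column is float at that point, the numbers come out as '5.0' rather than '5'.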
answered 2017-10-31T16:39:33.217