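I have a pandas DataFrame created from a MySQL call, and the data comes back as object dtype.
The data is mostly numeric, with some 'na' values.
How can I convert the DataFrame's dtypes so that the numeric values are typed correctly (as floats) and the 'na' values are represented as numpy NaN values?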
df = df.convert_objects(convert_numeric=True)
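works in most cases.
Note that this copies the data. It would be preferable to get the data into a numeric type on the initial read. If you post your code and a small example, someone can probably help you with that.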
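For readers on a recent pandas release: convert_objects was deprecated in 0.17 and removed in 1.0, so the call above is not available there. A minimal sketch of one possible substitute, assuming pd.to_numeric with errors="coerce" suits your data (the sample frame and its column names are made up for illustration and stand in for the MySQL result):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the object-dtype frame returned by the MySQL call
df = pd.DataFrame({
    "a": ["1", "2.5", "na"],
    "b": ["na", "4", "5"],
}, dtype=object)

# errors="coerce" turns anything that cannot be parsed (such as "na") into NaN
converted = df.apply(pd.to_numeric, errors="coerce")
print(converted.dtypes)
```

Like convert_objects, this returns a new frame rather than converting in place. Note that coercion silently turns every unparseable value into NaN, not only 'na', so it may hide other bad data.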
Use the replace method on dataframes:
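import numpy as np
from pandas import DataFrame

# Two columns in which the string 'na' marks a missing value
df = DataFrame({
    'k1': ['na'] * 3 + ['two'] * 4,
    'k2': [1, 'na', 2, 'na', 3, 4, 4]})
print(df)
# Swap every 'na' string for a real NaN
df = df.replace('na', np.nan)
print(df)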
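I think it's helpful to point out that calling df.replace('na', np.nan) by itself won't change df, because replace returns a new DataFrame rather than modifying the existing one. You must assign the result back, as in df = df.replace('na', np.nan).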
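A short sketch of that point, reusing the frame from the answer above. The follow-up pd.to_numeric call is my own assumption, not part of the answer: it is one way to make sure k2 ends up as float64, since the dtype replace leaves behind on an object column can differ between pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'k1': ['na'] * 3 + ['two'] * 4,
    'k2': [1, 'na', 2, 'na', 3, 4, 4]})

# replace returns a new DataFrame; df itself still holds the 'na' strings
replaced = df.replace('na', np.nan)
print(df.loc[1, 'k2'])        # still the string 'na'
print(replaced.loc[1, 'k2'])  # NaN

# Assumption: force a float dtype explicitly instead of relying on replace
replaced['k2'] = pd.to_numeric(replaced['k2'], errors='coerce')
print(replaced.dtypes)
```

Column k1 stays object dtype because it still contains the string 'two'.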
This is what Tom suggested and is correct
In [134]: s = pd.Series(['1','2.','na'])
In [135]: s.convert_objects(convert_numeric=True)
Out[135]:
0     1
1     2
2   NaN
dtype: float64
As Andy points out, convert_objects does not handle a Series that mixes strings and actual numbers directly (I think that's a bug), so cast every element to a string first, then convert:
In [136]: s2 = pd.Series(['1','2.','na',5])
In [138]: s2.astype(str).convert_objects(convert_numeric=True)
Out[138]:
0     1
1     2
2   NaN
3     5
dtype: float64
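On pandas versions where convert_objects no longer exists, the same mixed Series can probably be handled without the astype(str) step. This is a sketch under the assumption that pd.to_numeric is an acceptable replacement; it is not what the answer above used:

```python
import numpy as np
import pandas as pd

# Same mixed Series as above: numeric strings, an 'na' marker and a real int
s2 = pd.Series(['1', '2.', 'na', 5])

# Strings are parsed, the int is kept, and 'na' is coerced to NaN
result = pd.to_numeric(s2, errors='coerce')
print(result)
```

The result is a float64 Series, since NaN cannot be stored in a plain integer column.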