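I have a pandas DataFrame created from a MySQL query, and it returns the data with object dtype.

The data is mostly numeric, with some 'na' values.

How can I cast the DataFrame's types so that the numeric values are typed correctly (as floats) and the 'na' values are represented as numpy NaN values?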

3 Answers

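df = df.convert_objects(convert_numeric=True) will work in most cases.

I should note that this copies the data. It would be preferable to get it into a numeric type on the initial read. If you post your code and a small example, someone might be able to help you with that.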
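
For readers on a newer pandas: convert_objects was deprecated and later removed (it is no longer available as of pandas 1.0). Below is a minimal sketch of the same conversion using pd.to_numeric with errors="coerce", which turns any value that cannot be parsed as a number into NaN. The column names and sample values are made up for illustration and stand in for the frame returned by the MySQL query.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the object-dtype frame returned by the MySQL query
df = pd.DataFrame({"a": ["1", "2.5", "na"], "b": ["3", "na", "4"]}, dtype=object)

# Apply pd.to_numeric column by column; unparseable values such as 'na' become NaN
converted = df.apply(pd.to_numeric, errors="coerce")

print(converted.dtypes)
print(converted)
```

Like convert_objects, this returns a new DataFrame rather than modifying df in place. Be aware that errors="coerce" turns every unparseable string into NaN, not only 'na'.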

Answered 2013-07-03T20:55:32.177

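Use the replace method on DataFrames:

import numpy as np
from pandas import DataFrame

# Object-dtype frame where missing values are stored as the string 'na'
df = DataFrame({
    'k1': ['na'] * 3 + ['two'] * 4,
    'k2': [1, 'na', 2, 'na', 3, 4, 4]})

print(df)

# replace returns a new DataFrame, so assign the result back
df = df.replace('na', np.nan)

print(df)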

I think it's helpful to point out that calling df.replace('na', np.nan) by itself does not change df: it returns a new DataFrame, so you must assign the result back to the existing DataFrame.
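
Replacing 'na' with NaN does not by itself guarantee a float column: depending on the pandas version, k2 may still have object dtype afterwards. Here is a sketch of one way to finish the job, assuming only k2 is meant to be numeric (k1 holds strings such as 'two' and is left alone):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "k1": ["na"] * 3 + ["two"] * 4,
    "k2": [1, "na", 2, "na", 3, 4, 4]})

# Swap the 'na' placeholder for a real NaN, then cast the numeric column explicitly
df = df.replace("na", np.nan)
df["k2"] = df["k2"].astype(float)

print(df.dtypes)
```

The explicit astype(float) makes the result independent of whatever dtype inference replace performs in a given release.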

Answered 2013-07-03T21:00:11.977

This is what Tom suggested, and it is correct:

In [134]: s = pd.Series(['1','2.','na'])

In [135]: s.convert_objects(convert_numeric=True)
Out[135]: 
0     1
1     2
2   NaN
dtype: float64

As Andy points out, this doesn't work directly when the Series mixes strings and numbers (I think that's a bug), so convert all elements to strings first, then convert:

In [136]: s2 = pd.Series(['1','2.','na',5])

In [138]: s2.astype(str).convert_objects(convert_numeric=True)
Out[138]: 
0     1
1     2
2   NaN
3     5
dtype: float64
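
On newer pandas versions, where convert_objects is no longer available, pd.to_numeric with errors="coerce" should handle the mixed case directly, without the astype(str) step. A minimal sketch using the same values as s2:

```python
import pandas as pd

# Same mixed Series as above: three strings and one integer
s2 = pd.Series(["1", "2.", "na", 5])

# Strings are parsed, the integer is kept, and 'na' is coerced to NaN
result = pd.to_numeric(s2, errors="coerce")

print(result)
```
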
Answered 2013-07-03T21:05:53.660