python - Converting pandas DataFrame types

Question

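I have a pandas DataFrame created from a mysql call, which returns the data as object type.

The data is mostly numeric, with some 'na' values.

How can I convert the DataFrame's types so that the numeric values are typed correctly (as floats) and the 'na' values are represented as numpy NaN values?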

score 1 · Accepted Answer

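df = df.convert_objects(convert_numeric=True) will work in most cases.

I should note that this copies the data. It is better to convert to a numeric type on the initial read. If you post your code and a small example, someone might be able to help you.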
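For readers on a recent pandas release: convert_objects was deprecated and later removed, so the call above raises an AttributeError there. A minimal sketch of the same idea using pd.to_numeric with errors="coerce" follows. The frame `raw` and its column names are hypothetical stand-ins for whatever the mysql call returns.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for a frame returned by a MySQL query:
# every column arrives as object dtype, with 'na' marking missing values.
raw = pd.DataFrame({
    "price": ["1.5", "na", "3"],
    "qty": ["10", "20", "na"],
}, dtype=object)

# errors="coerce" turns anything that cannot be parsed as a number into NaN.
# apply runs pd.to_numeric on each column in turn.
converted = raw.apply(pd.to_numeric, errors="coerce")

print(converted.dtypes)
```

Like convert_objects, this builds a new DataFrame rather than modifying `raw`, so the note about copying the data still applies.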

score 1 · Accepted Answer

Use the replace method on dataframes:

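import numpy as np
import pandas as pd

# 'na' strings mark the missing values in both columns
df = pd.DataFrame({
    'k1': ['na'] * 3 + ['two'] * 4,
    'k2': [1, 'na', 2, 'na', 3, 4, 4]})

print(df)

# replace returns a new DataFrame, so assign the result back
df = df.replace('na', np.nan)

print(df)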

I think it's helpful to point out that calling df.replace('na', np.nan) by itself does not modify df. It returns a new DataFrame, so you must assign the result back to the existing name (or pass inplace=True).
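Replacing 'na' with NaN does not by itself guarantee a float column: depending on the pandas version, k2 may still have object dtype afterwards. A rough sketch of the full round trip, with an explicit cast added as a precaution (the name `cleaned` is only illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "k1": ["na"] * 3 + ["two"] * 4,
    "k2": [1, "na", 2, "na", 3, 4, 4],
})

# replace returns a new DataFrame; df itself is left unchanged.
cleaned = df.replace("na", np.nan)

# The dtype of k2 at this point depends on the pandas version,
# so cast explicitly to make sure the column ends up as float.
cleaned["k2"] = cleaned["k2"].astype(float)

print(cleaned.dtypes)
```

The k1 column stays non-numeric because it still holds the string 'two'; only k2 can be cast to float.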

score 1 · Accepted Answer

This is what Tom suggested, and it is correct:

In [134]: s = pd.Series(['1','2.','na'])

In [135]: s.convert_objects(convert_numeric=True)
Out[135]: 
0     1
1     2
2   NaN
dtype: float64

As Andy points out, this doesn't work directly when the Series mixes strings and numbers (I think that's a bug), so convert all elements to strings first, then convert:

In [136]: s2 = pd.Series(['1','2.','na',5])

In [138]: s2.astype(str).convert_objects(convert_numeric=True)
Out[138]: 
0     1
1     2
2   NaN
3     5
dtype: float64
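On pandas versions where convert_objects is no longer available, pd.to_numeric can be used on the same mixed Series. A small sketch, assuming a reasonably recent pandas; the astype(str) step is not needed here because to_numeric parses each element individually:

```python
import pandas as pd

# Same mixed Series as above: numeric strings, 'na', and a bare integer.
s2 = pd.Series(["1", "2.", "na", 5], dtype=object)

# Elements that cannot be parsed as numbers ('na') become NaN.
result = pd.to_numeric(s2, errors="coerce")

print(result)
```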
