python - Importing a table into pandas and specifying a dtype for a column with missing values

Question

I am importing a tab-delimited text file in pandas/Python with the read_table command.

q_data_1 = pd.read_table('data.txt', skiprows=6, dtype={'numbers': np.float64})

...but I get

AttributeError: 'NoneType' object has no attribute 'dtype'

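Without the dtype argument, the column is imported with the object dtype.

I think the 'numbers' column has missing data, which causes the import to fail. How can I ignore those values?

Edit (25-May-13): Any idea how to handle columns containing (i) times (e.g. '00:03:06'), (ii) dates (e.g. '2002-03-11'), (iii) percentages (e.g. '32.81%') and (iv) numbers with commas (e.g. '10,982'), so that they are converted to an appropriate dtype? All of these are imported as object. (I have edited the question to reflect this.)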

score 1 · Accepted Answer

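Once you have read in the DataFrame (without restricting the dtype), you can convert the column with apply (using the technique from this post):

import locale
import pandas as pd

# Use US conventions so that ',' is treated as the thousands separator
locale.setlocale(locale.LC_ALL, 'en_US.UTF-8')
df = pd.DataFrame([['1,002.01'], ['300,000,000.1'], ['10']], columns=['numbers'])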

In [4]: df['numbers']
Out[4]:
0         1,002.01
1    300,000,000.1
2               10
Name: numbers, dtype: object

In [5]: df['numbers'].apply(locale.atof)
Out[5]:
0    1.002010e+03
1    3.000000e+08
2    1.000000e+01
Name: numbers, dtype: float64

In [6]: df['numbers'] = df['numbers'].apply(locale.atof)
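
As a complement to the locale approach, here is a minimal sketch of handling the four kinds of values from the question's edit without depending on the system locale. The sample text and the column names share, day and elapsed are made up for illustration (only 'numbers' comes from the question), and the snippet uses read_csv with sep='\t' on an in-memory string instead of the real data.txt. It relies on the thousands and parse_dates options of the pandas reader and leaves empty fields as NaN.

```python
import io
import pandas as pd

# Hypothetical tab-separated sample standing in for data.txt
raw = (
    "numbers\tshare\tday\telapsed\n"
    "10,982\t32.81%\t2002-03-11\t00:03:06\n"
    "\t5%\t2002-03-12\t01:00:00\n"
    "1,002.01\t\t2002-03-13\t00:00:30\n"
)

# thousands="," strips digit separators while parsing; empty fields become NaN,
# so the 'numbers' column comes out as float64 without a dtype argument.
df = pd.read_csv(io.StringIO(raw), sep="\t", thousands=",", parse_dates=["day"])

# Percentages: drop the trailing "%" and scale to a fraction.
df["share"] = pd.to_numeric(df["share"].str.rstrip("%"), errors="coerce") / 100

# Durations such as "00:03:06" can be stored as timedeltas.
df["elapsed"] = pd.to_timedelta(df["elapsed"])

# Post-hoc variant for a column that was already loaded as strings:
s = pd.Series(["1,002.01", None, "10"])
nums = pd.to_numeric(s.str.replace(",", "", regex=False), errors="coerce")

print(df.dtypes)
print(nums)
```

Whether a time such as '00:03:06' should be a timedelta or a time of day depends on what the column means; the sketch assumes it is a duration.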
