python - How to convert float to int using dtype in read_csv?

Question

I have a CSV file with 5 columns.

Id           Origin      Space       Empl1       Empl2
11084676.0   0.0         0.0         0.0         NaN
11084654.0   0.0         0.0         0.0         0.0
11084591.0   0.0         0.0         0.0         0.0

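Because the file is large, I want to avoid the default type inference. I would therefore like to assign the following types to the columns: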

Id        int
Origin    str
Space     str
Empl1     str
Empl2     str

This is how I do it:

columns = ["Id", "Origin", "Space", "Empl1", "Empl2"]
types = ["int", "str", "str", "str", "str"]

df = pd.read_csv("myfile.csv", sep=';', header=0, dtype=dict(zip(columns, types)), usecols=columns, error_bad_lines=False, warn_bad_lines=True)

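The problem is that the Id column contains float values, so the read fails with:

TypeError: Cannot cast array from dtype('float64') to dtype('int32') according to the rule 'safe'

Is there any way to force the conversion to the specified data types?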

score 1 · Accepted Answer

You can try converting the type of "Id" after reading the file, for example:

df['Id'] = pd.to_numeric(df['Id'], downcast='unsigned', errors='coerce')
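
As a minimal sketch of what that line does, the snippet below applies it to a small hypothetical DataFrame that mirrors the Id values from the question (the DataFrame itself is an assumption for illustration, not the asker's file):

```python
import pandas as pd

# Hypothetical sample: Id values stored as floats, as in the question.
df = pd.DataFrame({"Id": [11084676.0, 11084654.0, 11084591.0]})

# errors="coerce" replaces values that cannot be parsed with NaN;
# downcast="unsigned" asks for the smallest unsigned integer dtype
# that can hold every value.
df["Id"] = pd.to_numeric(df["Id"], downcast="unsigned", errors="coerce")

print(df["Id"].dtype)
```

Note that the downcast only succeeds when every value is a whole number. If the column contains missing values, it stays floating point, because NaN has no plain integer representation.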

score 0 · Accepted Answer

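This actually takes two lines: drop "Id" from the dtype mapping so that pandas infers its type, then read the file.

types_dict = dict(zip(columns, types))
del types_dict['Id']
df = pd.read_csv("myfile.csv", sep=';', header=0, dtype=types_dict, usecols=columns, error_bad_lines=False, warn_bad_lines=True)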
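
The following is a self-contained sketch of that approach. The in-memory CSV text is a hypothetical stand-in for "myfile.csv", and the final `astype` step is an assumption added here to finish the conversion; it is not part of the answer above. The `error_bad_lines` and `warn_bad_lines` arguments are left out because newer pandas versions replaced them with `on_bad_lines`.

```python
import io

import pandas as pd

# Hypothetical stand-in for "myfile.csv", using ';' as the separator.
csv_text = (
    "Id;Origin;Space;Empl1;Empl2\n"
    "11084676.0;0.0;0.0;0.0;\n"
    "11084654.0;0.0;0.0;0.0;0.0\n"
    "11084591.0;0.0;0.0;0.0;0.0\n"
)

columns = ["Id", "Origin", "Space", "Empl1", "Empl2"]
types = ["int", "str", "str", "str", "str"]

# Build the dtype mapping, then drop "Id" so pandas infers it (float64 here).
types_dict = dict(zip(columns, types))
del types_dict["Id"]

df = pd.read_csv(
    io.StringIO(csv_text),
    sep=";",
    header=0,
    dtype=types_dict,
    usecols=columns,
)

# Assumed follow-up step: cast Id to an integer dtype after loading.
# This works here because every Id is a whole number and none is missing.
df["Id"] = df["Id"].astype("int64")

print(df.dtypes)
```

If Id could contain missing values, the plain `int64` cast would raise an error; pandas' nullable `"Int64"` dtype is one option in that case.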
