3

I want to specify data types for pandas read_csv. Here's a quick look at something that does work and then doesn't when types are specified. Why doesn't the latter work?

import io
import pandas as pd

csv = """foo,1234567,a,1 
foo,2345678,b,3 
bar,3456789,b,5 
"""

df = pd.read_csv(io.StringIO(csv),
        names=["fb", "num", "loc", "x"])

print(df)

df = pd.read_csv(io.StringIO(csv),
        names=["fb", "num", "loc", "x"], 
        dtype=["|S3", "np.int64", "|S1", "np.int8"])

print(df)

I've updated to make this much simpler and, hopefully, clearer on BrenBarn's suggestion. My real dataset is much larger, but I'd like to use the method to generate types for all my data on import.

4

1 回答 1

5

正如 Jeff 所说,我的语法很糟糕。名称和类型必须压缩到 dic 样式的关系列表中。下面的代码有效,但请注意,您不能输入字符串宽度;您只能将其定义为对象。

import pandas as pd
import io

csv = """foo,1234567,a,1
foo,2345678,b,3
bar,3456789,b,5
"""

df = pd.read_csv(io.StringIO(csv),
        names = ["fb", "num", "ab", "x"], 
        dtype = {"fb" : object, "num" : np.int64, "ab" : object, "x" : np.int8})
print(df)
于 2013-09-30T00:31:32.163 回答