python - python read_fwf error: "dtype is not supported with python-fwf parser"

Question

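Using Python 2.7.5 and pandas 0.12.0, I'm trying to import a fixed-width formatted text file into a DataFrame with 'pd.io.parsers.read_fwf()'. The values I'm importing are all numeric, but preserving leading zeros is important, so I'd like to specify the dtype as string rather than int.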

According to the documentation for this function, read_fwf supports the dtype argument, but when I try to use it:

data= pd.io.parsers.read_fwf(file, colspecs = ([79,81], [87,90]), header = None, dtype = {0: np.str, 1: np.str})

I get the error:

ValueError: dtype is not supported with python-fwf parser

I've tried as many variations of 'dtype = something' as I can think of, but they all return the same message.

Any help would be much appreciated!

score 8 · Accepted Answer

Building on @TomAugspurger's example, instead of specifying dtypes, specify a converter for the column you want to keep as str:

from io import StringIO
import pandas as pd
data = StringIO(u"""
121301234
121300123
121300012
""")

pd.read_fwf(data, colspecs=[(0,3),(4,8)], converters = {1: str})

which results in:

    \n Unnamed: 1
0  121       0123
1  121       0012
2  121       0001

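converters is a mapping from column names or indices to functions that convert the values in each cell (for example, int would convert them to integers, float to floats, and so on).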
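As a minimal sketch of the converter approach, the snippet below reads the same three records with header=None so that every line is treated as data. The variable names and the sample string without a leading blank line are assumptions made for illustration, not part of the original answer.

```python
from io import StringIO
import pandas as pd

# Hypothetical sample: three fixed-width records, no leading blank line.
raw = "121301234\n121300123\n121300012\n"

# colspecs are half-open intervals: column 0 is characters 0-2,
# column 1 is characters 4-7.
df = pd.read_fwf(
    StringIO(raw),
    colspecs=[(0, 3), (4, 8)],
    header=None,
    converters={1: str},  # keep column 1 as text so leading zeros survive
)
print(df)
```

Column 0 has no converter, so pandas infers its type as usual; only column 1 is kept as strings.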

score 4 · Accepted Answer

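The documentation there is probably incorrect. I think several of the readers share the same base docstring. As for a workaround, since you know the widths ahead of time, I think you can prepend the zeros after the fact.

With this file and widths [4, 5]: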

121301234
121300123
121300012

we get:

In [38]: df = pd.read_fwf('tst.fwf', widths=[4,5], header=None)

In [39]: df
Out[39]: 
      0     1
0  1213  1234
1  1213   123
2  1213    12

To fill in the missing zeros, would this work?

In [45]: df[1] = df[1].astype('str')

In [53]: df[1] = df[1].apply(lambda x: ''.join(['0'] * (5 - len(x))) + x)

In [54]: df
Out[54]: 
      0      1
0  1213  01234
1  1213  00123
2  1213  00012

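The 5 in the lambda above comes from the correct width. You'd need to select every column that needs leading zeros and apply the function (with the correct width) to each one.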
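One way to apply the padding to several columns at once is sketched below, using the string method str.zfill in place of the lambda. The DataFrame is a hand-built stand-in for the parsed result, and the pad_widths mapping is a hypothetical name introduced here for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the parsed frame, where leading zeros were lost.
df = pd.DataFrame({0: [1213, 1213, 1213], 1: [1234, 123, 12]})

# Assumed mapping of column label -> original field width.
pad_widths = {1: 5}

for col, width in pad_widths.items():
    # Convert to text, then left-pad with zeros up to the field width.
    df[col] = df[col].astype(str).str.zfill(width)

print(df)
```

This assumes the columns contain no missing values; astype(str) would turn a NaN into the text "nan", which zfill would then pad like any other string.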

score 1 · Accepted Answer

This works as expected in pandas 0.20.2 and later:

from io import StringIO
import pandas as pd
data = StringIO(u"""
121301234
121300123
121300012
""")
# np.str was only an alias for the built-in str and has been removed from recent NumPy releases
pd.read_fwf(data, colspecs=[(0,3),(4,8)], header = None, dtype = {0: str, 1: str})

Output:

     0     1
0  NaN   NaN
1  121  0123
2  121  0012
3  121  0001
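If every column should stay as text, a single dtype=str can be passed in place of a per-column dictionary. The sketch below assumes a pandas version where read_fwf accepts dtype, and uses a sample string without a leading blank line together with the widths [4, 5] from the earlier answer; both are choices made here for illustration.

```python
from io import StringIO
import pandas as pd

# Hypothetical sample: three fixed-width records, no leading blank line.
raw = "121301234\n121300123\n121300012\n"

# widths=[4, 5] splits each record into a 4-character and a 5-character field;
# dtype=str keeps both fields as text, so leading zeros are preserved.
df = pd.read_fwf(StringIO(raw), widths=[4, 5], header=None, dtype=str)
print(df)
```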
