python - How to skip text being used as a column header with Python

Question

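I am importing a web log text file into Python using Pandas. Pandas reads the header row, but it uses the text "Fields:" as a column header and then adds an extra column of blanks (NaN) at the end. How can I stop this text from being used as a column header?

Here is my code: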

arr = pd.read_table("path", skiprows=3, delim_whitespace=True, na_values=True)

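Here is the beginning of the file:

Software: Microsoft Internet Information Services 7.5
Version: 1.0
Date: 2014-08-01 00:00:25
Fields: date time
2014-08-01 00:00:25...

The result is that "Fields:" is used as a column header, and a column full of NaN values is created for the "time" column.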

score 1 · Accepted Answer

I think you probably want skiprows=4 and header=None.
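
A minimal sketch of why this helps, assuming a small in-memory sample that mirrors the file layout in the question (the `sample` string and its second data row are made up for illustration). It uses `pd.read_csv` with `sep=r"\s+"` as a stand-in for `delim_whitespace=True`, which newer pandas versions deprecate. With `skiprows=3`, the "Fields:" line supplies three header names for two-column data, so the last column comes out as NaN; skipping that line as well and passing `header=None` avoids it.

```python
import io
import pandas as pd

# Hypothetical sample mirroring the file layout shown in the question.
sample = (
    "Software: Microsoft Internet Information Services 7.5\n"
    "Version: 1.0\n"
    "Date: 2014-08-01 00:00:25\n"
    "Fields: date time\n"
    "2014-08-01 00:00:25\n"
    "2014-08-01 00:00:26\n"
)

# Approximation of the question's call: line 4 becomes the header, giving
# three names ("Fields:", "date", "time") for rows that only have two values.
broken = pd.read_csv(io.StringIO(sample), sep=r"\s+", skiprows=3)

# Suggested fix: skip the "Fields:" line too and read without a header row.
fixed = pd.read_csv(io.StringIO(sample), sep=r"\s+", skiprows=4, header=None)

print(broken.columns.tolist())
print(fixed)
```

With `header=None` the columns are labelled by position (0 and 1); pass `names=[...]` if you want readable labels.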

answered 2016-04-08T04:33:16.433

score 1 · Accepted Answer

You can call read_table twice.

import pandas as pd

# Read the fourth line into a 1x1 DataFrame holding a single string,
# then split it and drop the first token ("Fields:"):
col_names = pd.read_table('path', skiprows=3, nrows=1, header=None).iloc[0, 0].split()[1:]
# Read the actual data:
df = pd.read_table('path', sep=' ', skiprows=4, names=col_names)

If you already know the column names (for example date and time), it is even simpler:

df = pd.read_table('path', sep=' ', skiprows=4, names=['date', 'time'])
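
To try the two-call approach without a real log file, here is a small sketch that runs it against an in-memory string. The `sample` text is an assumption modelled on the excerpt in the question, and `io.StringIO` stands in for the file path. The first call relies on `read_table` defaulting to a tab separator, so the space-separated "Fields:" line arrives as a single cell.

```python
import io
import pandas as pd

# Hypothetical sample modelled on the excerpt in the question.
sample = (
    "Software: Microsoft Internet Information Services 7.5\n"
    "Version: 1.0\n"
    "Date: 2014-08-01 00:00:25\n"
    "Fields: date time\n"
    "2014-08-01 00:00:25\n"
    "2014-08-01 00:00:26\n"
)

# First pass: the default tab separator keeps the fourth line as one string.
fields_line = pd.read_table(
    io.StringIO(sample), skiprows=3, nrows=1, header=None
).iloc[0, 0]
col_names = fields_line.split()[1:]  # drop the leading "Fields:" token

# Second pass: skip all four metadata lines and apply the recovered names.
df = pd.read_table(io.StringIO(sample), sep=" ", skiprows=4, names=col_names)

print(col_names)
print(df)
```

If the real log separates fields with runs of spaces or tabs, `sep=r"\s+"` in the second call may be a safer choice than a single space.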
