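I need to download and process weather files from the Australian Bureau of Meteorology. So far, the following Python works well; it extracts and cleans the data exactly as I want: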

import pandas as pd
df = pd.read_csv("ftp://ftp.bom.gov.au/anon/gen/fwo/IDY02122.dat", comment='#', skiprows=3, na_values=-9999.0, quotechar='"', skipfooter=1, names=['stn', 'per', 'evap', 'amax',   'amin',   'gmin',   'suns',   'rain',   'prob'], header=0, converters={'stn': str})

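The problem is that the file is overwritten every day, and the metadata giving the forecast's generation date and time sits in comment fields on the first two lines. That is, the file contains the following data: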

# date=20131111
# time=06
[fcst_DB]
stn[7]  , per,   evap,   amax,   amin,   gmin,   suns,   rain,   prob
"001006",   0,-9999.0,   39.9,-9999.0,-9999.0,-9999.0,    4.0,  100.0
"001006",   1,-9999.0,   39.4,   26.5,-9999.0,-9999.0,    6.0,  100.0
"001006",   2,-9999.0,   35.5,   26.2,-9999.0,-9999.0,    7.0,  100.0

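Is it possible to include the first two lines in the result using pandas? Ideally, by adding date and time columns to the result, with the values 20131111 and 06 on every row of the output.

Regards, Dave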

1 Answer

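Are the first two lines always the date and the time? In that case, I'd suggest parsing them separately and handing the rest of the stream to read_csv.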

import urllib2  # Python 2; in Python 3 use urllib.request.urlopen instead
url = "ftp://ftp.bom.gov.au/anon/gen/fwo/IDY02122.dat"

In [29]: r = urllib2.urlopen(url)

In [30]: date = next(r).strip('# date=').rstrip()

In [31]: time = next(r).strip('# time=').rstrip()

In [32]: stamp = pd.to_datetime(date + ' ' + time)

In [33]: stamp
Out[33]: Timestamp('2013-11-12 00:00:00', tz=None)
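
One caveat about the `strip('# date=')` calls above: `str.strip` treats its argument as a set of characters to remove from both ends, not as a prefix. It happens to work here because the values are all digits, but a regular expression makes the intent explicit. The following is only a minimal sketch, assuming the header lines always look like `# key=value` and that `time` is an hour of the day; `parse_header_line` and `HEADER_RE` are hypothetical names, not part of pandas or of the original answer.

```python
import re
from datetime import datetime

# Matches comment lines of the form "# key=value" (assumed header format).
HEADER_RE = re.compile(r"^#\s*(\w+)\s*=\s*(\S+)\s*$")

def parse_header_line(line, expected_key):
    """Return the value from a '# key=value' comment line (hypothetical helper)."""
    if isinstance(line, bytes):  # urlopen yields bytes in Python 3
        line = line.decode("ascii")
    match = HEADER_RE.match(line)
    if match is None or match.group(1) != expected_key:
        raise ValueError("unexpected header line: %r" % line)
    return match.group(2)

# The two header lines quoted in the question.
sample = ["# date=20131111\n", "# time=06\n"]
date = parse_header_line(sample[0], "date")
time = parse_header_line(sample[1], "time")

# Assumption: 'time' is a two-digit hour, so date + time parses as YYYYMMDDHH.
stamp = datetime.strptime(date + time, "%Y%m%d%H")
print(stamp)
```

Raising on an unexpected line means a change in the file layout is noticed immediately instead of silently producing a wrong timestamp.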

Then read the rest using your code (I changed skiprows to 1):

In [34]: df = pd.read_csv("ftp://ftp.bom.gov.au/anon/gen/fwo/IDY02122.dat", comment='#',
             skiprows=1, na_values=-9999.0, quotechar='"', skipfooter=1,
             names=['stn', 'per', 'evap', 'amax', 'amin', 'gmin', 'suns',
                    'rain',   'prob'], header=0, converters={'stn': str})

In [43]: df['timestamp'] = stamp

In [44]: df.head()
Out[44]: 
      stn  per  evap  amax  amin  gmin  suns  rain   prob           timestamp
0  001006    0   NaN  39.9   NaN   NaN   NaN   2.9  100.0 2013-11-12 00:00:00
1  001006    1   NaN  35.8  25.8   NaN   NaN   7.0  100.0 2013-11-12 00:00:00
2  001006    2   NaN  37.0  25.5   NaN   NaN   4.0   71.4 2013-11-12 00:00:00
3  001006    3   NaN  39.0  26.0   NaN   NaN   1.0   60.0 2013-11-12 00:00:00
4  001006    4   NaN  41.2  26.1   NaN   NaN   0.0   40.0 2013-11-12 00:00:00

Answered 2013-11-12T04:42:47.323
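
For reference, the whole idea (read the two header lines from the stream, then hand the same stream to `read_csv`) can be sketched offline against the sample rows from the question. This is an illustration under stated assumptions rather than a drop-in replacement: `read_forecast`, `SAMPLE` and `COLUMNS` are hypothetical names, the sample has no footer line so `skipfooter` is omitted, and the date and time are kept as strings so that `06` keeps its leading zero.

```python
import io
import pandas as pd

# Sample data copied from the question; the real file also has a footer line.
SAMPLE = '''# date=20131111
# time=06
[fcst_DB]
stn[7]  , per,   evap,   amax,   amin,   gmin,   suns,   rain,   prob
"001006",   0,-9999.0,   39.9,-9999.0,-9999.0,-9999.0,    4.0,  100.0
"001006",   1,-9999.0,   39.4,   26.5,-9999.0,-9999.0,    6.0,  100.0
"001006",   2,-9999.0,   35.5,   26.2,-9999.0,-9999.0,    7.0,  100.0
'''

COLUMNS = ['stn', 'per', 'evap', 'amax', 'amin', 'gmin', 'suns', 'rain', 'prob']

def read_forecast(stream):
    """Parse the two metadata lines, then read the table from the same stream."""
    # Consume "# date=..." and "# time=..."; keep only the part after '='.
    date = next(stream).split("=", 1)[1].strip()
    time = next(stream).split("=", 1)[1].strip()
    # The stream now starts at "[fcst_DB]"; skip it and the original header row.
    df = pd.read_csv(stream, skiprows=2, names=COLUMNS, na_values=[-9999.0],
                     quotechar='"', skipinitialspace=True,
                     converters={'stn': str})
    # Scalar assignment repeats the value on every row.
    df['date'] = date
    df['time'] = time
    return df

df = read_forecast(io.StringIO(SAMPLE))
print(df)
```

Because the header lines are consumed before `read_csv` sees the stream, the file is only fetched once when the same function is given a network stream instead of `io.StringIO`.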