I'd like to load weather data into BigQuery so that I can correlate weather patterns with my own datasets.

I have this script that downloads NOAA's global daily GSOD data and loads it into BigQuery:

#!/bin/bash
# Download one year of NOAA GSOD daily data and load it into BigQuery.
# Usage: pass the year as the only argument, e.g. 2014.
set -euo pipefail
year=${1:?usage: $0 YEAR}
# Folder for each year.
mkdir -p "$year"
# Get yearly data from NOAA.
wget "ftp://ftp.ncdc.noaa.gov/pub/data/gsod/$year/gsod_$year.tar" -O "$year/gsod_$year.tar"
# Untar: the archive holds one gzipped .op file per station.
tar -xvf "$year/gsod_$year.tar" -C "$year/"
# Archive not needed anymore.
rm "$year/gsod_$year.tar"
# Unzip each file.
find "$year" -name "*.gz" -exec gunzip {} +
# Merge all files, dropping each file's header line (the one containing "STN").
find "$year" -name "*.op" -exec grep -h -v STN {} + > "$year.op"
# Transform NOAA's fixed-width format to CSV.
# in2csv from https://csvkit.readthedocs.org/en/0.9.0/
# gsod_schema.csv from https://github.com/tothebeat/noaa-gsod-data-munging/
in2csv -s gsod_schema.csv "$year.op" > "$year.csv"
# Load into BigQuery. in2csv writes a header row, so skip it.
bq load --skip_leading_rows=1 --max_bad_records 10 --replace "weather_gsod.gsod$year" "$year.csv" stn,wban,year,mo,da,temp:float,count_temp:integer,dewp:float,count_dewp:integer,slp:float,count_slp:integer,stp:float,count_stp:integer,visib:float,count_visib:integer,wdsp,count_wdsp,mxpsd,gust:float,max:float,flag_max,min:float,flag_min,prcp:float,flag_prcp,sndp:float,fog,rain_drizzle,snow_ice_pellets,hail,thunder,tornado_funnel_cloud
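The in2csv step slices each fixed-width line into columns according to gsod_schema.csv. If csvkit isn't available, the same idea can be sketched in awk, as below. This is only an illustration: fixed_to_csv is a hypothetical helper, the schema format it reads ("column,start,length" with a header row and 0-based start offsets) is an assumption, and the actual column layout of gsod_schema.csv is not reproduced here.

```bash
# Hypothetical helper: fixed_to_csv SCHEMA DATA
# SCHEMA is assumed to be a CSV with a header row and the columns
# column,start,length, where start is a 0-based character offset.
# Values are trimmed but not quoted, so they must not contain commas.
fixed_to_csv() {
  local schema=$1 data=$2
  awk -F',' '
    # First file (the schema): remember each column name, offset, and width.
    NR == FNR {
      if (FNR > 1) { n++; name[n] = $1; start[n] = $2; len[n] = $3 }
      next
    }
    # On the first data line, emit the CSV header row.
    FNR == 1 {
      for (i = 1; i <= n; i++) printf "%s%s", name[i], (i < n ? "," : "\n")
    }
    # Slice every data line into fields, strip padding, and emit a CSV row.
    {
      for (i = 1; i <= n; i++) {
        v = substr($0, start[i] + 1, len[i])
        gsub(/^ +| +$/, "", v)
        printf "%s%s", v, (i < n ? "," : "\n")
      }
    }
  ' "$schema" "$data"
}
```

Like in2csv, this emits a header row first, which is why the load step skips one leading row.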

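The script downloads the NOAA archive for the given year, untars it, decompresses each file, merges the files while dropping their header lines, converts NOAA's fixed-width format into a CSV that BigQuery can read, and loads the result into the table weather_gsod.gsod<year>.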
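The script handles a single year per run. To backfill a range of years, a small wrapper can call it once per year and stop at the first failure. This is a sketch: load_years is a hypothetical helper, and the commented example assumes the script is saved as gsod_load.sh, which is also a hypothetical name.

```bash
# Hypothetical helper: load_years FIRST LAST CMD [ARGS...]
# Runs CMD once for each year from FIRST to LAST (inclusive), passing the
# year as the last argument, and returns 1 at the first failing year.
load_years() {
  local first=$1 last=$2
  shift 2
  local year
  for ((year = first; year <= last; year++)); do
    "$@" "$year" || { echo "load failed for $year" >&2; return 1; }
  done
}

# Example, assuming the script is saved as gsod_load.sh (hypothetical name):
# load_years 2010 2014 ./gsod_load.sh
```

Because the load uses --replace, rerunning a year overwrites that year's table instead of appending duplicate rows.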

answered 2015-01-27T01:22:04.727