I am looping over monthly weather station data. I can concatenate the files as follows:
import glob
import pandas as pd

path = r"D:\NOAA\output\TEST"
all_files = glob.glob(path + "/*.csv")
for filename in all_files:
    print(filename)  # prints D:\NOAA\output\TEST\189501.tave.conus.csv
df = (pd.read_csv(f) for f in all_files)
concatenated_df = pd.concat(df, axis=1, join='inner')
This results in the following dataframe:
lat lon temp lat lon temp lat lon temp
0 24.5625 -81.8125 21.06 24.5625 -81.8125 17.08 24.5625 -81.8125 22.42
1 24.5625 -81.7708 21.06 24.5625 -81.7708 17.08 24.5625 -81.7708 22.47
2 24.5625 -81.7292 21.06 24.5625 -81.7292 17.08 24.5625 -81.7292 22.47
3 24.5625 -81.6875 21.05 24.5625 -81.6875 17.04 24.5625 -81.6875 22.47
4 24.6042 -81.6458 21.06 24.6042 -81.6458 17.08 24.6042 -81.6458 22.45
The lat and lon columns are identical across the files, so I want to drop the duplicates. The temp columns are unique to each monthly CSV file; I want to keep all of them, but also give them meaningful column names taken from the file names, i.e.:
lat lon temp185901 temp185902 temp185903
0 24.5625 -81.8125 21.06 17.08 22.42
1 24.5625 -81.7708 21.06 17.08 22.47
2 24.5625 -81.7292 21.06 17.08 22.47
3 24.5625 -81.6875 21.05 17.04 22.47
4 24.6042 -81.6458 21.06 17.08 22.45
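For pulling the month tag out of the file names, I was imagining something along these lines. This assumes every file follows the 189501.tave.conus.csv pattern shown above, so the part before the first dot is the YYYYMM stamp; I'm not sure it's the idiomatic way:

import os
import glob
import pandas as pd

path = r"D:\NOAA\output\TEST"
renamed = []
for f in glob.glob(path + "/*.csv"):
    # e.g. "189501.tave.conus.csv" -> "189501" (assumes this filename pattern)
    tag = os.path.basename(f).split('.')[0]
    # rename the temp column so each month keeps its own column
    renamed.append(pd.read_csv(f).rename(columns={'temp': 'temp' + tag}))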
I'm new to Pandas (it looks great, but there is a lot to take in) and would appreciate any help. I think the solution lies in the parameters I pass to .concat(), .duplicated(), and .loc().
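For the combining step, would something like this be the right direction? It reuses the renamed list from the snippet above and sets lat/lon as the index before concatenating, which I think should avoid the duplicated lat/lon columns altogether:

# index each frame on lat/lon so concat aligns rows on coordinates
indexed = [df.set_index(['lat', 'lon']) for df in renamed]
# axis=1 puts the monthly temp columns side by side;
# join='inner' keeps only coordinates present in every file
combined = pd.concat(indexed, axis=1, join='inner').reset_index()
print(combined.head())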
Sample data: ftp://ftp.commissions.leg.state.mn.us/pub/gis/Temp/NOAA/