python - Combining a large number of netCDF files

Question

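I have a large folder of netCDF (.nc) files, all with similar names. Each file contains the variables time, longitude, latitude, and monthly precipitation. The goal is to get the average monthly precipitation for each calendar month over X years, so in the end I will have 12 values for each latitude and longitude, each one the mean over X years for that month. The files for all years are in the same location. Every file starts with the same name and ends in "date.SUB.nc", for example: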

'data1.somthing.somthing1.avg_2d_Ind_Nx.200109.SUB.nc'
'data1.somthing.somthing1.avg_2d_Ind_Nx.200509.SUB.nc'
'data2.somthing.somthing1.avg_2d_Ind_Nx.201104.SUB.nc'
'data2.somthing.somthing1.avg_2d_Ind_Nx.201004.SUB.nc'
'data2.somthing.somthing1.avg_2d_Ind_Nx.201003.SUB.nc'
'data2.somthing.somthing1.avg_2d_Ind_Nx.201103.SUB.nc'
'data1.somthing.somthing1.avg_2d_Ind_Nx.201203.SUB.nc'

The ending is YearMonth.SUB.nc. What I have so far is:

array=[]
f = nc.MFDataset('data*.nc')
precp = f.variables['prectot']
time = f.variables['time']
array = f.variables['time','longitude','latitude','prectot']

I get a KeyError: ('time', 'longitude', 'latitude', 'prectot'). Is there a way to combine all of this data so that I can manipulate it?

score 5 · Accepted Answer

As @CharlieZender mentioned, ncra is one way to go. I'll provide some more details on integrating that function into a Python script. (PS - you can install NCO easily with Homebrew, e.g. http://alejandrosoto.net/blog/2014/01/22/setting-up-my-mac-for-scientific-research/)

import subprocess
import netCDF4
import glob
import numpy as np

for month in range(1,13):
    # Gather all the files for this month
    month_files = glob.glob('/path/to/files/*{0:0>2d}.SUB.nc'.format(month))


    # Using NCO functions ---------------
    avg_file = './precip_avg_{0:0>2d}.nc'.format(month)

    # Concatenate the files using ncrcat
    subprocess.call(['ncrcat'] + month_files + ['-O', avg_file])

    # Take the time (record) average using ncra 
    subprocess.call(['ncra', avg_file, '-O', avg_file])

    # Read in the monthly precip climatology file and do whatever now
    ncfile = netCDF4.Dataset(avg_file, 'r')
    pr = ncfile.variables['prectot'][:,:,:]
    # ....

    # Using only Python -------------
    # Initialize an array to store monthly-mean precip for all years
    # let's presume we know the lat and lon dimensions (nlat, nlon)
    nyears = len(month_files)
    pr_arr = np.zeros([nyears,nlat,nlon], dtype='f4')

    # Populate pr_arr with each file's monthly-mean precip
    for idx, filename in enumerate(month_files):
        ncfile = netCDF4.Dataset(filename, 'r')
        pr = ncfile.variables['prectot'][:,:,:]
        pr_arr[idx,:,:] = np.mean(pr, axis=0)
        ncfile.close()

    # Take the average along all years for a monthly climatology
    pr_clim = np.mean(pr_arr, axis=0)  # 2D now [lat,lon]
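
The glob pattern above relies on every file name ending in the two-digit month. A minimal sketch of a more explicit alternative parses the YYYYMM stamp from each name and groups the files by calendar month, which also makes it easy to check how many years each month has. The helper name `group_by_month` and the regular expression are illustrative assumptions based on the file names shown in the question, not part of the original answer.

```python
import re
from collections import defaultdict

# Assumed naming scheme from the question: "<prefix>.YYYYMM.SUB.nc"
STAMP = re.compile(r'\.(\d{4})(\d{2})\.SUB\.nc$')

def group_by_month(filenames):
    """Return {month: [filenames sorted by year]} for names matching STAMP."""
    groups = defaultdict(list)
    for name in filenames:
        match = STAMP.search(name)
        if match is None:
            continue  # skip files that do not follow the naming scheme
        year, month = int(match.group(1)), int(match.group(2))
        groups[month].append((year, name))
    # Sort each month's files chronologically and drop the year key
    return {m: [n for _, n in sorted(pairs)] for m, pairs in groups.items()}

files = [
    'data1.somthing.somthing1.avg_2d_Ind_Nx.200109.SUB.nc',
    'data1.somthing.somthing1.avg_2d_Ind_Nx.200509.SUB.nc',
    'data2.somthing.somthing1.avg_2d_Ind_Nx.201104.SUB.nc',
    'data2.somthing.somthing1.avg_2d_Ind_Nx.201004.SUB.nc',
]
by_month = group_by_month(files)
```

In a real run, `files` would come from something like `glob.glob('/path/to/files/*.SUB.nc')`, and `by_month[month]` could stand in for `month_files` in the loop above.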

score 3 · Accepted Answer

NCO does this with:

ncra *01.SUB.nc pcp_avg_01.nc
ncra *02.SUB.nc pcp_avg_02.nc
...
ncra *12.SUB.nc pcp_avg_12.nc
ncrcat pcp_avg_??.nc pcp_avg.nc

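Of course, the first twelve commands can be done with a Bash loop, reducing the total number of lines to fewer than five. If you prefer to script in Python, you can use this to check your answer. The ncra documentation is part of the NCO User Guide.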
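
Since the question is about Python, the loop over the twelve months could also be sketched there instead of in Bash. This is a minimal sketch that only builds the `ncra` command lines and does not run them. The helper name `ncra_commands`, the example file names and the output names are assumptions for illustration. `subprocess` does not expand shell wildcards, so the matching is done in Python with `fnmatch`.

```python
import fnmatch

def ncra_commands(filenames):
    """Build one ncra command (as an argv list) per calendar month."""
    commands = []
    for month in range(1, 13):
        # File names are assumed to end in YYYYMM.SUB.nc, as in the question
        pattern = '*{0:02d}.SUB.nc'.format(month)
        month_files = sorted(fnmatch.filter(filenames, pattern))
        if not month_files:
            continue  # no data for this month
        out_file = 'pcp_avg_{0:02d}.nc'.format(month)
        # -O overwrites an existing output file
        commands.append(['ncra', '-O'] + month_files + [out_file])
    return commands

# Hypothetical file names, shortened for readability
example = ['a.200109.SUB.nc', 'a.200509.SUB.nc', 'a.201004.SUB.nc']
for cmd in ncra_commands(example):
    print(' '.join(cmd))
```

With NCO installed, each returned list can be passed to `subprocess.check_call`, and the real file list can be obtained with `glob.glob`.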

score 1 · Accepted Answer

The ymonmean command in CDO computes the mean for each calendar month. The task can therefore be done in two lines:

cdo mergetime data*.SUB.nc  merged.nc  # put files together into one series
cdo ymonmean merged.nc annual_cycle.nc # mean of all Jan,Feb etc.

CDO can also calculate the annual cycle of other statistics (ymonstd, ymonmax, etc.), and the time units can be days or pentads as well as months (e.g. ydaymean).
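
To illustrate what a calendar-month mean of this kind computes, here is a minimal NumPy sketch of the same operation on an in-memory array. The array shapes, the synthetic values and the helper name `calendar_month_mean` are assumptions for illustration. CDO itself works on the files directly, and this sketch is not a description of its implementation.

```python
import numpy as np

def calendar_month_mean(data, months):
    """Average data[time, lat, lon] over all time steps sharing a calendar month.

    months holds the calendar month (1-12) of each time step.
    Returns an array of shape [12, lat, lon]; months without data are NaN.
    """
    data = np.asarray(data, dtype='f8')
    months = np.asarray(months)
    clim = np.full((12,) + data.shape[1:], np.nan)
    for m in range(1, 13):
        selected = months == m
        if selected.any():
            clim[m - 1] = data[selected].mean(axis=0)
    return clim

# Synthetic example: 2 years of monthly data on a 2 x 3 grid,
# where every grid cell simply holds the time index (0..23)
nyears, nlat, nlon = 2, 2, 3
months = np.tile(np.arange(1, 13), nyears)
data = np.arange(12 * nyears, dtype='f8')[:, None, None] * np.ones((nlat, nlon))
clim = calendar_month_mean(data, months)
```

With real data, `months` would be derived from the file's time variable, and `data` would be the `prectot` array from the merged file.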
