I have a series of monthly gridded datasets in CSV form. I want to read them, add a few dimensions, and then write to netCDF. I've had great experience using xarray (xray) in the past, so I thought I'd use it for this task.

I can easily get them into a 2D DataArray with something like:

import numpy as np
import pandas as pd
import xarray as xr

data = np.ones((360,720))
lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng':lngs}
da = xr.DataArray(data, coords=coords)

But when I try to add another dimension, which would convey information about time (all data is from the same year/month), things start to go sour.

I've tried two ways to crack this:

1) expand my input data to m x n x 1, something like:

data = np.ones((360,720))
lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng':lngs}
data = data[:,:,np.newaxis]

Then I follow the same steps as above, with coords updated to contain a third dimension.

lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng':lngs}
coords['time'] = pd.datetime(year, month, day)
da = xr.DataArray(data, coords=coords)
da.to_dataset(name='variable_name')

This is fine for creating a DataArray -- but when I try to convert to a dataset (so I can write to netCDF), I get an error about 'ValueError: Coordinate objects must be 1-dimensional'

2) The second approach I've tried is taking my DataArray, casting it to a DataFrame, setting the index to ['lat','lng', 'time'] and then going back to a Dataset with xr.Dataset.from_dataframe(). This runs for 20+ minutes before I give up and kill the process.

Does anyone know how I can get a Dataset with a monthly 'time' dimension?

2 Answers

Your first example is pretty close:

lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng': lngs}
coords['time'] = [datetime.datetime(year, month, day)]
da = xr.DataArray(data, coords=coords, dims=['lat', 'lng', 'time'])
da.to_dataset(name='variable_name')

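You'll notice a few changes in my version:

  1. I'm passing in a list for the 'time' coordinate instead of a scalar. You need to pass in a list or 1D array to get a 1D coordinate variable, which is what you need if you also use 'time' as a dimension. That's what the error ValueError: Coordinate objects must be 1-dimensional is trying to tell you (by the way, if you have ideas for how to make that error message more helpful, I'm all ears!).
  2. I'm providing a dims argument to the DataArray constructor. Passing in a (non-ordered) dictionary is a little dangerous because the iteration order is not guaranteed.
  3. I also switched to datetime.datetime instead of pd.datetime. The latter is simply an alias for the former.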
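
Putting those three changes together, a self-contained sketch might look like the following. The date (2016-01-01) and the all-ones grid are placeholders, since the question does not say which month the data covers; in practice the grid would come from one of the CSV files.

```python
import datetime

import numpy as np
import xarray as xr

# Placeholder date: an assumption, not taken from the question.
year, month, day = 2016, 1, 1

data = np.ones((360, 720))               # stand-in for one month's grid
lats = np.arange(-89.75, 90, 0.5) * -1   # 360 latitudes, north to south
lngs = np.arange(-179.75, 180, 0.5)      # 720 longitudes, west to east

da = xr.DataArray(
    data[:, :, np.newaxis],              # shape (360, 720, 1)
    coords={
        "lat": lats,
        "lng": lngs,
        "time": [datetime.datetime(year, month, day)],  # a list, not a scalar
    },
    dims=["lat", "lng", "time"],         # explicit dimension order
)
ds = da.to_dataset(name="variable_name")
print(dict(ds.sizes))
```

From there, ds.to_netcdf() with a path of your choosing should write the file, provided a netCDF backend is installed.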

Another sensible approach is to use concat with a list of one item after you add 'time' as a scalar coordinate, e.g.,

lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)
coords =  {'lat': lats, 'lng': lngs, 'time': datetime.datetime(year, month, day)}
da = xr.DataArray(data, coords=coords, dims=['lat', 'lng'])
expanded_da = xr.concat([da], 'time')

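This version generalizes nicely to joining together data from a bunch of days: you simply make the list of DataArrays longer. In my experience, most of the time the reason you want the extra dimension in the first place is to be able to concat along it. Length-1 dimensions are not very useful otherwise.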
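
As a rough illustration of that generalization, here is a sketch that concatenates three months. The helper month_to_dataarray is hypothetical, and the constant-valued grids and the 2016 dates are stand-ins for whatever would be read from the real CSV files.

```python
import datetime

import numpy as np
import xarray as xr

lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)


def month_to_dataarray(data, year, month):
    """Hypothetical helper: wrap one 2D grid with a scalar 'time' coordinate."""
    coords = {
        "lat": lats,
        "lng": lngs,
        "time": datetime.datetime(year, month, 1),
    }
    return xr.DataArray(data, coords=coords, dims=["lat", "lng"])


# Stand-in grids; each one would normally come from one monthly CSV file.
monthly = [
    month_to_dataarray(np.full((360, 720), float(m)), 2016, m)
    for m in (1, 2, 3)
]

# Concatenating along 'time' turns the scalar coordinates into a new dimension.
combined = xr.concat(monthly, "time")
print(combined.dims, combined.shape)
```

Note that concat places the new 'time' dimension first, so the result is ordered (time, lat, lng) rather than (lat, lng, time).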

Answered 2016-05-12T17:02:09.893

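You can use .expand_dims() to add a new dimension and .assign_coords() to add coordinate values for that dimension. The code below adds a new_dim dimension to the dataset ds and sets the corresponding coordinate from the list_of_values you provide.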

expanded_ds = ds.expand_dims("new_dim").assign_coords(new_dim=("new_dim", [list_of_values]))
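
Applied to the question's grid, this could look roughly like the sketch below. The dataset contents and the timestamp are placeholders. Because expand_dims adds a dimension of length 1 here, the coordinate list must hold exactly one value.

```python
import numpy as np
import xarray as xr

lats = np.arange(-89.75, 90, 0.5) * -1
lngs = np.arange(-179.75, 180, 0.5)

# A 2D dataset standing in for one month's data (values are placeholders).
ds = xr.Dataset(
    {"variable_name": (("lat", "lng"), np.ones((360, 720)))},
    coords={"lat": lats, "lng": lngs},
)

timestamp = np.datetime64("2016-01-01")  # placeholder date, an assumption

# Add a length-1 'time' dimension, then label it with the single timestamp.
expanded_ds = ds.expand_dims("time").assign_coords(time=("time", [timestamp]))
print(expanded_ds["variable_name"].dims)
```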
Answered 2020-07-21T15:01:28.277