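A very simple question, but I can't find the answer online. I have a Dataset and I just want to add a named DataArray to it, with something like dataset.add({"new_array": new_data_array}). I know about merge, update and concatenate, but my understanding is that merge is for merging two or more Datasets and concatenate is for joining two or more DataArrays into another DataArray. I haven't fully understood update yet. I have tried dataset.update({"new_array": new_data_array}), but I get the following error.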

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

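I also tried dataset["new_array"] = new_data_array and got the same error.

Update

I have now found that the problem is that some of my coordinates have duplicate values, which I was not aware of. Coordinates are used as indexes, so xarray (understandably) gets confused when it tries to combine the shared coordinates. Below is an example that works.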

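import numpy
import xarray

names = ["joaquin", "manolo", "xavier"]
# dims is given explicitly; recent xarray versions no longer infer it from a dict of coords
n = xarray.DataArray([23, 98, 23], dims=["name"], coords={"name": names})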
print(n)
print("======")
m = numpy.random.randint(0, 256, (3, 4, 4)).astype(numpy.uint8)
mm = xarray.DataArray(m, dims=["name", "row", "column"], coords=[names, range(4), range(4)])
print(mm)
print("======")
n_dataset = n.rename("number").to_dataset()
n_dataset["mm"] = mm
print(n_dataset)

Output:

<xarray.DataArray (name: 3)>
array([23, 98, 23])
Coordinates:
  * name     (name) <U7 'joaquin' 'manolo' 'xavier'
======
<xarray.DataArray (name: 3, row: 4, column: 4)>
array([[[ 55,  63, 250, 211],
        [204, 151, 164, 237],
        [182,  24, 211,  12],
        [183, 220,  35,  78]],

       [[208,   7,  91, 114],
        [195,  30, 108, 130],
        [ 61, 224, 105, 125],
        [ 65,   1, 132, 137]],

       [[ 52, 137,  62, 206],
        [188, 160, 156, 126],
        [145, 223, 103, 240],
        [141,  38,  43,  68]]], dtype=uint8)
Coordinates:
  * name     (name) <U7 'joaquin' 'manolo' 'xavier'
  * row      (row) int64 0 1 2 3
  * column   (column) int64 0 1 2 3
======
<xarray.Dataset>
Dimensions:  (column: 4, name: 3, row: 4)
Coordinates:
  * name     (name) object 'joaquin' 'manolo' 'xavier'
  * row      (row) int64 0 1 2 3
  * column   (column) int64 0 1 2 3
Data variables:
    number   (name) int64 23 98 23
    mm       (name, row, column) uint8 55 63 250 211 204 151 164 237 182 24 ...

The code above uses names as an index. If I change the code slightly so that names contains a duplicate, say names = ["joaquin", "manolo", "joaquin"], then I get an InvalidIndexError.

Code:

names = ["joaquin", "manolo", "joaquin"]
n = xarray.DataArray([23, 98, 23], dims=["name"], coords={"name": names})
print(n)
print("======")
m = numpy.random.randint(0, 256, (3, 4, 4)).astype(numpy.uint8)
mm = xarray.DataArray(m, dims=["name", "row", "column"], coords=[names, range(4), range(4)])
print(mm)
print("======")
n_dataset = n.rename("number").to_dataset()
n_dataset["mm"] = mm
print(n_dataset)

Output:

<xarray.DataArray (name: 3)>
array([23, 98, 23])
Coordinates:
  * name     (name) <U7 'joaquin' 'manolo' 'joaquin'
======
<xarray.DataArray (name: 3, row: 4, column: 4)>
array([[[247,   3,  20, 141],
        [ 54, 111, 224,  56],
        [144, 117, 131, 192],
        [230,  44, 174,  14]],

       [[225, 184, 170, 248],
        [ 57, 105, 165,  70],
        [220, 228, 238,  17],
        [ 90, 118,  87,  30]],

       [[158, 211,  31, 212],
        [ 63, 172, 190, 254],
        [165, 163, 184,  22],
        [ 49, 224, 196, 244]]], dtype=uint8)
Coordinates:
  * name     (name) <U7 'joaquin' 'manolo' 'joaquin'
  * row      (row) int64 0 1 2 3
  * column   (column) int64 0 1 2 3
======
---------------------------------------------------------------------------
InvalidIndexError                         Traceback (most recent call last)
<ipython-input-12-50863379cefe> in <module>()
      8 print("======")
      9 n_dataset = n.rename("number").to_dataset()
---> 10 n_dataset["mm"] = mm
     11 print(n_dataset)

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/dataset.py in __setitem__(self, key, value)
    536             raise NotImplementedError('cannot yet use a dictionary as a key '
    537                                       'to set Dataset values')
--> 538         self.update({key: value})
    539 
    540     def __delitem__(self, key):

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/dataset.py in update(self, other, inplace)
   1434             dataset.
   1435         """
-> 1436         variables, coord_names, dims = dataset_update_method(self, other)
   1437 
   1438         return self._replace_vars_and_dims(variables, coord_names, dims,

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/merge.py in dataset_update_method(dataset, other)
    492     priority_arg = 1
    493     indexes = dataset.indexes
--> 494     return merge_core(objs, priority_arg=priority_arg, indexes=indexes)

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/merge.py in merge_core(objs, compat, join, priority_arg, explicit_coords, indexes)
    373     coerced = coerce_pandas_values(objs)
    374     aligned = deep_align(coerced, join=join, copy=False, indexes=indexes,
--> 375                          skip_single_target=True)
    376     expanded = expand_variable_dicts(aligned)
    377 

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/alignment.py in deep_align(list_of_variable_maps, join, copy, indexes, skip_single_target)
    162 
    163     aligned = partial_align(*targets, join=join, copy=copy, indexes=indexes,
--> 164                             skip_single_target=skip_single_target)
    165 
    166     for key, aligned_obj in zip(keys, aligned):

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/alignment.py in partial_align(*objects, **kwargs)
    122         valid_indexers = dict((k, v) for k, v in joined_indexes.items()
    123                               if k in obj.dims)
--> 124         result.append(obj.reindex(copy=copy, **valid_indexers))
    125 
    126     return tuple(result)

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/dataset.py in reindex(self, indexers, method, tolerance, copy, **kw_indexers)
   1216 
   1217         variables = alignment.reindex_variables(
-> 1218             self.variables, self.indexes, indexers, method, tolerance, copy=copy)
   1219         return self._replace_vars_and_dims(variables)
   1220 

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/xarray/core/alignment.py in reindex_variables(variables, indexes, indexers, method, tolerance, copy)
    234             target = utils.safe_cast_to_index(indexers[name])
    235             indexer = index.get_indexer(target, method=method,
--> 236                                         **get_indexer_kwargs)
    237 
    238             to_shape[name] = len(target)

/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/pandas/indexes/base.py in get_indexer(self, target, method, limit, tolerance)
   2080 
   2081         if not self.is_unique:
-> 2082             raise InvalidIndexError('Reindexing only valid with uniquely'
   2083                                     ' valued Index objects')
   2084 

InvalidIndexError: Reindexing only valid with uniquely valued Index objects

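So this is not a bug in xarray. Still, I wasted a lot of time trying to track down this error, and I wish the error message were more informative. I hope the xarray contributors address this soon, for example by checking the coordinates for uniqueness before attempting the merge.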
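Until the library reports this more clearly, a check like the one wished for above can be done by hand before assigning. The following is only a minimal sketch: the helper name duplicated_index_values is hypothetical (it is not part of xarray), and it assumes the duplicates live in dimension coordinates that xarray exposes as pandas indexes through the indexes property.

```python
import numpy as np
import xarray as xr


def duplicated_index_values(obj):
    """Map each dimension name to the index labels that occur more than once.

    Hypothetical helper, not part of xarray. Works on a Dataset or DataArray.
    """
    report = {}
    for dim, index in obj.indexes.items():
        if not index.is_unique:
            # Keep each repeated label once, as plain Python objects.
            report[dim] = index[index.duplicated()].unique().tolist()
    return report


names = ["joaquin", "manolo", "joaquin"]  # "joaquin" appears twice
mm = xr.DataArray(
    np.zeros((3, 4, 4), dtype=np.uint8),
    dims=["name", "row", "column"],
    coords=[names, range(4), range(4)],
)

duplicates = duplicated_index_values(mm)
print(duplicates)
```

An empty dictionary means every index is unique, so adding the array to a Dataset should not fail for this particular reason.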

In any case, the approach in my answer below still works.

3 Answers

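Thanks for the detailed report. This issue has now been fixed in the latest version of xarray (v0.8.2).

We fixed the behavior in two ways:

  1. Alignment operations between xarray objects now succeed even with non-unique indexes, as long as the non-unique index takes the same values on all objects.

  2. If you try to align objects whose non-unique indexes are not identical, you now get an informative error message that names the index with duplicate values, e.g. ValueError: cannot reindex or align along dimension 'x' because the index has duplicate values.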
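The two cases can be reproduced with a short sketch. The variable names and the array contents are made up for illustration, and the exact wording of the error message may differ between xarray versions, so the code only records that a ValueError was raised.

```python
import numpy as np
import xarray as xr

dup_names = ["joaquin", "manolo", "joaquin"]

# Case 1: both objects carry the same non-unique index, so no reindexing is needed.
number = xr.DataArray([23, 98, 23], dims=["name"], coords={"name": dup_names})
dataset = number.rename("number").to_dataset()
dataset["mm"] = xr.DataArray(
    np.zeros((3, 4), dtype=np.uint8),
    dims=["name", "row"],
    coords={"name": dup_names, "row": range(4)},
)

# Case 2: the new array has a different non-unique index, so alignment should fail.
other = xr.DataArray(
    [1, 2, 3],
    dims=["name"],
    coords={"name": ["joaquin", "manolo", "manolo"]},
)
try:
    dataset["other"] = other
    error_message = None
except ValueError as err:
    error_message = str(err)

print(sorted(dataset.data_vars))
print(error_message)
```

In the second case the dataset is left with only the variables that were added successfully.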

answered 2016-08-20T02:05:15.980

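You need to make sure that the dimensions of the new DataArray are the same as those in the dataset. Then the following should work:

dataset['new_array_name'] = new_array

Here is a complete example to try out: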

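import numpy as np
import xarray as xray  # the package was formerly called "xray"; it is now "xarray"

# Create some dimensions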
x = np.linspace(-10,10,10)
y = np.linspace(-20,20,20)
(yy, xx) = np.meshgrid(y,x)

# Make two different DataArrays with equal dimensions
var1 = xray.DataArray(np.random.randn(len(x),len(y)),coords=[x, y],dims=['x','y'])
var2 = xray.DataArray(-xx**2+yy**2,coords=[x, y],dims=['x','y'])

# Save one DataArray as dataset
ds = var1.to_dataset(name = 'var1')

# Add second DataArray to existing dataset (ds)
ds['var2'] = var2
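
It may also help to see what happens when the coordinates do not match. As far as I understand it, assignment aligns the new array to the dataset's existing index instead of raising an error, so labels missing from the new array are filled with NaN and extra labels are dropped. A small sketch with made-up values (the names ds and shifted are only for illustration):

```python
import numpy as np
import xarray as xr

ds = xr.Dataset({"var1": ("x", [10.0, 20.0, 30.0])}, coords={"x": [0, 1, 2]})

# The new array shares only the labels 1 and 2 with the dataset; label 3 is extra.
shifted = xr.DataArray([1.0, 2.0, 3.0], dims=["x"], coords={"x": [1, 2, 3]})
ds["var2"] = shifted

# The dataset keeps its own x index; var2 has no value at x=0.
print(ds["x"].values)
print(ds["var2"].values)
```

This is why matching dimensions and coordinates matter: a mismatch does not necessarily fail loudly.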
answered 2016-08-08T15:42:40.647

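OK, I found a way, but I don't know whether it is the canonical way or the best way, so criticism and suggestions are welcome. It doesn't feel like a good approach.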

dataset = xarray.merge([dataset, new_data_array.rename("new_array")])
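
For comparison, here is a sketch that puts the merge approach next to item assignment and Dataset.assign. The data is invented for the example, and it assumes the coordinates are unique and identical on both objects, in which case all three routes should give the same result.

```python
import xarray as xr

names = ["joaquin", "manolo", "xavier"]
dataset = xr.Dataset({"number": ("name", [23, 98, 23])}, coords={"name": names})
new_data_array = xr.DataArray([1.5, 2.5, 3.5], dims=["name"], coords={"name": names})

# 1. merge: returns a new Dataset; the DataArray needs a name first.
via_merge = xr.merge([dataset, new_data_array.rename("new_array")])

# 2. assign: also returns a new Dataset and leaves the original untouched.
via_assign = dataset.assign(new_array=new_data_array)

# 3. item assignment: modifies the Dataset in place, so work on a copy here.
via_setitem = dataset.copy()
via_setitem["new_array"] = new_data_array

print(via_merge.equals(via_assign), via_assign.equals(via_setitem))
```

The practical difference is mostly whether the original Dataset is modified in place or a new one is returned.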
answered 2016-08-08T10:21:40.997