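I have a function that creates an xarray dataset from various outputs of a model. One of the pieces of information I collect is a list of lists (of differing lengths). This variable is called cids and has the same dimension as the other variables, repo_id.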

So far, the following has worked fine:

import numpy as np
import pandas as pd
import xarray as xr

datetime = pd.date_range('20010101', periods=100, freq='D')
obs = [xr.DataArray(np.random.rand(100), dims={'datetime': datetime}),xr.DataArray(np.random.rand(100), dims={'datetime':datetime}) ]
cids = [[1, 2, 3], [1, 2, 3, 4]]
keys = np.array([['A', 'A', 'B'], ['C', 'D', 'E']])
xr.Dataset({'obs': (['repo_id', 'datetime'], np.array(obs)), 'cig_id': ('repo_id', keys[:, 0]), 'repo': ('repo_id', keys[:, 2]), 'cids': ('repo_id', cids)},  coords={'repo_id': keys[:, 1], 'datetime': obs[0].datetime})

As expected, this produces the following result:

<xarray.Dataset>
Dimensions:   (datetime: 100, repo_id: 2)
Coordinates:
  * repo_id   (repo_id) <U1 'A' 'D'
  * datetime  (datetime) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99
Data variables:
    obs       (repo_id, datetime) float64 0.9393 0.468 0.7168 ... 0.03513 0.8771
    cig_id    (repo_id) <U1 'A' 'C'
    repo      (repo_id) <U1 'B' 'E'
    cids      (repo_id) object [1, 2, 3] [1, 2, 3, 4]

However, I recently ran into a case where the lists in my cids variable all have the same length:

datetime = pd.date_range('20010101', periods=100, freq='D')
obs = [xr.DataArray(np.random.rand(100), dims={'datetime': datetime}),xr.DataArray(np.random.rand(100), dims={'datetime':datetime}) ]
# note that both lists in cids have the same length
cids = [[1, 2, 3], [1, 2, 3]]
keys = np.array([['A', 'A', 'B'], ['C', 'D', 'E']])
xr.Dataset({'obs': (['repo_id', 'datetime'], np.array(obs)), 'cig_id': ('repo_id', keys[:, 0]), 'repo': ('repo_id', keys[:, 2]), 'cids': ('repo_id', cids)},  coords={'repo_id': keys[:, 1], 'datetime': obs[0].datetime})

This produces the following error:

cids = [[1, 2, 3], [1, 2, 3]]
keys = np.array([['A', 'A', 'B'], ['C', 'D', 'E']])
xr.Dataset({'obs': (['repo_id', 'datetime'], np.array(obs)), 'cig_id': ('repo_id', keys[:, 0]), 'repo': ('repo_id', keys[:, 2]), 'cids': ('repo_id', cids)},  coords={'repo_id': keys[:, 1], 'datetime': obs[0].datetime})
Traceback (most recent call last):
  File "/auto/anaconda3/envs/commod_staging/lib/python3.6/site-packages/xarray/core/variable.py", line 107, in as_variable
    obj = Variable(*obj)
  File "/auto/anaconda3/envs/commod_staging/lib/python3.6/site-packages/xarray/core/variable.py", line 309, in __init__
    self._dims = self._parse_dimensions(dims)
  File "/auto/anaconda3/envs/commod_staging/lib/python3.6/site-packages/xarray/core/variable.py", line 503, in _parse_dimensions
    "number of data dimensions, ndim=%s" % (dims, self.ndim)
ValueError: dimensions ('repo_id',) must have the same length as the number of data dimensions, ndim=2
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
  File "/auto/anaconda3/envs/commod_staging/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3331, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input-48-9a2b518ac4d3>", line 2, in <module>
    xr.Dataset({'obs': (['repo_id', 'datetime'], np.array(obs)), 'cig_id': ('repo_id', keys[:, 0]), 'repo': ('repo_id', keys[:, 2]), 'cids': ('repo_id', cids)},  coords={'repo_id': keys[:, 1], 'datetime': obs[0].datetime})
  File "/auto/anaconda3/envs/commod_staging/lib/python3.6/site-packages/xarray/core/dataset.py", line 537, in __init__
    data_vars, coords, compat="broadcast_equals"
  File "/auto/anaconda3/envs/commod_staging/lib/python3.6/site-packages/xarray/core/merge.py", line 467, in merge_data_and_coords
    objects, compat, join, explicit_coords=explicit_coords, indexes=indexes
  File "/auto/anaconda3/envs/commod_staging/lib/python3.6/site-packages/xarray/core/merge.py", line 552, in merge_core
    collected = collect_variables_and_indexes(aligned)
  File "/auto/anaconda3/envs/commod_staging/lib/python3.6/site-packages/xarray/core/merge.py", line 277, in collect_variables_and_indexes
    variable = as_variable(variable, name=name)
  File "/auto/anaconda3/envs/commod_staging/lib/python3.6/site-packages/xarray/core/variable.py", line 113, in as_variable
    "{} to Variable.".format(obj)
ValueError: Could not convert tuple of form (dims, data[, attrs, encoding]): ('repo_id', [[1, 2, 3], [1, 2, 3]]) to Variable.

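Any input would be appreciated; I'm not sure how best to handle this. It seems xarray is trying to be clever and assumes that the dimension of cids is not repo_id with length 2 but something of length 3... A bug?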

2 Answers

I suspect this may not be the most "xarray-onic" approach, but the following seems to provide a "fix" for me:

datetime = pd.date_range('20010101', periods=100, freq='D')
obs = [xr.DataArray(np.random.rand(100), dims={'datetime': datetime}),xr.DataArray(np.random.rand(100), dims={'datetime':datetime}) ]
# note that both lists in cids have the same length
## HERE IS THE FIX: CONVERT THEM TO SETS
cids = [set(_e) for _e in [[1, 2, 3], [1, 2, 3]]]

## THAT'S ALL
keys = np.array([['A', 'A', 'B'], ['C', 'D', 'E']])
xr.Dataset({'obs': (['repo_id', 'datetime'], np.array(obs)), 'cig_id': ('repo_id', keys[:, 0]), 'repo': ('repo_id', keys[:, 2]), 'cids': ('repo_id', cids)},  coords={'repo_id': keys[:, 1], 'datetime': obs[0].datetime})
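
One caveat with sets is that they discard element order and duplicates. If those matter, a possible alternative is to pre-allocate a 1-D object array and fill it element by element, so that NumPy never gets the chance to merge the inner lists into a 2-D array. This is only a minimal sketch: the name `cids_obj` is made up for illustration, and the dataset is trimmed down to the cids variable.

```python
import numpy as np
import xarray as xr

cids = [[1, 2, 3], [1, 2, 3]]

# Pre-allocate a 1-D object array; each slot will hold one Python list.
cids_obj = np.empty(len(cids), dtype=object)
for i, item in enumerate(cids):
    cids_obj[i] = item  # stores the list itself as a single element

# Reduced dataset (only cids) to keep the sketch short.
ds = xr.Dataset(
    {"cids": ("repo_id", cids_obj)},
    coords={"repo_id": ["A", "D"]},
)
print(ds["cids"])
```

The same construction should also work for lists of unequal length, so the two cases from the question can go through one code path.
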
answered 2020-04-17T21:27:29.167

Currently, the first example creates a cids variable that contains lists:

In [6]: datetime = pd.date_range('20010101', periods=100, freq='D')
   ...: obs = [xr.DataArray(np.random.rand(100), dims={'datetime': datetime}),xr.DataArray(np.random.rand(100), dims={'datetime':datetime}) ]
   ...: cids = [[1, 2, 3], [1, 2, 3, 4]]
   ...: keys = np.array([['A', 'A', 'B'], ['C', 'D', 'E']])
   ...: xr.Dataset({'obs': (['repo_id', 'datetime'], np.array(obs)), 'cig_id': ('repo_id', keys[:, 0]), 'repo': ('repo_id', keys[:, 2]), 'cids': ('repo_id', cids)},  coords={'repo_id': keys[:, 1], 'datetime': obs[0].datetime})
   ...:
Out[6]:
<xarray.Dataset>
Dimensions:   (datetime: 100, repo_id: 2)
Coordinates:
  * repo_id   (repo_id) <U1 'A' 'D'
  * datetime  (datetime) int64 0 1 2 3 4 5 6 7 8 ... 91 92 93 94 95 96 97 98 99
Data variables:
    obs       (repo_id, datetime) float64 0.4451 0.9134 ... 0.8266 0.07039
    cig_id    (repo_id) <U1 'A' 'C'
    repo      (repo_id) <U1 'B' 'E'
    cids      (repo_id) object [1, 2, 3] [1, 2, 3, 4]

In [9]: ds=_

In [11]: ds.cids
Out[11]:
<xarray.DataArray 'cids' (repo_id: 2)>
array([list([1, 2, 3]), list([1, 2, 3, 4])], dtype=object)  # <- here
Coordinates:
  * repo_id  (repo_id) <U1 'A' 'D'

Is this intentional? Normally you would want to store a single value at each position along a dimension, not a list.

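I appreciate that this is a confusing pair of cases, because it is surprising that it works for lists of unequal size but not for lists of equal size. With equal-sized lists, xarray tries to place the values in the lists along another dimension, and the name for that extra dimension is missing; it does not attempt this for lists of unequal size.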
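
The difference can be reproduced with NumPy alone. The sketch below assumes that xarray hands the nested list to NumPy for conversion (something like `np.asarray`), which is consistent with the `ndim=2` in the traceback. `dtype=object` is passed explicitly because recent NumPy versions refuse to build a ragged array without it.

```python
import numpy as np

equal = [[1, 2, 3], [1, 2, 3]]
ragged = [[1, 2, 3], [1, 2, 3, 4]]

# Equal-length lists become a regular 2-D array.
a = np.asarray(equal)
print(a.shape, a.ndim)

# Even with dtype=object the equal-length case is still 2-D.
b = np.asarray(equal, dtype=object)
print(b.shape, b.ndim)

# Ragged lists cannot form a rectangle, so NumPy keeps a 1-D array of lists.
c = np.asarray(ragged, dtype=object)
print(c.shape, c.ndim)
```

A 2-D array paired with the single dimension name 'repo_id' is what triggers the first ValueError in the question.
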

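The error message is poor. But I'm not sure what I would change in the functionality: it would probably raise an error in your first example, since it's unlikely anyone wants list objects.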
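
If one value per position is acceptable, one way to avoid list objects altogether is to pad the lists to a common length and give the values a dimension of their own. The following is only a sketch under assumptions: the dimension name `cid_index` is hypothetical, NaN is used as the fill value (which turns the integers into floats), and the dataset is reduced to the cids variable.

```python
import numpy as np
import xarray as xr

cids = [[1, 2, 3], [1, 2, 3, 4]]

# Pad every list to the length of the longest one; NaN marks missing slots.
width = max(len(c) for c in cids)
padded = np.full((len(cids), width), np.nan)
for i, c in enumerate(cids):
    padded[i, :len(c)] = c

# 'cid_index' is a made-up name for the new second dimension.
ds = xr.Dataset(
    {"cids": (["repo_id", "cid_index"], padded)},
    coords={"repo_id": ["A", "D"]},
)
print(ds["cids"])
```

This treats equal and unequal list lengths the same way, at the cost of carrying explicit missing values.
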

answered 2020-04-11T21:29:58.040