Currently, my code makes heavy use of structured masked arrays with multidimensional dtypes, dozens of fields, and item sizes of many kilobytes. It appears that xarray could be a great alternative, but when I pass it a masked array, it changes the dtype to float:
In [137]: x = arange(30, dtype="i1").reshape(3, 10)

In [138]: xr.Dataset({"count": (["x", "y"], ma.masked_where(x%5>3, x))},
     ...:            coords={"x": range(3), "y": range(10)})
Out[138]:
<xarray.Dataset>
Dimensions:  (x: 3, y: 10)
Coordinates:
  * y        (y) int64 0 1 2 3 4 5 6 7 8 9
  * x        (x) int64 0 1 2
Data variables:
    count    (x, y) float64 0.0 1.0 2.0 3.0 nan 5.0 6.0 7.0 8.0 nan 10.0 ...
This is undesirable for me, because (1) the memory consumption of my dataset, which is already large, will explode, and (2) many of my integer dtypes are bit fields, which must not be represented as floats. Although an int32 bit field can be losslessly represented as a float64, it's ugly and error-prone to go back and forth.
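To put rough numbers on both points, here is a minimal sketch using only NumPy. It reuses the toy array from the session above; the variable names `as_float`, `bits`, and `round_trip` are made up for illustration.

```python
import numpy as np

# Same toy data as in the session above: 30 one-byte integers.
x = np.arange(30, dtype="i1").reshape(3, 10)
as_float = x.astype("f8")

# int8 uses 1 byte per element, float64 uses 8.
print(x.nbytes, as_float.nbytes)

# float64 has a 53-bit significand, so any int32 value survives
# a cast to float64 and back unchanged.
info = np.iinfo(np.int32)
bits = np.array([info.min, -1, 0, 1, info.max], dtype="i4")
round_trip = bits.astype("f8").astype("i4")
print(np.array_equal(bits, round_trip))
```

For this one-byte dtype the float64 copy is eight times larger. The round trip is exact, but every bitwise operation still needs a cast back to an integer dtype first.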
Is it possible to use xarray.Dataset with masked arrays while preserving integer dtypes?
Edit: It appears the problem occurs in _maybe_promote. See also the GitHub issue.
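The effect seen in the output above can be reproduced with plain NumPy. The sketch below is only an assumption about what the promotion amounts to (cast to float, then put NaN in the masked slots); it is not taken from the xarray source, and `promoted` is a hypothetical name.

```python
import numpy as np
import numpy.ma as ma

x = np.arange(30, dtype="i1").reshape(3, 10)
masked = ma.masked_where(x % 5 > 3, x)

# The masked array itself keeps the integer dtype.
print(masked.dtype)

# Assumed equivalent of the promotion: NaN has no integer
# representation, so the data is cast to float64 before filling.
promoted = masked.astype("f8").filled(np.nan)
print(promoted.dtype)
print(int(np.isnan(promoted).sum()))  # number of masked elements
```

If this is indeed what happens, the float conversion is a consequence of using NaN as the missing-value marker rather than of the masked array itself.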