python - xarray - store strings as 'string' data-type instead of 'char' (n-dimensional character arrays) for Python2.7

Question

I am converting a text file to netCDF format using xarray. With the NETCDF4 format on Python 3, string variables are stored as strings, but on Python 2 they are stored as n-dimensional character arrays. I tried setting dtype='str' in encoding, and that made no difference. Is there a way to give these variables the string data type on Python 2? Any thoughts would be appreciated.

Here is my code:

import pandas as pd
import xarray as xr

column_names = ['timestamp', 'air_temp', 'vtempdiff', 'rh', 'pressure', 'wind_dir', 'wind_spd']

df = pd.read_csv(args.input_file, skiprows = 1, header=None, names = column_names)
ds = xr.Dataset.from_dataframe(df)

encoding = {'timestamp': {'dtype': 'str'},
            'air_temp': {'_FillValue': 9.96921e+36, 'dtype': 'f4'}
            }

# The output path must be a quoted string; unlimited_dims takes an iterable of dimension names.
ds.to_netcdf('op_file.nc', format='NETCDF4', unlimited_dims=['time'], encoding=encoding)

When I do ncdump of the op_file.nc using Python3.6, I get:

netcdf op_file {
dimensions:
    time = UNLIMITED ; // (24 currently)
variables:
    string timestamp(time) ;
    float air_temp(time) ;
    .
    .
    .

And when I use Python2.7, I get:

netcdf op_file {
dimensions:
    time = UNLIMITED ; // (24 currently)
    string20 = 20 ;
variables:
    char timestamp(time, string20) ;
        timestamp:_Encoding = "utf-8" ;
    float air_temp(time) ;
    .
    .
    .

The sample input file looks like this:

# Fields: stamp,AGO-4.air_temp,AGO-4.vtempdiff,AGO-4.rh,AGO-4.pressure,AGO-4.wind_dir,AGO-4.wind_spd
2016-11-30T00:00:00Z,-36.50,,56.00,624.60,269.00,5.80
2016-11-30T01:00:00Z,-35.70,,55.80,624.70,265.00,5.90

score 5 · Accepted Answer

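Xarray maps Python 2's str/bytes type to netCDF's NC_CHAR type. Both types represent single-byte character data (typically ASCII), so this makes some sense.

To get a netCDF string (NC_STRING), you need to pass in unicode data (str on Python 3). You can get this either by coercing the timestamp column with .astype(unicode) or by passing {'dtype': unicode} in encoding.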
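
As a minimal sketch of the coercion step, the snippet below picks a text type that is `unicode` on Python 2 and `str` on Python 3, then converts byte strings to it. The names `text_type` and `to_text` are hypothetical helpers introduced here for illustration, not part of xarray, and the sample values are copied from the input file in the question.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# type(u"") evaluates to `unicode` on Python 2 and `str` on Python 3,
# so the same source works on both interpreters.
text_type = type(u"")


def to_text(value, encoding="utf-8"):
    """Return value as unicode text, decoding byte strings if needed."""
    if isinstance(value, bytes):  # `bytes` is an alias of `str` on Python 2
        return value.decode(encoding)
    return text_type(value)


# Sample timestamps from the question, one as bytes and one as text.
raw = [b"2016-11-30T00:00:00Z", u"2016-11-30T01:00:00Z"]
stamps = [to_text(v) for v in raw]

# Encoding dict in the shape the answer describes; 'timestamp' is the
# variable name used in the question.
encoding = {"timestamp": {"dtype": text_type}}

print(all(isinstance(s, text_type) for s in stamps))
```

With the dataset from the question, the answer's two options would then look something like `ds['timestamp'] = ds['timestamp'].astype(text_type)` before calling `ds.to_netcdf(...)`, or passing `encoding=encoding` to `ds.to_netcdf(...)`. Whether a given xarray release still supports Python 2.7 is worth checking before relying on this.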
