python - How to change the dtype of certain columns of a numpy recarray?

Question

Suppose I have a recarray like the following:

import numpy as np

# example data from @unutbu's answer
recs = [('Bill', '31', 260.0), ('Fred', 15, '145.0')]
r = np.rec.fromrecords(recs, formats = 'S30,i2,f4', names = 'name, age, weight')

print(r)
# [('Bill', 31, 260.0) ('Fred', 15, 145.0)]

Suppose I want to convert certain columns to floats. How do I do that? Should I convert to an ndarray and then convert back to a recarray?

score 16 · Accepted Answer

There are basically two steps. My stumbling block was finding out how to modify an existing dtype. Here is how I did it:

import numpy

# `data` is assumed to be an existing structured array (or recarray)
# change dtype by making a whole new array
dt = data.dtype
dt = dt.descr  # this is now a modifiable list; a numpy.dtype itself can't be modified
# change the type of the first column:
dt[0] = (dt[0][0], 'float64')
dt = numpy.dtype(dt)
# data = numpy.array(data, dtype=dt)  # option 1
data = data.astype(dt)
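
The two steps can also be wrapped in a small function. The following is only a sketch: change_field_dtype and the sample array are hypothetical names invented for illustration, and it assumes a simple packed dtype whose descr contains no padding entries.

import numpy as np

def change_field_dtype(arr, field, new_type):
    """Return a copy of a structured array with one field cast to new_type.

    Hypothetical helper, not part of numpy. Assumes a simple packed dtype.
    """
    descr = list(arr.dtype.descr)  # plain list of (name, type[, shape]) tuples
    for i, entry in enumerate(descr):
        if entry[0] == field:
            # keep the field name (and any sub-array shape), swap only the type
            descr[i] = (entry[0], new_type) + tuple(entry[2:])
    return arr.astype(np.dtype(descr))

# made-up sample data
data = np.array([(1, 2.5), (3, 4.5)], dtype=[('a', '<i4'), ('b', '<f4')])
out = change_field_dtype(data, 'a', 'float64')
print(out.dtype)

Because astype returns a new array, the original data keeps its old dtype unless the result is assigned back to the same name.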

score 16 · Accepted Answer

Here is an example that uses astype to perform the conversion:

import numpy as np
recs = [('Bill', '31', 260.0), ('Fred', 15, '145.0')]
r = np.rec.fromrecords(recs, formats = 'S30,i2,f4', names = 'name, age, weight')
print(r)
# [('Bill', 31, 260.0) ('Fred', 15, 145.0)]

The dtype of age is <i2:

print(r.dtype)
# [('name', '|S30'), ('age', '<i2'), ('weight', '<f4')]

We can change it to <f4 using astype:

r = r.astype([('name', '|S30'), ('age', '<f4'), ('weight', '<f4')])
print(r)
# [('Bill', 31.0, 260.0) ('Fred', 15.0, 145.0)]

score 0 · Accepted Answer

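This is a minor refinement of the existing answers, plus an extension for the case where you want to make changes based on dtype rather than column name (for example, changing all floats to integers).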

First, you can improve conciseness and readability by using a list comprehension:

col       = 'age'
new_dtype = 'float64'

r.astype( [ (col, new_dtype) if d[0] == col else d for d in r.dtype.descr ] )

# rec.array([(b'Bill', 31.0, 260.0), (b'Fred', 15.0, 145.0)], 
#           dtype=[('name', 'S30'), ('age', '<f8'), ('weight', '<f4')])

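Second, you can extend this syntax to handle cases where you want to change all floats to integers (or vice versa). For example, if you wanted to change any 32-bit or 64-bit float to a 64-bit integer, you could do something like this: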

old_dtype = ['<f4', '<f8']
new_dtype = 'int64'

r.astype( [ (d[0], new_dtype) if d[1] in old_dtype else d for d in r.dtype.descr ] )

# rec.array([(b'Bill', 31, 260), (b'Fred', 15, 145)], 
#           dtype=[('name', 'S30'), ('age', '<i2'), ('weight', '<i8')])

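Note that astype has an optional casting argument that defaults to 'unsafe', so you may want to specify casting='safe' to avoid accidentally losing precision when converting floats to integers: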

r.astype( [ (d[0], new_dtype) if d[1] in old_dtype else d for d in r.dtype.descr ],
          casting='safe' )

For more information and the other casting options, see the numpy documentation on astype.
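
As a rough illustration of what casting='safe' does, the sketch below rebuilds the recarray r from the examples above, using all-numeric input for simplicity. The names narrowing, widening and safe_cast_ok are made up for the illustration. It relies on the documented behavior that astype raises TypeError when the requested cast is not allowed under the chosen casting rule.

import numpy as np

recs = [('Bill', 31, 260.0), ('Fred', 15, 145.0)]
r = np.rec.fromrecords(recs, formats='S30,i2,f4', names='name, age, weight')

# float32 -> int64 can lose information, so 'safe' casting should refuse it
narrowing = [('name', 'S30'), ('age', '<i2'), ('weight', '<i8')]
try:
    r.astype(narrowing, casting='safe')
    safe_cast_ok = True
except TypeError as exc:
    safe_cast_ok = False
    print('refused:', exc)

# int16 -> float64 preserves every value, so 'safe' casting allows it
widening = [('name', 'S30'), ('age', '<f8'), ('weight', '<f4')]
widened = r.astype(widening, casting='safe')
print(widened.dtype)

In other words, with casting='safe' the float-to-integer conversion fails loudly instead of silently truncating, which is the point of passing the argument.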

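Also note that for the general case of changing floats to integers or vice versa, you may prefer to check for the general numeric type with np.issubdtype rather than checking against several specific dtypes.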
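
A minimal sketch of the np.issubdtype variant follows. It assumes the same recarray r as above (rebuilt here with all-numeric input so the block is self-contained), and new_descr and converted are illustrative names only.

import numpy as np

recs = [('Bill', 31, 260.0), ('Fred', 15, 145.0)]
r = np.rec.fromrecords(recs, formats='S30,i2,f4', names='name, age, weight')

# Every floating-point field, whatever its width, becomes int64;
# all other fields keep their current dtype.
new_descr = [
    (name, 'int64') if np.issubdtype(r.dtype[name], np.floating)
    else (name, r.dtype[name])
    for name in r.dtype.names
]
converted = r.astype(new_descr)
print(converted.dtype)

Checking against np.floating covers float16, float32 and float64 alike, so there is no list of specific type strings to keep up to date. Use np.integer in the condition for the opposite direction.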
