python - How to hstack numpy record arrays?

Question

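_{[An earlier version of this post had the inaccurate title "How can I add a column to a numpy record array?" The question asked in that earlier title has already been partially answered, but that answer does not address what the body of the earlier version of this post actually asked for. I have reworded the title and edited the post heavily to make the distinction clearer. I also explain why the answer I mentioned does not meet my requirements.]}

Suppose I have two numpy arrays x and y, each consisting of r "record" (aka "structured") arrays. Let the shape of x be (r, c_x) and the shape of y be (r, c_y). Let's also assume that there is no overlap between x.dtype.names and y.dtype.names.

For example, for r = 2, c_x = 2, and c_y = 1: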

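import numpy as np
# list(...) is needed on Python 3, where zip returns an iterator; 'S10' is the 10-byte string type
x = np.array(list(zip((1, 2), (3., 4.))), dtype=[('i', 'i4'), ('f', 'f4')])
y = np.array(list(zip(('a', 'b'))), dtype=[('s', 'S10')])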
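One detail that may help when reading the rest of the question: NumPy reports a structured array as one-dimensional, with shape (r,), and the "columns" are named fields of the dtype rather than a second axis. A minimal sketch, assuming Python 3 (so `zip` is wrapped in `list`) and the `S10` spelling of the 10-byte string type:

```python
import numpy as np

x = np.array(list(zip((1, 2), (3., 4.))), dtype=[('i', 'i4'), ('f', 'f4')])
y = np.array(list(zip(('a', 'b'))), dtype=[('s', 'S10')])

# The shape only counts records; the fields are listed in dtype.names.
print(x.shape, x.dtype.names)
print(y.shape, y.dtype.names)
```

Under that reading, c_x and c_y correspond to `len(x.dtype.names)` and `len(y.dtype.names)`.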

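I want to "horizontally" concatenate x and y to produce a new array of records z, having shape (r, c_x + c_y). This operation should not modify x or y at all.

In general, z = np.hstack((x, y)) won't do, because the dtypes of x and y won't necessarily match. For example, continuing from the example above: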

z = np.hstack((x, y))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-def477e6c8bf> in <module>()
----> 1 z = np.hstack((x, y))
TypeError: invalid type promotion

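Now, there is a function, numpy.lib.recfunctions.append_fields, that looks like it might do something close to what I'm looking for, but I have not been able to get anything out of it: everything I've tried with it either fails with an error or produces something other than what I'm trying to get.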

Can someone please show me explicitly the code (using n.l.r.append_fields or otherwise¹) that would generate, from the x and y defined in the example above, a new array of records, z, equivalent to the horizontal concatenation of x and y, and do so without modifying either x or y?

I assume that this will require only one or two lines of code. Of course, I am looking for code that does not require building z, record by record, by iterating over x and y. Also, the code may assume that x and y have the same number of records, and that there is no overlap between x.dtype.names and y.dtype.names. Other than this, the code I'm looking for should know nothing about x and y. Ideally, it should be agnostic also about the number of arrays to join. IOW, leaving out error checking, the code I'm looking for could be the body of a function hstack_rec so that the new array z would be the result hstack_rec((x, y)).

¹_{...although I have to admit that, after my so-far perfect record of failure with numpy.lib.recfunctions.append_fields, I've become a bit curious about how this function could be used at all, irrespective of its relevance to this post's question.}
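On the footnote's side question, here is one way append_fields can apparently be called. This is only a sketch, assuming Python 3 and a recent NumPy: the new field names are passed as a list, together with one 1-D array per field pulled out of y, and `usemask=False` asks for a plain ndarray instead of a masked array.

```python
import numpy as np
import numpy.lib.recfunctions as nlr

x = np.array([(1, 3.), (2, 4.)], dtype=[('i', 'i4'), ('f', 'f4')])
y = np.array([('a',), ('b',)], dtype=[('s', 'S10')])

# One name and one 1-D data array per field of y.
names = list(y.dtype.names)
z = nlr.append_fields(x, names, [y[n] for n in names], usemask=False)

print(z.dtype.names)
```

The call builds a new array, so x and y are left as they were.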

score 5 · Accepted Answer

I never use recarrays, and so someone else is going to come up with something slicker, but maybe merge_arrays would work?

>>> import numpy as np
>>> import numpy.lib.recfunctions as nlr
>>> x = np.array(list(zip((1, 2), (3., 4.))), dtype=[('i', 'i4'), ('f', 'f4')])
>>> y = np.array(list(zip(('a', 'b'))), dtype=[('s', 'S10')])
>>> x
array([(1, 3.0), (2, 4.0)], 
      dtype=[('i', '<i4'), ('f', '<f4')])
>>> y
array([('a',), ('b',)], 
      dtype=[('s', '|S10')])
>>> z = nlr.merge_arrays([x, y], flatten=True)
>>> z
array([(1, 3.0, 'a'), (2, 4.0, 'b')], 
      dtype=[('i', '<i4'), ('f', '<f4'), ('s', '|S10')])
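Wrapped up as the hstack_rec function the question asks for, the merge_arrays call might look like the following. It is a sketch rather than a hardened implementation: it assumes Python 3, that all inputs have the same number of records, and that no field name is repeated, and it does no error checking.

```python
import numpy as np
import numpy.lib.recfunctions as nlr


def hstack_rec(arrays):
    """Join structured arrays field by field into a new array.

    Assumes equal lengths and non-overlapping field names.
    """
    return nlr.merge_arrays(list(arrays), flatten=True)


x = np.array([(1, 3.), (2, 4.)], dtype=[('i', 'i4'), ('f', 'f4')])
y = np.array([('a',), ('b',)], dtype=[('s', 'S10')])

z = hstack_rec((x, y))
print(z.dtype.names)
```

Because the function takes a sequence, the same call should extend to three or more arrays, for example `hstack_rec((x, y, w))` with a hypothetical third array `w`.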

score 0 · Accepted Answer

This is a very late answer, but maybe it will be helpful to someone else. I used this solution after asking the same question with most of the same criteria.

It doesn't generate a new numpy array, but by using zip and itertools.chain it is much faster. In my case, I needed to access every value of every row in sequential order. Here is a benchmark that simulates this use case:

import numpy
from numpy.lib.recfunctions import merge_arrays
from itertools import chain

a = numpy.empty(3, [("col1", int), ("col2", float)])
b = numpy.empty(3, [("col3", int), ("col4", "U1")])

Results:

%timeit [i for i in (row for row in merge_arrays([a,b], flatten=True))]
52.9 µs ± 2 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

%timeit [i for i in (row for row in (chain(i,k) for i,k in zip(a,b)))]
3.47 µs ± 52 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
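Outside of IPython, the chain-based approach can be sketched as below. To keep the values predictable, this version uses numpy.zeros instead of numpy.empty (an assumption made for illustration only) and consumes each chained row into a tuple.

```python
from itertools import chain

import numpy as np

a = np.zeros(3, dtype=[("col1", int), ("col2", float)])
b = np.zeros(3, dtype=[("col3", int), ("col4", "U1")])

# chain(ra, rb) walks the values of one record from a, then one from b;
# tuple(...) forces the lazy iterator so the values are actually read.
rows = [tuple(chain(ra, rb)) for ra, rb in zip(a, b)]
print(rows[0])
```

Note that the timed chain expression above only creates the chain iterators; the values themselves are read when each iterator is consumed, as `tuple()` does here.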
