
Given a numpy array containing arbitrary data, like this:

>>> data
array([  1,   172,   32, ..., 42, 189, 29], dtype=int8) # SIGNED int8

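...I need to construct a numpy array "result" as follows:

(Please excuse the pseudocode. If I knew how to do this, I wouldn't be asking. If I had a working numpy implementation, I would have taken my question straight to CodeReview.)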

for value in data, check:
    if value & 0x01:
        result.append((value >> 1 << 8) + next(value).astype(numpy.uint8))
        # that is: take TWO values from 'data', one signed, the next un-signed, glue them together, appending ONE int16 to result
    else:
        result.append(value >> 1)
        # that is: take ONE value from 'data', appending ONE int8 to result

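I have implemented this in "plain" Python. It works fine, but I hope it can be optimized using numpy and its very efficient array operations. I would like to get rid of the list and the appends. Sadly, I have no idea how to accomplish that: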

# data is a string of 'bytes' received from a device
def unpack(data):
    l = len(data)
    p = 0
    result = []

    while p < l:
        i1 = (((ord(data[p]) + 128) % 256) - 128)
        p += 1
        if i1 & 0x01:
            # read next 'char' as an uint8
            #
            # due to the nature of the protocol,
            # we will always have sufficient data
            # available to avoid reading past the end
            i2 = ord(data[p])
            p += 1
            result.append((i1 >> 1 << 8) + i2)
        else:
            result.append(i1 >> 1)

    return result
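
For readers on Python 3, the loop above can be sketched roughly as follows. This is only an illustrative port, not part of the original question: the name unpack_py3 is made up here, and it assumes the input is a bytes object or a sequence of byte values in the range 0..255 (for example a uint8 array). Indexing bytes in Python 3 already yields integers, so ord() is not needed.

```python
def unpack_py3(data):
    """Decode a sequence of byte values (0..255) into a list of ints.

    Hypothetical Python 3 port of unpack(); assumes the stream never
    ends on an odd first byte, as stated for the protocol.
    """
    result = []
    p = 0
    n = len(data)
    while p < n:
        # int() avoids wrap-around if the items are fixed-width NumPy scalars
        first = int(data[p])
        p += 1
        i1 = ((first + 128) % 256) - 128  # reinterpret as signed 8-bit
        if i1 & 0x01:
            # odd first byte: the next byte is the unsigned low byte
            i2 = int(data[p])
            p += 1
            result.append(((i1 >> 1) << 8) + i2)
        else:
            # even first byte: a single-byte value
            result.append(i1 >> 1)
    return result
```

For instance, the bytes 0x03 0xC8 form one two-byte value (1 << 8) + 200 = 456, while 0x54 on its own decodes to 42.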

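Update: Thanks to @Jaime, I managed to implement an efficient unpack function. It is very similar to his, though somewhat faster. The while loop is of course the critical part. I'm posting it here in case anyone is interested: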

def new_np_unpack(data):
    mask = (data & 0x01).astype(bool)  # builtin bool; the numpy.bool alias is not available in every NumPy release

    true_positives = None

    while True:
        # check for 'true positives' in the tentative mask
        # the next item must by definition be a false one
        true_positives = numpy.nonzero(numpy.logical_and(mask, numpy.invert(numpy.concatenate(([False], mask[:-1])))))[0]

        # loop until no more 'false positives'
        if not numpy.any(mask[true_positives+1]):
            break

        mask[true_positives+1] = False

    result = numpy.empty(data.shape, dtype='int16')
    result[:] = data.astype('int8') >> 1
    result[true_positives] = (result[true_positives] << 8) + data[true_positives + 1]
    mask = numpy.ones(data.shape, dtype=bool)
    mask[true_positives + 1] = False
    return result[mask]

1 Answer


I got something vectorized working. For comparison, I took the ord(...) calls out of your code and fed it data like this:

data = np.random.randint(256, size=(1000000,)).astype('uint8')
data[-1] = 0 # to avoid errors with last element

My version of the function:

def np_unpack(data) :
    # find where condition is met
    mask = (data & 0x01).astype(bool)
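    # Keep only 1st, 3rd, 5th... consecutive occurrences of True in mask
    new_mask = mask.copy()
    while new_mask.any() :
        # entries that still have a True immediately before them
        new_mask = np.logical_and(new_mask,
                                  np.concatenate(([False], new_mask[:-1])))
        # toggle those entries; XOR replaces the alternating +/-1 in-place add,
        # which current NumPy rejects for a boolean mask (int -> bool cast)
        mask ^= new_mask
    del new_mask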
    cond = np.nonzero(mask)[0]
    result = np.empty(data.shape, dtype='int16')
    result[:] = data.astype('int8') >> 1
    result[cond] <<= 8
    result[cond] += data[cond + 1]
    mask = np.ones(data.shape, dtype=bool)
    mask[cond + 1] = False
    return result[mask]
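
As an aside, the "keep only the 1st, 3rd, 5th... True of each run" step can probably also be done without a Python-level loop. The following is a minimal sketch of one alternative, not the method used in the function above: the helper name first_byte_mask is hypothetical, and it assumes a 1-D boolean array marking the odd bytes. It computes each entry's position inside its run of True values with a running maximum and keeps the odd positions.

```python
import numpy as np

def first_byte_mask(odd):
    """Return True where an odd byte starts a two-byte record.

    Hypothetical helper: keeps the 1st, 3rd, 5th... entry of every
    run of consecutive True values in `odd`.
    """
    odd = np.asarray(odd, dtype=bool)
    idx = np.arange(odd.size)
    # Index of the most recent False entry at or before each position
    # (-1 if there has been none yet).
    last_false = np.maximum.accumulate(np.where(odd, -1, idx))
    # 1-based position inside the current run of True; 0 on False entries.
    pos_in_run = idx - last_false
    return odd & (pos_in_run % 2 == 1)
```

The result could stand in for mask right before the np.nonzero call. Whether it is actually faster than the loop depends on how long the runs of odd bytes are, so it would need timing on real data.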

And some tests using the 1M-element array:

In [4]: np.all(unpack(data) == np_unpack(data))
Out[4]: True

In [5]: %timeit unpack(data)
1 loops, best of 3: 7.11 s per loop

In [6]: %timeit np_unpack(data)
1 loops, best of 3: 294 ms per loop
answered 2013-02-15T01:25:18.863