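NumPy summarizes large arrays, which is convenient when working in an interactive session. Unfortunately, structured arrays and recarrays are not summarized very well by default. Is there a way to change this?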

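By default, an array is displayed in full if it has 1000 or fewer items. When there are more items than that, the array is summarized. This can be set with np.set_printoptions(threshold=<number of items to trigger summarization>, edgeitems=<number of items to show in summary>). This works fine for standard data types, for example: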

np.set_printoptions(threshold=3, edgeitems=1)
print(np.zeros(3))
print(np.zeros(4))

results in

[ 0.  0.  0.]
[ 0. ...,  0.]
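
If the changed thresholds are only wanted for a few statements, NumPy 1.15 and later also provide the np.printoptions context manager, which restores the previous settings on exit. A minimal sketch, with variable names chosen purely for illustration:

```python
import numpy as np

# Temporarily lower the summarization threshold; the previous
# print options are restored when the block exits.
with np.printoptions(threshold=3, edgeitems=1):
    full = str(np.zeros(3))        # 3 items: not above the threshold
    summarized = str(np.zeros(4))  # 4 items: above the threshold

# Outside the block the earlier threshold applies again.
after = str(np.zeros(4))

print(full)
print(summarized)
print(after)
```

The exact spacing of the printed output depends on the NumPy version, so no expected output is shown here.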

However, when more complex data types are used, the summarization is less helpful:

print(np.zeros(4, dtype=[('test', 'i4', 3)]))
print(np.zeros(4, dtype=[('test', 'i4', 4)]))

[([0, 0, 0],) ..., ([0, 0, 0],)]
[([0, 0, 0, 0],) ..., ([0, 0, 0, 0],)]

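The array is summarized, but the sub-dtypes are not. This becomes a serious problem with large arrays that use complex data types. For example, the array np.zeros(1000, dtype=[('a', float, 3000), ('b', float, 10000)]) hangs my ipython instance.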
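
To get a feel for the scale involved, the number of scalar values in that array can be counted from the dtype alone, without allocating anything. The snippet below is a minimal sketch: elements_per_record is a hypothetical helper name, and it only handles flat (non-nested) structured dtypes.

```python
import numpy as np


def elements_per_record(dt):
    """Count scalar values in one record of a flat structured dtype."""
    # dt[name] is the field's dtype; its .shape is () for scalar fields
    # and e.g. (3000,) for sub-array fields. The product of () is 1.
    return sum(int(np.prod(dt[name].shape, dtype=np.int64)) for name in dt.names)


dt = np.dtype([('a', float, 3000), ('b', float, 10000)])
per_record = elements_per_record(dt)
total = 1000 * per_record

print(per_record)  # 13000
print(total)       # 13000000
```

Even a summary that shows only a handful of records has to format 13000 values per record if the sub-arrays are printed in full.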

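There are a couple of workarounds. Rather than using np.array types directly, one can subclass and write a custom __repr__. This would work for big projects, but it doesn't solve the underlying issue and isn't convenient for quick exploration of data in an interactive python session. I have also implemented a custom filter in my editor that truncates very long console output. This is a bit of a hack and doesn't help when I start a python session elsewhere.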
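
The subclassing workaround could look roughly like the sketch below. The class name SummarizedArray and the one-line summary format are assumptions made for illustration; the idea is only that structured arrays report their shape and fields instead of their contents.

```python
import numpy as np


class SummarizedArray(np.ndarray):
    """ndarray subclass (hypothetical) that prints structured arrays compactly."""

    def _summary(self):
        parts = []
        for name in self.dtype.names:
            fdt = self.dtype[name]
            # fdt.base is the element type; fdt.shape is () for scalar fields.
            shape = f" x {fdt.shape}" if fdt.shape else ""
            parts.append(f"{name}: {fdt.base}{shape}")
        return f"<{type(self).__name__} shape={self.shape} fields=[{', '.join(parts)}]>"

    def __repr__(self):
        # Plain (non-structured) arrays keep the normal NumPy output.
        if self.dtype.names is None:
            return super().__repr__()
        return self._summary()

    def __str__(self):
        if self.dtype.names is None:
            return super().__str__()
        return self._summary()


# view() reinterprets an existing array as the subclass without copying.
records = np.zeros(4, dtype=[('test', 'i4', 3)]).view(SummarizedArray)
summary = repr(records)
print(summary)
```

Every array has to be converted with view() first, which is the inconvenience described above for quick interactive work.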

Is there a numpy setting I am unaware of, or a python or ipython setting that could fix this?

1 Answer

Here is a workaround I came up with that prints record arrays reasonably.

import numpy as np


def count_dtype(dtype):
    """
    dtype : datatype descr (list of tuples as returned by dtype.descr, rather than a dtype object)
    Return total number of scalar elements in one item of dtype
    """
    total = 0
    for name, t, *shape in dtype:
        # descr entries are (name, type) or (name, type, shape)
        n = int(np.prod(shape[0], dtype=np.int64)) if shape else 1
        if isinstance(t, str):  # base datatype
            total += n
        else:  # nested structured type, described by its own descr list
            total += n * count_dtype(t)
    return total


def _recarray2string(a, options, separator=' ', prefix=""):
    """
    Create a string representation of a record array
    a : record array
    options : dict of print options, as passed to _array2string
    separator : used by _array2string
    prefix : used by _array2string
    """
    threshold = options['threshold']
    edgeitems = options['edgeitems']

    size = count_dtype(a.dtype.descr)
    items = np.multiply.reduce(a.shape)
    if size*items > threshold/(2*edgeitems): ## Too big
        if size > threshold: ## subtype is too large - how to handle?
            newopt = options.copy()
            newopt['threshold'] = options['threshold'] // (2*options['edgeitems'])
            def fmt_subtype(r):
                res = []
                for sub in r:
                    if sub.dtype.names is not None:
                        res.append(fmt_subtype(sub))
                    else:
                        res.append(_array2string(sub, newopt, separator=separator, prefix=prefix))
                return separator.join(res)
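            # Show the first and last `edgeitems` records, both in their original order
            head = separator.join(fmt_subtype(a[i]) for i in range(edgeitems))
            tail = separator.join(fmt_subtype(a[a.shape[0]-edgeitems+i]) for i in range(edgeitems))
            return head + '\n...\n' + tail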
        else: ## Subtype is small enough it's sufficient to truncate only sub-dtype
            options = options.copy()
            options['threshold'] = threshold // size
            return _array2string_old(a, options, separator=separator, prefix=prefix)
    else:  ## Print as normal
        return _array2string_old(a, options, separator=separator, prefix=prefix)


def _array2string(a, options, separator=' ', prefix=""):
    """
    Monkeypatched print function that allows truncating record arrays sensibly
    """
    if a.dtype.names is not None:
        return _recarray2string(a, options, separator=separator, prefix=prefix)
    else:
        return _array2string_old(a, options, separator=separator, prefix=prefix)


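# Set up monkeypatching.
# Note: _array2string is a private NumPy function, so its location and
# signature may differ between NumPy versions.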
_array2string_old = np.core.arrayprint._array2string
np.core.arrayprint._array2string = _array2string
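
For comparison, a similar effect can be had without patching NumPy internals by formatting each field separately with the public np.array2string function. This is a different technique from the monkeypatch above, not a drop-in replacement: summarize_fields is a hypothetical helper name, and the one-entry-per-field layout is an assumption made for illustration.

```python
import numpy as np


def summarize_fields(a, threshold=20, edgeitems=2, indent=""):
    """Return a per-field summary of a structured array (hypothetical helper)."""
    lines = []
    for name in a.dtype.names:
        field = a[name]  # a view; sub-array fields gain extra trailing axes
        if field.dtype.names is not None:
            # Nested structured field: recurse with deeper indentation.
            lines.append(f"{indent}{name}:")
            lines.append(summarize_fields(field, threshold, edgeitems, indent + "  "))
        else:
            # array2string summarizes once field.size exceeds threshold.
            text = np.array2string(field, threshold=threshold, edgeitems=edgeitems)
            lines.append(f"{indent}{name} {field.shape}:\n{text}")
    return "\n".join(lines)


data = np.zeros(1000, dtype=[('a', float, 30), ('b', 'i4')])
report = summarize_fields(data)
print(report)
```

Because each field is an ordinary array, the usual threshold and edgeitems rules apply to every axis, including the sub-array axis.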
Answered 2018-08-31T15:46:50.767