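t is a dask array. I want to plot a histogram of t. The Dask documentation lists the method
dask.array.histogram(a, bins=None, range=None, normed=False, weights=None, density=None)
but gives no example. I tried setting bins to a numpy array, and that did not work. I also tried matplotlib.pyplot directly: it ran for more than 5 minutes and produced nothing. My dataset is very large (GB-sized), but that still seems like a long time.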
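dask.array.histogram needs both bins and range to be set: bins is the desired number of bins, and range is the minimum/maximum range of the data. Here is a simple example: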
In [1]: import dask.array as da
In [2]: x = da.random.normal(10, 0.1, size=(100000,), chunks=(1000,)) # random dataset
In [3]: h, bins = da.histogram(x, bins=100, range=[9, 11])
In [4]: bins
Out[4]:
array([ 9. , 9.02, 9.04, 9.06, 9.08, 9.1 , 9.12, 9.14,
9.16, 9.18, 9.2 , 9.22, 9.24, 9.26, 9.28, 9.3 ,
9.32, 9.34, 9.36, 9.38, 9.4 , 9.42, 9.44, 9.46,
9.48, 9.5 , 9.52, 9.54, 9.56, 9.58, 9.6 , 9.62,
9.64, 9.66, 9.68, 9.7 , 9.72, 9.74, 9.76, 9.78,
9.8 , 9.82, 9.84, 9.86, 9.88, 9.9 , 9.92, 9.94,
9.96, 9.98, 10. , 10.02, 10.04, 10.06, 10.08, 10.1 ,
10.12, 10.14, 10.16, 10.18, 10.2 , 10.22, 10.24, 10.26,
10.28, 10.3 , 10.32, 10.34, 10.36, 10.38, 10.4 , 10.42,
10.44, 10.46, 10.48, 10.5 , 10.52, 10.54, 10.56, 10.58,
10.6 , 10.62, 10.64, 10.66, 10.68, 10.7 , 10.72, 10.74,
10.76, 10.78, 10.8 , 10.82, 10.84, 10.86, 10.88, 10.9 ,
10.92, 10.94, 10.96, 10.98, 11. ])
In [5]: h.compute()
Out[5]:
array([ 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1, 1, 4, 15,
19, 71, 132, 231, 376, 604, 891, 1307, 1884, 2635, 3422,
4276, 5455, 6158, 7092, 7759, 7933, 7994, 7625, 6994, 6194, 5315,
4272, 3381, 2529, 1803, 1324, 912, 594, 331, 225, 127, 54,
32, 12, 10, 2, 2, 1, 1, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0])
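Since the question was about plotting, here is a minimal sketch of how the computed counts could be handed to matplotlib. It reuses the random dataset from the session above; the Agg backend, the axis labels, and the use of ax.bar are my own assumptions for illustration, not something the Dask documentation prescribes. The point is that only the 100 bin counts are passed to matplotlib, not the full array, which is likely why calling matplotlib.pyplot on the raw data was so slow.

```python
import numpy as np
import dask.array as da
import matplotlib

matplotlib.use("Agg")  # assumption: headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Same kind of random dataset as in the session above
x = da.random.normal(10, 0.1, size=(100000,), chunks=(1000,))

# Lazy histogram: nothing is computed until .compute() is called
h, bins = da.histogram(x, bins=100, range=[9, 11])

counts = h.compute()      # small result: one count per bin
edges = np.asarray(bins)  # bin edges as a plain numpy array (101 values)

# Draw one bar per bin, left-aligned on the bin edges
fig, ax = plt.subplots()
ax.bar(edges[:-1], counts, width=np.diff(edges), align="edge")
ax.set_xlabel("value")
ax.set_ylabel("count")
```

From there, fig.savefig(...) or plt.show() (with an interactive backend) should display the result. For a real dataset, range has to be chosen to cover the data; if the minimum and maximum are not known in advance, they would need to be computed first, which costs an extra pass over the data.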