python - matplotlib: ways to plot a CDF

Question

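In Python with matplotlib, I have to plot two CDF curves on the same figure: one for data A and one for data B.

If I were to decide the binning myself, I would do the following and take 100 histogram bins based on data A. (In my case, A is always at most 50% of the size of B.)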

import numpy as np
import matplotlib.pyplot as plt

# samplesFromA and samplesFromB are defined elsewhere

fig = plt.figure()
ax = fig.add_subplot(1, 1, 1)
ax.grid(True)

a = 0
nhist = 100
b = np.max(samplesFromA)
# tmp contains the nhist + 1 bin edges: [a, a+d, a+2*d, a+3*d, ... b],
# where d = (b - a) / nhist is the size of each bin
tmp = np.linspace(a, b, nhist + 1)

# CDF of A (density=True replaces the normed argument removed from newer matplotlib)
ax.hist(samplesFromA, bins=tmp, cumulative=True, density=True,
        color='red', histtype='step', linewidth=2.0,
        label='samples A')

# CDF of B
ax.hist(samplesFromB, bins=tmp, cumulative=True, density=True,
        color='blue', alpha=0.5, histtype='step', linewidth=1.0,
        label='samples B')

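This is the result (I cropped out all the irrelevant information).

Recently I found out about sm.distributions.ECDF, and I wanted to compare it with my previous implementation. Basically, I just call the following function on my data (and decide the range of the rightmost bin elsewhere), without computing any bins: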

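import statsmodels.api as sm

def drawCDF(ax, aSample):
    # build the empirical CDF of the sample, then evaluate it on a grid of x-values
    ecdf = sm.distributions.ECDF(aSample)
    x = np.linspace(min(aSample), max(aSample))
    y = ecdf(x)
    ax.step(x, y)
    return ax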

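This is the result using the same data (again, I manually cropped out the irrelevant text).

It turns out that this last example merges too many bins together, so the result is not a very fine-grained CDF curve. What exactly is happening behind the scenes here?

Sample A (red) contains 70 samples, while sample B (blue) contains 15 000!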

score 1 · Accepted Answer

I suggest that you read the source code.

If you want evenly spaced bins:

x = np.linspace(min(aSample), 
                max(aSample),
                int((max(aSample) - min(aSample)) / step))
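For context on the coarse curve in the question: np.linspace returns 50 points when no count is passed, so drawCDF evaluates the ECDF at only 50 x-values regardless of how many samples there are. Below is a minimal sketch of deriving the grid from a desired spacing instead. The helper name make_grid, the step value, and the synthetic data are assumptions for illustration, not part of the original code.

```python
import numpy as np

def make_grid(sample, step):
    """Return evenly spaced x-values spanning the sample, roughly `step` apart.

    `make_grid` and `step` are hypothetical names used for illustration.
    """
    lo, hi = np.min(sample), np.max(sample)
    # +1 so that both endpoints are included; never fewer than 2 points.
    num = max(int(round((hi - lo) / step)) + 1, 2)
    return np.linspace(lo, hi, num)

rng = np.random.default_rng(0)
sample = rng.normal(size=1000)  # synthetic stand-in data

default_grid = np.linspace(sample.min(), sample.max())  # num defaults to 50
fine_grid = make_grid(sample, step=0.01)
```

The finer grid can then be passed to the ECDF in place of the default 50-point one.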

np.arange documentation
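Another option that avoids choosing bins or a grid altogether is to evaluate the empirical CDF at the sorted sample values themselves. This is a sketch in plain NumPy rather than sm.distributions.ECDF. The function name ecdf_points and the synthetic data (sized like samples A and B in the question) are assumptions for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

def ecdf_points(sample):
    """Return the sorted values x and cumulative fractions y, with y[i] = (i + 1) / n."""
    x = np.sort(np.asarray(sample, dtype=float))
    y = np.arange(1, x.size + 1) / x.size
    return x, y

rng = np.random.default_rng(0)
samples_a = rng.exponential(size=70)      # synthetic stand-in for data A
samples_b = rng.exponential(size=15000)   # synthetic stand-in for data B

fig, ax = plt.subplots()
ax.grid(True)
for data, color, label in [(samples_a, "red", "samples A"),
                           (samples_b, "blue", "samples B")]:
    x, y = ecdf_points(data)
    # where="post": the curve jumps at each observed value and stays flat until the next one
    ax.step(x, y, where="post", color=color, label=label)
ax.legend(loc="lower right")
```

With this approach the resolution of each curve follows the data: the 70-sample curve has 70 visible steps, while the 15 000-sample curve looks smooth.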
