python - Logarithmic plot of a cumulative distribution function in matplotlib

Question

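I have a file containing logged events. Each entry has a time and a latency. I want to plot the cumulative distribution function of the latencies. I care most about tail latencies, so I want the plot to have a logarithmic y-axis. I'm interested in the latencies at the following percentiles: 90th, 99th, 99.9th, 99.99th, and 99.999th. Here is the code I have so far, which generates a regular CDF plot: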

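import numpy
import matplotlib.pyplot as plt

# retrieve event times and latencies from the file
# (read_in_data_from_file is my own helper, not shown here)
times, latencies = read_in_data_from_file('myfile.csv')
# compute the CDF
cdfx = numpy.sort(latencies)
# 1.0 / len(...) avoids integer division under Python 2
cdfy = numpy.linspace(1.0 / len(latencies), 1.0, len(latencies))
# plot the CDF
plt.plot(cdfx, cdfy)
plt.show()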

Regular CDF plot

I know what I want the plot to look like, but I've struggled to get there. I want it to look like this (I did not generate this plot):

Logarithmic CDF plot

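Making the x-axis logarithmic is simple. The y-axis is the one giving me trouble. Using set_yscale('log') doesn't work because it wants to use powers of 10. I really want the y-axis to have the same tick labels as in this plot.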

How can I get my data into a logarithmic plot like this one?

EDIT:

If I set the yscale to 'log' and ylim to [0.1, 1], I get the following plot:


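The problem is that a typical log-scale plot of a data set ranging from 0 to 1 concentrates on the values close to zero. I want to focus on the values close to 1 instead.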

score 17 · Accepted Answer

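Essentially, you need to apply the following transformation to your Y values: -log10(1-y). The only restriction this imposes is y < 1, so negative values of y are still allowed on the transformed plot.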
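To make the effect of that transformation concrete, here is a minimal sketch that evaluates it at the percentiles from the question. The helper name close_to_one is only illustrative and is not part of matplotlib or NumPy.

```python
import numpy as np

def close_to_one(y):
    """Map a CDF value y < 1 to -log10(1 - y)."""
    return -np.log10(1.0 - np.asarray(y, dtype=float))

# The percentiles from the question, written as CDF values.
percentiles = np.array([0.9, 0.99, 0.999, 0.9999, 0.99999])
transformed = close_to_one(percentiles)

# Each extra "nine" moves the point one unit further up the axis.
print(np.round(transformed, 6))

# Values below zero are still valid inputs: 1 - (-9) = 10, so the result is -1.
print(close_to_one(-9.0))
```

Each additional nine adds one unit on the transformed axis, which is why the tail gets as much vertical room as the whole range from 0 to 0.9.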

Here is a modified example from the matplotlib documentation that shows how to incorporate a custom transformation into a "scale":

import numpy as np
from numpy import ma
from matplotlib import scale as mscale
from matplotlib import transforms as mtransforms
from matplotlib.ticker import FixedFormatter, FixedLocator


class CloseToOne(mscale.ScaleBase):
    name = 'close_to_one'

    def __init__(self, axis, **kwargs):
        # matplotlib >= 3.1 expects the axis to be passed to ScaleBase.__init__
        super().__init__(axis)
        self.nines = kwargs.get('nines', 5)

    def get_transform(self):
        return self.Transform(self.nines)

    def set_default_locators_and_formatters(self, axis):
        axis.set_major_locator(FixedLocator(
                np.array([1-10**(-k) for k in range(1+self.nines)])))
        axis.set_major_formatter(FixedFormatter(
                [str(1-10**(-k)) for k in range(1+self.nines)]))


    def limit_range_for_scale(self, vmin, vmax, minpos):
        return vmin, min(1 - 10**(-self.nines), vmax)

    class Transform(mtransforms.Transform):
        input_dims = 1
        output_dims = 1
        is_separable = True

        def __init__(self, nines):
            mtransforms.Transform.__init__(self)
            self.nines = nines

        def transform_non_affine(self, a):
            masked = ma.masked_where(a > 1-10**(-1-self.nines), a)
            if masked.mask.any():
                return -ma.log10(1-a)
            else:
                return -np.log10(1-a)

        def inverted(self):
            return CloseToOne.InvertedTransform(self.nines)

    class InvertedTransform(mtransforms.Transform):
        input_dims = 1
        output_dims = 1
        is_separable = True

        def __init__(self, nines):
            mtransforms.Transform.__init__(self)
            self.nines = nines

        def transform_non_affine(self, a):
            return 1. - 10**(-a)

        def inverted(self):
            return CloseToOne.Transform(self.nines)

mscale.register_scale(CloseToOne)

if __name__ == '__main__':
    import pylab
    pylab.figure(figsize=(20, 9))
    t = np.arange(-0.5, 1, 0.00001)
    pylab.subplot(121)
    pylab.plot(t)
    pylab.subplot(122)
    pylab.plot(t)
    pylab.yscale('close_to_one')

    pylab.grid(True)
    pylab.show()

Note that you can control the number of 9's via a keyword argument:

pylab.figure()
pylab.plot(t)
pylab.yscale('close_to_one', nines=3)
pylab.grid(True)
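As a possible alternative to registering a scale class, the same transformation can be sketched with matplotlib's built-in 'function' scale. This is not the approach used in the answer above. It assumes matplotlib 3.1 or later, where set_yscale('function', functions=(forward, inverse)) is available. The latencies below are synthetic, and the output file name tail_cdf.png is hypothetical.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np

def forward(y):
    # Clip just below 1 so a CDF value of exactly 1 does not map to infinity.
    return -np.log10(1.0 - np.clip(y, None, 1.0 - 1e-12))

def inverse(z):
    return 1.0 - 10.0 ** (-np.asarray(z, dtype=float))

# Synthetic latencies, for illustration only (not the asker's data).
rng = np.random.default_rng(0)
latencies = rng.exponential(scale=10.0, size=100_000)

cdfx = np.sort(latencies)
# Use i / (n + 1) so the empirical CDF never reaches exactly 1.
cdfy = np.arange(1, len(cdfx) + 1) / (len(cdfx) + 1.0)

nines = 5
ticks = [1.0 - 10.0 ** (-k) for k in range(nines + 1)]
tick_labels = ["0", "90", "99", "99.9", "99.99", "99.999"]

fig, ax = plt.subplots()
ax.plot(cdfx, cdfy)
ax.set_yscale("function", functions=(forward, inverse))
ax.set_ylim(0.0, 1.0 - 10.0 ** (-nines))
# Set the ticks after the scale, because changing the scale resets them.
ax.set_yticks(ticks)
ax.set_yticklabels(tick_labels)
ax.set_xlabel("latency")
ax.set_ylabel("percentile")
fig.savefig("tail_cdf.png")
```

The custom scale class is reusable by name across plots and takes the nines keyword. The function scale is shorter, but the ticks and limits have to be set by hand each time.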

score 1 · Accepted Answer

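Okay, this isn't the cleanest code, but I can't see a way around it. Maybe what I'm really asking for isn't a logarithmic CDF, but I'll wait for a statistician to tell me otherwise. Anyway, here is what I came up with: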

import math
import numpy
import matplotlib.pyplot as plt

# retrieve event times and latencies from the file
# (read_in_data_from_file is my own helper, not shown here)
times, latencies = read_in_data_from_file('myfile.csv')
cdfx = numpy.sort(latencies)
cdfy = numpy.linspace(1.0 / len(latencies), 1.0, len(latencies))

# find the logarithmic CDF and ylabels
logcdfy = [-math.log10(1.0 - (float(idx) / len(latencies)))
           for idx in range(len(latencies))]
labels = ['', '90', '99', '99.9', '99.99', '99.999', '99.9999', '99.99999']
# int() keeps the slice index an integer under Python 2 as well
top = int(math.ceil(max(logcdfy)))
labels = labels[0:top + 1]

# plot the logarithmic CDF
fig = plt.figure()
axes = fig.add_subplot(1, 1, 1)
axes.scatter(cdfx, logcdfy, s=4, linewidths=0)
axes.set_xlim(min(latencies), max(latencies) * 1.01)
axes.set_ylim(0, top)
# pin one tick per "nine" so the labels line up with the right positions
axes.set_yticks(range(top + 1))
axes.set_yticklabels(labels)
plt.show()

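The messy part is where I change the yticklabels. The logcdfy variable will hold values between 0 and 10, and in my example it was between 0 and 6. In this code, I swap the tick labels for percentiles. The plot function could also be used, but I like the way the scatter function shows the outliers in the tail. Also, I chose not to put the x-axis on a log scale, because my particular data already forms a good straight line without it.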
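The hard-coded label list above could also be generated from the largest transformed value. The following is a minimal sketch of that idea. The helper name percentile_labels is hypothetical and not part of the original code.

```python
import math

def percentile_labels(max_log_value):
    """Return integer tick positions 0..k and labels '', '90', '99', '99.9', ..."""
    top = int(math.ceil(max_log_value))
    positions = list(range(top + 1))
    labels = [""]  # position 0 corresponds to the 0th percentile; left blank
    for k in range(1, top + 1):
        if k == 1:
            labels.append("90")
        elif k == 2:
            labels.append("99")
        else:
            # k nines in total: two before the decimal point, the rest after.
            labels.append("99." + "9" * (k - 2))
    return positions, labels

positions, labels = percentile_labels(5.3)
print(positions)
print(labels)
```

With the plotting code above, the result would be applied as axes.set_yticks(positions) followed by axes.set_yticklabels(labels), so the number of labels always matches the number of ticks.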
