python - How does pandas calculate quartiles?

Question

I have a very simple DataFrame:

df = pd.DataFrame([5,7,10,15,19,21,21,22,22,23,23,23,23,23,24,24,24,24,25], columns=['val'])

df.median() = 23, which is correct: of the 19 values in the list, 23 is the 10th value (9 values before it and 9 values after it).

I tried to calculate the first and third quartiles as:

df.quantile([.25, .75])

         val
0.25    20.0
0.75    23.5

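Taking the 9 values below the median, I expected the first quartile to be 19, but as you can see above, Python says it is 20. Likewise, for the third quartile, the fifth number counting from the right is 24, but Python shows 23.5.

How does pandas calculate quartiles?

The original problem comes from the following link: https://www.khanacademy.org/math/statistics-probability/summarizing-quantitative-data/box-whisker-plots/a/identifying-outliers-iqr-rule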

score 2 · Accepted Answer

Python doesn't compute the quantiles; pandas does. Take a look at the documentation here: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.quantile.html. It actually uses NumPy's percentile function: https://docs.scipy.org/doc/numpy/reference/generated/numpy.percentile.html#numpy.percentile

score 2 · Accepted Answer

It uses linear interpolation by default. Here is the result using the nearest method:

df['val'].quantile([0.25, 0.75], interpolation='nearest')

Out:
0.25    19
0.75    24
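
To see where the default values of 20.0 and 23.5 come from, here is a minimal sketch of linear interpolation in plain Python. It assumes the fractional position is computed as q * (n - 1) on the sorted data with zero-based indexing; `linear_quantile` is a hypothetical helper written for illustration, not a pandas or NumPy function.

```python
import math

def linear_quantile(values, q):
    """Quantile by linear interpolation between the two neighbouring sorted values."""
    data = sorted(values)
    pos = q * (len(data) - 1)      # fractional zero-based position (assumed convention)
    lo = math.floor(pos)           # index of the lower neighbour i
    hi = math.ceil(pos)            # index of the upper neighbour j
    fraction = pos - lo            # fractional part of the position
    return data[lo] + (data[hi] - data[lo]) * fraction

vals = [5, 7, 10, 15, 19, 21, 21, 22, 22, 23, 23, 23, 23, 23, 24, 24, 24, 24, 25]

q1 = linear_quantile(vals, 0.25)   # position 4.5, halfway between 19 and 21
q3 = linear_quantile(vals, 0.75)   # position 13.5, halfway between 23 and 24
print(q1, q3)                      # 20.0 23.5
```

With 19 values the positions are 0.25 * 18 = 4.5 and 0.75 * 18 = 13.5, so each quartile falls exactly halfway between two data points, which is why the default result differs from the median-of-halves values of 19 and 24.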

More information on how the interpolation parameter works, from the official documentation:

    This optional parameter specifies the interpolation method to use,
    when the desired quantile lies between two data points `i` and `j`:

    * linear: `i + (j - i) * fraction`, where `fraction` is the
      fractional part of the index surrounded by `i` and `j`.
    * lower: `i`.
    * higher: `j`.
    * nearest: `i` or `j` whichever is nearest.
    * midpoint: (`i` + `j`) / 2.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.quantile.html
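
The non-linear options in the list above can be sketched the same way. This is only an illustration in plain Python: `pick_quantile` is a hypothetical helper, the position is again assumed to be q * (n - 1), and the tie-breaking for `nearest` when the position ends in exactly .5 (Python's `round`, which rounds half to even) is an assumption chosen because it reproduces the 19 and 24 shown earlier.

```python
import math

def pick_quantile(values, q, interpolation):
    """Quantile using one of the non-linear interpolation rules quoted above."""
    data = sorted(values)
    pos = q * (len(data) - 1)        # fractional zero-based position (assumed convention)
    i = data[math.floor(pos)]        # lower neighbour
    j = data[math.ceil(pos)]         # upper neighbour
    if interpolation == "lower":
        return i
    if interpolation == "higher":
        return j
    if interpolation == "midpoint":
        return (i + j) / 2
    if interpolation == "nearest":
        # Assumption: exact .5 ties are rounded to the even index.
        return data[round(pos)]
    raise ValueError(f"unknown interpolation: {interpolation!r}")

vals = [5, 7, 10, 15, 19, 21, 21, 22, 22, 23, 23, 23, 23, 23, 24, 24, 24, 24, 25]

for method in ("lower", "higher", "midpoint", "nearest"):
    print(method, pick_quantile(vals, 0.25, method), pick_quantile(vals, 0.75, method))
```

For this data the first quartile lies between 19 and 21 and the third between 23 and 24, so `lower` and `higher` simply pick one of those neighbours and `midpoint` averages them.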
