2
from numpy import percentile
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
# calculate quartiles
quartile_1 = percentile(data, 25)
quartile_3 = percentile(data, 75)

print(quartile_1)  # shows 3.25
print(quartile_3)  # shows 7.75

Can you explain how the values 3.25 and 7.75 are calculated? I expected them to be 3 and 8.

4

5 Answers

2

Computing a NumPy percentile by hand, step by step:

Step 1: Find the length

x = [1,2,3,4,5,6,7,8,9,10]
l = len(x) 
# Output --> 10

Step 2: Subtract 1 to get the distance from the first item of x to the last

# n = (length - 1) 
# n = (10-1) 
# Output --> 9

Step 3: Multiply n by the quantile; here that is the 25th percentile, i.e. the 0.25 quantile or first quartile

n * 0.25
# Therefore, (9 * 0.25)
# Output --> 2.25
# So the integer part of 2.25 is 2 and the fractional part is 0.25
# m = 0.25

Step 4: Now get the final answer

For linear:

# i + (j - i) * m
# Here, think i and j as values at indices
# x = [1,2,3,4,5,6,7,8,9,10]
#idx= [0,1,2,3,.........,9]
# So, for '2.25':
# value at index immediately before 2.25, is at index=2 so, i=3
# value at index immediately after 2.25, is at index=3 so, j=4
# and the fraction is m=0.25
3 + (4 - 3)*0.25
# Output --> 3.25
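
The linear steps above can be sketched as a small function. This is only an illustration of the arithmetic described in this answer, not NumPy's actual implementation; `linear_percentile` is a hypothetical helper name, and the comparison at the end assumes the default behaviour of `np.percentile` is linear interpolation.

```python
import math
import numpy as np

def linear_percentile(values, q):
    """Hypothetical helper: q-th percentile by linear interpolation."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0  # fractional index, e.g. 9 * 0.25 = 2.25
    lo = math.floor(pos)            # index just below the position
    hi = math.ceil(pos)             # index just above the position
    m = pos - lo                    # fractional part, e.g. 0.25
    i, j = s[lo], s[hi]             # values at those indices
    return i + (j - i) * m

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(linear_percentile(x, 25))    # 3.25
print(linear_percentile(x, 75))    # 7.75
print(np.percentile(x, [25, 75]))  # default method, expected to agree
```

The same function also explains the 7.75 in the question: the position is 9 * 0.75 = 6.75, so the result lies three quarters of the way from 7 to 8.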

For lower:

# Here, based on output from Step-3
# Because, it is '2.25', 
# Find the nearest index below 2.25
# So, lower index is '2'
# x = [1,2,3,4,5,6,7,8,9,10]
#idx= [0,1,2,3,.........,9]
# So, at index=2 we have '3' 
# Output --> 3

For higher:

# Here, based on output from Step-3
# Because, it is '2.25', 
# Find the nearest index above 2.25
# So, higher index is '3'
# x = [1,2,3,4,5,6,7,8,9,10]
#idx= [0,1,2,3,.........,9]
# So, at index=3 we have '4' 
# Output --> 4

For nearest:

# Here, based on output from Step-3
# Because, it is '2.25', 
# Find the index nearest to 2.25
# So, nearest index is '2'
# x = [1,2,3,4,5,6,7,8,9,10]
#idx= [0,1,2,3,.........,9]
# So, at index=2 we have '3' 
# Output --> 3

For midpoint:

# Here, based on output from Step-3
# (i + j)/2
# Here, think of i and j as the values at the indices
# x = [1,2,3,4,5,6,7,8,9,10]
#idx= [0,1,2,3,.........,9]
# So, for '2.25'
# value at index immediately before 2.25, is at index=2 so, i=3
# value at index immediately after 2.25, is at index=3 so, j=4
(3+4)/2
# Output --> 3.5
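
The lower, higher, nearest and midpoint rules can be sketched the same way in plain Python. `percentile_by_rule` is a hypothetical helper written only to mirror the hand calculation; in particular, it assumes that ties in the nearest rule are resolved by Python's built-in `round`, which may not match NumPy in every edge case.

```python
import math

def percentile_by_rule(values, q, rule):
    """Hypothetical helper mirroring the hand calculation for each rule."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0  # fractional index, e.g. 2.25
    lo, hi = math.floor(pos), math.ceil(pos)
    if rule == "lower":
        return s[lo]                # value at the index just below
    if rule == "higher":
        return s[hi]                # value at the index just above
    if rule == "nearest":
        return s[round(pos)]        # assumption: ties follow Python's round()
    if rule == "midpoint":
        return (s[lo] + s[hi]) / 2  # average of the two neighbours
    raise ValueError(f"unknown rule: {rule}")

x = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for rule in ("lower", "higher", "nearest", "midpoint"):
    print(rule, percentile_by_rule(x, 25, rule), percentile_by_rule(x, 75, rule))
```

For the 75th percentile the position is 6.75, so nearest picks index 7 (value 8) while lower picks index 6 (value 7).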

Code in Python:

import numpy as np

# Note: NumPy 1.22 and later rename the `interpolation` keyword to `method`
x = np.array([1,2,3,4,5,6,7,8,9,10])
print("linear:", np.percentile(x, 25, interpolation='linear'))
print("lower:", np.percentile(x, 25, interpolation='lower'))
print("higher:", np.percentile(x, 25, interpolation='higher'))
print("nearest:", np.percentile(x, 25, interpolation='nearest'))
print("midpoint:", np.percentile(x, 25, interpolation='midpoint'))

Output:

linear: 3.25
lower: 3
higher: 4
nearest: 3
midpoint: 3.5
answered 2020-09-04T09:41:45.503
1
answered 2019-11-28T10:59:18.870
1

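While this may be an interpolation issue, under some quartile methods (namely Method 2) the answer should be exactly [3, 8].

As described in my answers here and here, numpy uses Method 3 instead.

Unfortunately, until the statistics community settles on a single definition of quartiles, the confusion will continue.

answered 2019-11-28T11:13:13.263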
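
To make the [3, 8] result concrete, here is a minimal sketch of a median-of-halves quartile rule. The answer does not spell out what "Method 2" means, so this assumes the common convention of splitting the sorted data into halves, including the median in both halves when the length is odd, and taking the median of each half. `quartiles_by_halves` is a hypothetical name.

```python
from statistics import median

def quartiles_by_halves(values):
    """Hypothetical helper: Q1 and Q3 as medians of the lower and upper halves."""
    s = sorted(values)
    n = len(s)
    half = n // 2
    lower = s[:half + (n % 2)]  # includes the median when n is odd
    upper = s[half:]            # includes the median when n is odd
    return median(lower), median(upper)

data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(quartiles_by_halves(data))  # (3, 8)
```

With ten values the halves are [1..5] and [6..10], whose medians are 3 and 8, so no interpolation is involved.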
0

Several options are available, depending on the interpolation method you want the percentile to be calculated with.

a = np.arange(1, 11)
a  # array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10])

np.percentile(a, (25, 75), interpolation='midpoint') # array([3.5, 7.5])
np.percentile(a, (25, 75), interpolation='nearest')  # array([3, 8])
np.percentile(a, (25, 75), interpolation='linear')   # array([3.25, 7.75])
np.percentile(a, (25, 75), interpolation='lower')    # array([3, 7])
np.percentile(a, (25, 75), interpolation='higher')   # array([4, 8])

Note that the percentiles need to be derived from the cumulative relative frequency:

c = np.cumsum(a)
c  # array([ 1,  3,  6, 10, 15, 21, 28, 36, 45, 55], dtype=int32)
c / c[-1] * 100
# array([  1.81818182,   5.45454545,  10.90909091,  18.18181818,
#         27.27272727,  38.18181818,  50.90909091,  65.45454545,
#         81.81818182, 100.        ])

and percentiles for 25 and 75 will require an interpolation of some form.

answered 2019-11-28T10:59:31.953
0

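From the numpy documentation:

Given a vector V of length N, the q-th percentile of V is the value q/100 of the way from the minimum to the maximum in a sorted copy of V. The values and distances of the two nearest neighbors as well as the interpolation parameter will determine the percentile if the normalized ranking does not match the location of q exactly. This function is the same as the median if q=50, the same as the minimum if q=0 and the same as the maximum if q=100.

So the question is how numpy behaves when it cannot find an exact match for your quantile. If you use interpolation="nearest", you get the result you expected: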

>>> from numpy import percentile
>>> import numpy as np
>>> data=np.array([1,2,3,4,5,6,7,8,9,10])
>>> # calculate quartiles
... quartile_1 = percentile(data, 25, interpolation="nearest")
>>> quartile_3 = percentile(data, 75, interpolation="nearest")
>>> print(quartile_1) 
3
>>> print(quartile_3) 
8
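
The special cases mentioned in the quoted documentation (q=0, q=50, q=100) can be checked quickly. This sketch uses only the default arguments of `np.percentile`, so it does not depend on how the interpolation keyword is spelled in a given NumPy version.

```python
import numpy as np

data = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])

# q=0 and q=100 land exactly on the first and last sorted values
print(np.percentile(data, 0) == data.min())
print(np.percentile(data, 100) == data.max())

# q=50 lands at index 4.5, halfway between 5 and 6, same as the median
print(np.percentile(data, 50) == np.median(data))
```

Only positions that fall between two indices, such as 2.25 for q=25, depend on the interpolation rule.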
answered 2019-11-28T10:57:02.450