r - 基于不同 R 包中的 GPD 计算回报水平

Question

我正在对气象数据进行极值分析，以精确计算以 mm/d 为单位的降水数据。我正在使用阈值过量方法来估计具有最大似然法的广义帕累托分布的参数。

目的是计算每日降水的几个回归水平（即 2、5、10、20、50、100 年事件）。

虽然 R 代码工作正常，但我想知道为什么在根据具有不同包的拟合 GPD 的分位数计算返回水平时，我会得到明显不同的结果。尽管 GPD 的估计参数在每个包中几乎相同，但分位数差异很大。

我使用的包是：ismev、extRemes、evir 和 POT。

我猜对 GPD 参数的不同估计是由于不同的计算例程造成的，但我不明白为什么分位数的计算因不同的包而有很大差异。

虽然 lmom、evir 和 POT 返回相同的分位数值，但从 extRemes 包派生的返回周期与其他结果不同。

# packages
library(ismev)
library(extRemes)
library(evir)
library(POT)
library(lmom)

th <- 50

# sample data:
potvalues <- c(
  58.5,44.2,49.6,59.3,48.3,60.9,94.5,47.1,45.3,57.6,48.2,46.2,44.2,50.6,42.1,52.7,80.9,
  58.5,51.3,48.4,51.7,71.9,60.1,64.4,43.5,55.5,49.3,58.2,47.5,43.7,45.2,52.8,42.2,46.4,
  96.1,47.5,50.1,42.4,60.9,72.6,51.6,59.4,80.5,63.7,59.9,45.0,66.7,47.6,53.3,43.1,51.0,
  46.2,53.6,59.8,51.7,46.7,42.6,44.5,45.0,50.0,44.0,89.9,44.2,47.8,53.3,43.0,55.7,44.6,
  44.6,54.9,45.1,43.9,78.7,45.5,64.0,42.7,47.4,57.0,105.4,64.3,43.2,50.4,80.2,49.9,71.6,
  47.4,44.1,47.6,55.2,44.4,78.6,50.8,42.4,47.1,43.5,51.4)

#------------------------------------------------------------------------------------------#

# MLE Fitting of GPD - package extRemes

# fit gpd
pot.ext <- fevd(potvalues, method = "MLE", type="GP", threshold=th)

# return levels:
rl.extremes <-  return.level(pot.ext, conf = 0.05,
                             return.period= c(2,5,10,20,50,100))
rl.extremes <- as.numeric(rl.extremes)

#------------------------------------------------------------------------------------------#

# MLE Fitting of GPD - package ismev

pot.gpd <- gpd.fit(potvalues, threshold=th)

s1 <- quagpa(f=.99, para=c(pot.gpd$threshold, pot.gpd$mle[1],-pot.gpd$mle[2])) # 100
s2 <- quagpa(f=.98, para=c(pot.gpd$threshold, pot.gpd$mle[1],-pot.gpd$mle[2])) #  50
s3 <- quagpa(f=.95, para=c(pot.gpd$threshold, pot.gpd$mle[1],-pot.gpd$mle[2])) #  20
s4 <- quagpa(f=.90, para=c(pot.gpd$threshold, pot.gpd$mle[1],-pot.gpd$mle[2])) #  10
s5 <- quagpa(f=.80, para=c(pot.gpd$threshold, pot.gpd$mle[1],-pot.gpd$mle[2])) #   5
s6 <- quagpa(f=.50, para=c(pot.gpd$threshold, pot.gpd$mle[1],-pot.gpd$mle[2])) #   2

rl.ismev <- c(s6, s5, s4, s3, s2, s1)

#------------------------------------------------------------------------------------------#

# MLE Fitting of GPD - package evir

# fit gpd
gpd.evir <- gpd(potvalues, threshold=th)

# plot
evirplot <- plot(gpd.evir)
1 # Excess Distribution
0 # exit

x100 <- gpd.q(pp=.99, x=evirplot) # 100
x050 <- gpd.q(pp=.98, x=evirplot) #  50
x020 <- gpd.q(pp=.95, x=evirplot) #  20
x010 <- gpd.q(pp=.90, x=evirplot) #  10
x005 <- gpd.q(pp=.80, x=evirplot) #   5
x002 <- gpd.q(pp=.50, x=evirplot) #   2

rl.evir <- t(rbind(x002,x005,x010,x020,x050,x100))
rl.evir <- as.numeric(rl.evir[2,])

#------------------------------------------------------------------------------------------#

# MLE Fitting of GPD - package POT

gpd.pot <- fitgpd(potvalues, threshold=th)
quant = c(0.50, 0.80, 0.90, 0.95, 0.98, 0.99)
rtp <- c(2,5,10,20,50,100)

retvec <- vector()
for (i in quant){
  x <- POT::qgpd(i, loc = th, scale = as.numeric(gpd.pot$param[1]),
            shape = as.numeric(gpd.pot$param[2]))
  retvec <- c(retvec,x)
}

rl.pot <- retvec

#------------------------------------------------------------------------------------------#
# comparison of results - return periods
result <- cbind(rl.extremes,rl.ismev, rl.evir, rl.pot)
round(result, 2)

#------------------------------------------------------------------------------------------#
# comparison of estimated parameters
param.extremes <- pot.ext$results$par # extremes
param.ismev <- pot.gpd$mle # ismev
param.evir <- c(gpd.evir$par.ests[2],gpd.evir$par.ests[1])  # evir
param.pot <- gpd.pot$param # POT

parameters <- cbind(param.extremes, param.ismev , param.evir, param.pot)
round(parameters, 4)

#------------------------------------------------------------------------------------------#

score 3 · Accepted Answer

例如，在 Coles 的书（极值统计建模简介，第 4.3.3 章）中描述了该问题的解决方案。虽然 GEV 的回报水平可以直接从其分位数推导出来，但在计算一个 GEV 的回报水平时，必须考虑所谓的超额率（即每年的事件数量或事件超过阈值的可能性）。 GP 范围内的峰值超过阈值方法。

N 年回报水平定义为

n 年回报水平

因此，在不考虑超出率的情况下简单地计算 GP 分布的分位数时，无法获得有意义的回报水平结果。extRemes 包考虑了超出率，而 POT 和 evir 包中每年事件数的默认值设置为 1（如果未指定）。

score 1 · Accepted Answer

差异也可能来自于将分布函数拟合到数据集的不同方法。我在 CRAN 上有一个包，它比较了几个 R 包和方法的 GPD 拟合（或者更确切地说，它们的分位数估计）：

https://cran.r-project.org/web/packages/extremeStat/vignettes/extremeStat.html

您还可以使用该软件包将 GPD 与其他发行版进行比较。

r - 基于不同 R 包中的 GPD 计算回报水平

2 回答 2

Related

Reference