r - R中截断帕累托的形状估计

Question

library(VGAM)
library(fitdistrplus)

fitdist(u_NI$k_u, 'truncpareto',
         start = list(lower=1,
                      upper=42016,
                      shape=1)) -> fit.k_u

length(u_NI$k_u) = 637594

我收到了这个错误：

<simpleError in optim(par = vstart, fn = fnobj, fix.arg = fix.arg, obs = data,     gr = gradient, ddistnam = ddistname, hessian = TRUE, method = meth,     lower = lower, upper = upper, ...): function cannot be evaluated at initial parameters>
Error in fitdist(u_NI$k_u, "truncpareto", start = list(lower = 1, upper = 42016,  : 
  the function mle failed to estimate the parameters, 
                with the error code 100
In addition: Warning messages:
1: In fitdist(u_NI$k_u, "truncpareto", start = list(lower = 1, upper = 42016,  :
  The dtruncpareto function should return a zero-length vector when input has length zero
2: In fitdist(u_NI$k_u, "truncpareto", start = list(lower = 1, upper = 42016,  :
  The ptruncpareto function should return a zero-length vector when input has length zero

问题出在数据集过大还是起始参数上？

可重复的例子：

library(VGAM)
library(fitdistrplus)

rtruncpareto(100,1,100,1.5) -> a
fitdist(a, "truncpareto",
        start = list(lower=1,
                     upper=100,
                     shape=1.5))

这是行不通的，我不明白为什么。

这里似乎有问题：

argument 'lower' must be positive

score 2 · Accepted Answer

您的问题的一部分是一个更普遍的问题，即对于截断分布，边界参数的 MLE 通常等于观察到的数据集的最小值/最大值。因此，您应该始终能够通过将下限/上限的值设置为等于最小值/最大值来做到这一点（根据我的经验，它们必须略低于/高于观察到的界限）。（我还发现我必须设置lower = 0阻止算法尝试形状参数的负值。）

library(VGAM); library(fitdistrplus)
set.seed(101)
rtruncpareto(100,1,100,1.5) -> a
eps <- 1e-8
fitdist(a, "truncpareto",
        start = list(shape=1.5),
        fix.arg = list(lower = min(a) - eps, upper = max(a) + eps),
        lower = 0)
Parameters:
      estimate Std. Error
shape 1.349885  0.1554436
Fixed parameters:
          value
lower  1.006844
upper 25.906577

一个替代方案fitdist是bbmle：

library(bbmle)
m1 <- mle2(a ~ dtruncpareto(lower = min(a) - eps,
                            upper = max(a) + eps,
                            shape = exp(lshape)),
           start = list(lshape = 0),
           data = data.frame(a),
         method = "BFGS")
exp(coef(m1))   
  lshape 
1.349884

bbmle更灵活一点，允许您在对数尺度上拟合形状参数，这通常更稳健（并使标准偏差估计更有用）。在这里使用method = "BFGS"是因为默认的 Nelder-Mead 算法对于一维优化效果不佳；也可以使用method = "Brent"（这将更有效和更健壮），但随后需要为lshape参数提供明确的下限和上限。

score 2 · Accepted Answer

我不太确定，但这可能是你的追求。

library(fitdistrplus)
library(VGAM)
library(ggplot2)

u_NI <- data.frame("k_u" = rtruncpareto(10000,
                                        lower = 1,
                                        upper = 100,
                                        shape = 1.5))

fit <- vglm(k_u ~ 1,
            truncpareto(lower=1, upper=max(u_NI$k_u) + 1),
            data = u_NI,
            trace = TRUE)

x <- seq(1, max(u_NI$k_u), length.out = 10000)
y <- dtruncpareto(x, shape = fit@coefficients[["(Intercept)"]],
                  lower=fit@extra$lower,
                  upper=fit@extra$upper)
pareto <- cbind.data.frame(x, y)

ggplot()+
  geom_density(data = u_NI, aes(k_u))+
  geom_line(data = pareto, aes(x = x, y = y, color = "truncpareto"),linetype = 5, size = 1.3)+
  theme(legend.position = c(.95, .95),
        legend.justification = c(1, 1),
        legend.title = element_blank())

＃＃＃＃＃＃＃＃＃＃＃＃＃＃＃＃＃＃＃＃＃＃＃＃＃＃编辑：

# fix lower and upper boundary, estimate may still fail, depending on     the start value of shape
fitdist(u_NI$k_u, 'truncpareto',
        method = "mle",
        start = list(shape=1),
        fix.arg=list(lower=1, upper=max(u_NI$k_u) + 0.01),
        control=list(trace=1, REPORT=1)) -> fit.k_u

# use different method to estimate parameter, mge seems to work
fitdist(u_NI$k_u, 'truncpareto',
        method = "mge",
        start = list(shape=1,
                     lower=1,
                     upper=max(u_NI$k_u) + 1),
        control=list(trace=1, REPORT=1)) -> fit.k_u

剩下的错误是指这个：

> pnorm(numeric(), 1,10)
numeric(0)
> ptruncpareto(numeric(), 1,10,5)
[1] NA

####### Edit2：我想我发现了原来的错误，它有点令人困惑。但是，mle 还有一个较低的参数，它与分布的较低参数分开。它应该将参数估计限制为 >= 0。因此，这应该可以工作，但是需要很长时间，即使只有 10,000 个值：

fitdist(u_NI$k_u, 'truncpareto',
        method = "mle",
        start = list(shape=1,
                     lower=1,
                     upper=max(u_NI$k_u) +1 ),
        control=list(trace=1, REPORT=1),
        lower = 0) -> fit.k_u

r - R中截断帕累托的形状估计

2 回答 2

Related

Reference