r - Fitting a 2-parameter Weibull distribution to tabulated data

Question

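I'm trying to fit a Weibull distribution to tabulated data. After processing my point cloud, I obtained the number of returns in each 1 m height slice. Example: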

a = matrix(c(7,12,10,10,20,3,15,40,33,57,58,60,79,132,174,201,191,184,115,70,22,2,0),1,23)
colnames(a) <- c(13.5,14.5,15.5,16.5,17.5,18.5,19.5,20.5,21.5,22.5,23.5,24.5,25.5,26.5,27.5,28.5,29.5,30.5,31.5,32.5,33.5,34.5,35.5)

In the example above, the height class centered on 13.5 m contains 7 points.
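To make the binned structure explicit, here is a small base-R sketch that turns the matrix into one row per 1 m slice, with each slice's lower and upper edge at the center ± 0.5 m. The `bins` data frame and its column names are illustrative only; they are not part of the question.

```r
# Counts per 1 m height slice, as in the question
a <- matrix(c(7,12,10,10,20,3,15,40,33,57,58,60,79,132,174,201,191,184,115,70,22,2,0), 1, 23)
colnames(a) <- seq(13.5, 35.5, by = 1)

centers <- as.numeric(colnames(a))
# Hypothetical helper table: one row per bin, edges at center +/- 0.5 m
bins <- data.frame(lower  = centers - 0.5,
                   upper  = centers + 0.5,
                   center = centers,
                   count  = as.numeric(a))
head(bins, 3)
sum(bins$count)  # total number of returns across all slices
```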

If I plot the matrix a, I can visualize the distribution of the data:

barplot(a)


Does anyone have suggestions on how to fit a 2-parameter Weibull distribution to this tabulated data?

Thanks in advance!

score 6 · Accepted Answer

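You can do maximum likelihood treating the data as interval-censored: each point is only known to lie somewhere within its 1 m bin.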
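In symbols, with bin centers $c_i$, counts $n_i$, shape $k$ and scale $\lambda$ (the same parameterization `pweibull` uses), the log-likelihood function below adds up $n_i$ copies of each bin's log-probability, so it maximizes:

```latex
\[ F(x; k, \lambda) = 1 - \exp\left[-(x/\lambda)^{k}\right], \quad x \ge 0 \]
\[ \ell(k, \lambda) = \sum_{i=1}^{23} n_i \log\left[F(u_i; k, \lambda) - F(l_i; k, \lambda)\right],
   \quad l_i = c_i - 0.5, \; u_i = c_i + 0.5 \]
```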

a = matrix(c(7,12,10,10,20,3,15,40,33,57,58,60,79,132,174,201,191,184,115,70,22,2,0),1,23)
colnames(a) <- c(13.5,14.5,15.5,16.5,17.5,18.5,19.5,20.5,21.5,22.5,23.5,24.5,25.5,26.5,27.5,
                 28.5,29.5,30.5,31.5,32.5,33.5,34.5,35.5)


centers <- as.numeric(colnames(a))
low <- centers - .5
up <- centers + .5

ll.weibullCensored <- function(par, dat){
    shape <- par[1]
    scale <- par[2]
    # Get the probability for each 'bin' and take the log
    log.ps <- log(pweibull(up, shape, scale) - pweibull(low, shape, scale))
    # Sum the logs of the bin probabilities as many times
    # as they should be as dictated by the data
    sum(rep(log.ps, dat))
}

# Use optim or any other function to find a set
# of parameters that maximizes the log likelihood
o.optim <- optim(c(9, 28), 
                 ll.weibullCensored, 
                 dat = as.numeric(a), 
                 # this tells it to find max instead of a min
                 control=list(fnscale=-1))
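A slightly more defensive variant of the same likelihood, offered as a sketch rather than a replacement: it passes the bin edges in explicitly instead of reading the global `low` and `up`, multiplies log-probabilities by the counts instead of replicating them with `rep`, and skips empty bins so a zero count never meets a `-Inf` log-probability (in R, `0 * -Inf` is `NaN`). It also returns `-Inf` for non-positive parameters so Nelder-Mead cannot wander into invalid values. The name `ll.weibullBinned` and the argument names are my own.

```r
a <- matrix(c(7,12,10,10,20,3,15,40,33,57,58,60,79,132,174,201,191,184,115,70,22,2,0), 1, 23)
centers <- seq(13.5, 35.5, by = 1)
counts <- as.numeric(a)

# Hypothetical name: binned (interval-censored) Weibull log-likelihood
ll.weibullBinned <- function(par, counts, bin.low, bin.up) {
  shape <- par[1]
  scale <- par[2]
  # Reject invalid parameters; optim's Nelder-Mead tolerates non-finite values
  if (shape <= 0 || scale <= 0) return(-Inf)
  keep <- counts > 0
  # Probability mass of each occupied bin under the current parameters
  p <- pweibull(bin.up[keep], shape, scale) - pweibull(bin.low[keep], shape, scale)
  sum(counts[keep] * log(p))
}

fit <- optim(c(9, 28), ll.weibullBinned,
             counts = counts, bin.low = centers - 0.5, bin.up = centers + 0.5,
             control = list(fnscale = -1))  # fnscale = -1: maximize
fit$par          # estimated shape and scale
fit$convergence  # 0 means optim reports successful convergence
```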

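This gives essentially the same estimates as AndresT's answer, but their approach assumes that all the data fall at the bin centers and then performs maximum likelihood on that imputed dataset. The difference isn't large, but with this approach you don't need any additional packages.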
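For comparison, the midpoint idea described here can also be written in base R. This is my own sketch of that idea, not AndresT's code (which uses `fitdistrplus::fitdist`): every point is treated as sitting exactly at its bin center, and the count-weighted Weibull log-density is maximized.

```r
a <- matrix(c(7,12,10,10,20,3,15,40,33,57,58,60,79,132,174,201,191,184,115,70,22,2,0), 1, 23)
centers <- seq(13.5, 35.5, by = 1)
counts <- as.numeric(a)

# Midpoint-imputed log-likelihood: the counts[i] points of bin i are placed at centers[i]
ll.midpoint <- function(par) {
  if (any(par <= 0)) return(-Inf)
  sum(counts * dweibull(centers, shape = par[1], scale = par[2], log = TRUE))
}

fit.mid <- optim(c(9, 28), ll.midpoint, control = list(fnscale = -1))
fit.mid$par  # shape and scale under the midpoint assumption
```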

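Edit: The fact that AndresT's solution and mine give very similar estimates makes sense if we look at what each method maximizes. In mine, we work with the probability of falling into each "bin". AndresT's solution substitutes the density of the distribution at the bin center for that probability; because each bin is 1 m wide, the density at the center times the bin width is a midpoint approximation of that probability. We can look at the ratio of the probability of falling into each bin to the density at the bin center (using the shape and scale obtained from my solution), which gives: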

# Probability of each bin
> ps
 [1] 0.0005495886 0.0009989085 0.0017438767 0.0029375471 0.0047912909
 [6] 0.0075863200 0.0116800323 0.0174991532 0.0255061344 0.0361186335
[11] 0.0495572085 0.0656015797 0.0832660955 0.1004801353 0.1139855466
[16] 0.1197890284 0.1144657811 0.0971503491 0.0711370586 0.0433654456
[21] 0.0210758647 0.0077516837 0.0020274896
# Density evaluated at the center of the bin
> ps.cent
 [1] 0.0005418957 0.0009868040 0.0017254545 0.0029103746 0.0047524364
 [6] 0.0075325510 0.0116083397 0.0174078328 0.0253967142 0.0359988789
[11] 0.0494450583 0.0655288551 0.0832789134 0.1006305707 0.1143085230
[16] 0.1202647955 0.1149865305 0.0975322358 0.0712125315 0.0431169222
[21] 0.0206762531 0.0074246320 0.0018651941
# Ratio of the probability and the density
> ps/ps.cent
 [1] 1.0141963 1.0122663 1.0106767 1.0093364 1.0081757 1.0071382 1.0061760
 [8] 1.0052459 1.0043084 1.0033266 1.0022682 1.0011098 0.9998461 0.9985051
[15] 0.9971745 0.9960440 0.9954712 0.9960845 0.9989402 1.0057639 1.0193271
[22] 1.0440495 1.0870127

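All of these ratios are close to 1, so the two methods are essentially maximizing the same likelihood. The largest deviations (up to about 9%) occur in the extreme upper bins, which contain very few points.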
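The answer doesn't show how `ps` and `ps.cent` were computed. One plausible reconstruction, as a self-contained sketch (it refits the binned likelihood in compact form; `ll` and `est` are my own names), is:

```r
a <- matrix(c(7,12,10,10,20,3,15,40,33,57,58,60,79,132,174,201,191,184,115,70,22,2,0), 1, 23)
centers <- seq(13.5, 35.5, by = 1)
low <- centers - 0.5
up  <- centers + 0.5

# Same binned likelihood as ll.weibullCensored, with a guard for invalid parameters
ll <- function(par) {
  if (any(par <= 0)) return(-Inf)
  log.ps <- log(pweibull(up, par[1], par[2]) - pweibull(low, par[1], par[2]))
  sum(rep(log.ps, as.numeric(a)))
}
est <- optim(c(9, 28), ll, control = list(fnscale = -1))$par

# Probability of falling in each 1 m bin
ps <- pweibull(up, est[1], est[2]) - pweibull(low, est[1], est[2])
# Density at each bin center; with a bin width of 1 this approximates ps
ps.cent <- dweibull(centers, est[1], est[2])
round(ps / ps.cent, 3)
```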

score 2 · Accepted Answer

I'm sure there's a better way to do the reshaping, but this might work:

library('fitdistrplus')
library('reshape2')

a = matrix(c(7,12,10,10,20,3,15,40,33,57,58,60,79,132,174,201,191,184,115,70,22,2,0),1,23)
colnames(a) <- c(13.5,14.5,15.5,16.5,17.5,18.5,19.5,20.5,21.5,22.5,23.5,24.5,25.5,26.5,27.5,
                 28.5,29.5,30.5,31.5,32.5,33.5,34.5,35.5)

barplot(a)

# Long format: column 2 holds the bin centers, column 3 the counts
a2 = melt(a)
# Expand to one value per point, placed at its bin center
a3 = rep(a2[,2], a2[,3])

fitdist(a3, "weibull")

descdist(a3, boot = 5000)
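If you'd rather not load reshape2, the same expansion to one value per point can be done in base R, because `rep` accepts a vector of repeat counts through its `times` argument. A minimal sketch:

```r
a <- matrix(c(7,12,10,10,20,3,15,40,33,57,58,60,79,132,174,201,191,184,115,70,22,2,0), 1, 23)
colnames(a) <- seq(13.5, 35.5, by = 1)

# Repeat each bin center as many times as there are points in that bin
a3 <- rep(as.numeric(colnames(a)), times = as.numeric(a))
length(a3)  # one entry per point
```

The resulting `a3` can then be passed to `fitdist(a3, "weibull")` as in the answer above.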
