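I'm trying to write my own function to understand how the Poisson distribution behaves within a maximum likelihood estimation framework (as it applies to GLMs).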

I'm familiar with R's handy glm function, but I wanted to try hand-rolling some code to understand what is going on:

n <- 10000 # sample size
b0 <- 1.0 # intercept
b1 <- 0.2 # coefficient
x <- runif(n=n, min=0, max=1.5) # generate covariate values
lp <- b0+b1*x # linear predictor
lambda <- exp(lp) # compute lamda
y <- rpois(n=n, lambda=lambda) # generate y-values
dta <- data.frame(y=y, x=x) # generate dataset
negloglike <- function(lambda) {n*lambda-sum(x)*log(lambda) + sum(log(factorial(y)))} # build negative log-likelihood
starting.vals <- c(0,0) # one starting value for each parameter
pars <- c(b0, b1)
maxLike <- optim(par=pars,fn=negloglike, data = dta) # optimize

The R output when I enter maxLike is as follows:

Error in fn(par, ...) : unused argument (data = list(y = c(2, 4....

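I assume I have specified optim incorrectly for my function, but I'm not familiar enough with the nitty-gritty of MLE or constrained optimization to understand what I'm missing.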
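
For reference, here is the negative log-likelihood that the simulation above implies, written out under the assumption that each y_i is Poisson with mean lambda_i = exp(b0 + b1*x_i). The symbols follow the code above; this is a sketch of the standard derivation, not something taken from the thread.

```latex
\begin{aligned}
y_i &\sim \mathrm{Poisson}(\lambda_i), \qquad \log \lambda_i = b_0 + b_1 x_i \\
-\ell(b_0, b_1) &= \sum_{i=1}^{n} \bigl[ \lambda_i - y_i \log \lambda_i + \log(y_i!) \bigr] \\
                &= \sum_{i=1}^{n} \bigl[ e^{b_0 + b_1 x_i} - y_i (b_0 + b_1 x_i) + \log(y_i!) \bigr]
\end{aligned}
```

With a single constant lambda this reduces to n*lambda - sum(y)*log(lambda) + sum(log(factorial(y))). That is the shape of negloglike above, except that the code sums x where this expression sums y, and it takes one lambda rather than the pair (b0, b1).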

2 Answers

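optim can only use your function in a certain way. It assumes that the first argument of the function receives the parameters as a vector. If you need to pass other information to the function (in your case, the data), the function needs an argument for it. Your negloglike function has no data argument, and that is what the error is complaining about. The way you have coded it, the function doesn't need one, so you could probably fix your problem just by removing the data = dta part of your call to optim, but I haven't tested that. Here is a small example that does a simple MLE for just a Poisson (not the glm):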

negloglike_pois <- function(par, data){
  x <- data$x
  lambda <- par[1]

  -sum(dpois(x, lambda, log = TRUE))
}

dat <- data.frame(x = rpois(30, 5))
optim(par = 4, fn = negloglike_pois, data = dat)
mean(dat$x)

> optim(par = 4, fn = negloglike_pois, data = dat)
$par
[1] 4.833594

$value
[1] 65.7394

$counts
function gradient 
      22       NA 

$convergence
[1] 0

$message
NULL

Warning message:
In optim(par = 4, fn = negloglike_pois, data = dat) :
  one-dimensional optimization by Nelder-Mead is unreliable:
use "Brent" or optimize() directly
> # The "true" MLE. We didn't hit it exactly but came really close
> mean(dat$x)
[1] 4.833333
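
The warning in the output above suggests "Brent" or optimize() for one-dimensional problems. A minimal sketch of both follows, assuming search bounds of 0.01 and 50 for lambda; the bounds and the seed are arbitrary choices made for illustration and do not come from the example above.

```r
set.seed(1)  # arbitrary seed, only so the sketch is reproducible
dat <- data.frame(x = rpois(30, 5))

# Same objective as above: par[1] is lambda, data arrives through `...`
negloglike_pois <- function(par, data) {
  -sum(dpois(data$x, par[1], log = TRUE))
}

# method = "Brent" requires finite lower and upper bounds (assumed here)
fit_brent <- optim(par = 4, fn = negloglike_pois, data = dat,
                   method = "Brent", lower = 0.01, upper = 50)

# optimize() searches the same interval directly; no starting value is needed
fit_opt <- optimize(f = negloglike_pois, interval = c(0.01, 50), data = dat)

# Both should sit very close to the closed-form MLE, the sample mean
c(brent = fit_brent$par, optimize = fit_opt$minimum, mean = mean(dat$x))
```

Either call avoids the Nelder-Mead warning, because both are designed for a single parameter.
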
answered 2014-10-04T19:47:23.923

Implementing the comments from Dason's answer is fairly straightforward, but just in case:

library("data.table")

d <- data.table(id = as.character(1:100), 
                x1 = runif(100, 0, 1),
                x2 = runif(100, 0, 1))

#' the assumption is that lambda can be written as
#' log(lambda) = b1*x1 + b2*x2 
#' (In addition, could add a random component)
d[, mean := exp( 1.57*x1 + 5.86*x2 )]
#' draw a y for each of the observations
#' (rpois is vectorized over lambda, so no sapply is needed)
d[, y := rpois(.N, mean)]

negloglike_pois <- function(par, data){
  # copy the data that was passed in, so the caller's table is not modified by reference
  data <- copy(data)
  # update estimate of the mean
  data[, mean_tmp := exp( par[1]*x1 + par[2]*x2 )]
  # calculate the contribution of each observation to the log-likelihood
  data[, log_p := dpois(y, mean_tmp, log = TRUE)]
  #' Now we can sum up the log-probabilities and flip the sign
  data[, -sum(log_p)]
}

optim(par = c(1,1), fn = negloglike_pois, data = d)
$par
[1] 1.554759 5.872219

$value
[1] 317.8094

$counts
function gradient 
      95       NA 

$convergence
[1] 0

$message
NULL
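
The same pattern carries over to the intercept-plus-slope model from the question. Below is a minimal base-R sketch that assumes the question's simulation settings (b0 = 1.0, b1 = 0.2) and cross-checks the result against glm. The function name negloglike_reg and the seed are made up for illustration.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
n  <- 10000
b0 <- 1.0
b1 <- 0.2
x  <- runif(n, min = 0, max = 1.5)
y  <- rpois(n, lambda = exp(b0 + b1 * x))
dta <- data.frame(y = y, x = x)

# Hypothetical helper: par[1] is the intercept, par[2] the slope.
# The data frame reaches the function through optim's `...` argument.
negloglike_reg <- function(par, data) {
  lambda <- exp(par[1] + par[2] * data$x)
  -sum(dpois(data$y, lambda, log = TRUE))
}

# One starting value per parameter; default Nelder-Mead as in the answers above
fit_optim <- optim(par = c(0, 0), fn = negloglike_reg, data = dta)

# Reference fit with the built-in Poisson GLM (log link is the default)
fit_glm <- glm(y ~ x, family = poisson, data = dta)

rbind(optim = fit_optim$par, glm = unname(coef(fit_glm)))
```

The two rows should agree closely. Small differences are expected because Nelder-Mead stops at a tolerance, while glm uses iteratively reweighted least squares.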

answered 2020-04-08T20:49:20.723