2

我正在尝试建立一个nonlinear mixed effects model适用于 COVID-19 的数据,该数据符合来自不同国家/地区的每日病例数的钟形曲线(随机效应在国家一级)。

数据表太大,无法包含在帖子中,但这里是数据框的

> str(dat)
Classes ‘tbl_df’, ‘tbl’ and 'data.frame':   2642 obs. of  7 variables:
 $ Country.Region: Factor w/ 181 levels "Afghanistan",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ date          : Date, format: "2020-03-24" "2020-03-25" "2020-03-26" "2020-03-27" ...
 $ day           : num  0 1 2 3 4 5 6 7 8 9 ...
 $ total_cases   : int  74 84 94 110 110 120 170 174 237 273 ...
 $ new_cases     : int  34 10 10 16 0 10 50 4 63 36 ...
 $ total_deaths  : int  1 2 4 4 4 4 4 4 4 6 ...
 $ new_deaths    : int  0 1 2 0 0 0 0 0 0 2 ...

尝试拟合模型:

library(nlme)
dat <- readRDS("dat.rds")

# Bell curve function defined by three parameters
bellcurve.model <- function(d, mu, var, x) {
  f <- d*exp(-((x - mu)^2) / (2*var))
  return(f)
}

# NLME Model
baseModel <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day), 
                  data = dat, 
                  fixed = list(d ~ 1, mu ~ 1, var ~ 1),
                  random = d + mu + var ~ 1|Country.Region,
                  start = list(fixed = c(1000, 20, 20)),
                  na.action = na.omit)

但是,这是由此产生的错误:

nlme.formula 中的错误(new_cases ~ bellcurve.model(d, mu, var, x = day), : 0 级反解中的奇点,块 1

我读过其他 StackOverflow 帖子,暗示某些协变量可能会在模型中混淆,但我在这里用来预测的唯一协变量new_casesday. 任何有关这意味着什么的建议或调试提示将不胜感激,尤其是有关如何解决此问题的任何建议。

4

1 回答 1

2

tl;dr:这种类型的建模对起始值很敏感。我详细介绍了如何成功解决该问题。有助于此修复的三个主要事项是:

  • 增加最大迭代和函数调用
  • 获取知情的起始值
  • 由于最初添加的国家/地区的可用信息较少,因此可能会限制/子集数据

简要概述:

我开始修补,只是盲目地改变起始值。更改c(1000, 20, 20)c(20000, 10, 10)我让模型像baseModel.naive使用Log-likelihood: -23451.37. 使用增加的最大值进行迭代和函数调用并获取知情的起始值使模型能够通过baseModel.bellcurve报告Log-likelihood: -20487.86.


天真地改变起始值将摆脱错误

我调整了起始值。此外,我还展示了如何增加函数调用和迭代的最大数量以及使用详细选项。?nlme使用和可以找到更多信息?nlmeControl。根据我的经验,这些类型的模型可能对起始值、最大迭代次数和函数调用很敏感。

baseModel.naive <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day), 
                  data = dat, 
                  fixed = list(d ~ 1, mu ~ 1, var ~ 1),
                  random = d + mu + var ~ 1|Country.Region,
                  start = list(fixed = c(20000, 10, 10)),
                  na.action = na.omit,
                  control=list(maxIter=1e2, msMaxIter = 1e6, msVerbose = TRUE))
baseModel.naive

输出:

> baseModel.naive <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day), 
+                   data = dat, 
+                   fixed = list(d ~ 1, mu ~ 1, var ~ 1),
+                   random = d + mu + var ~ 1|Country.Region,
+                   start = list(fixed = c(20000, 10, 10)),
+                   na.action = na.omit,
+                   control=list(maxIter=1e2, msMaxIter = 1e6, msVerbose = TRUE))
  0:     30455.774:  2.23045  10.6880  8.80935 -11177.6  3277.46 -1877.96
  1:     30450.763:  2.80826  11.2726  9.37889 -11177.6  3277.46 -1877.96
  2:     30449.789:  2.94771  11.5283  9.80691 -11177.6  3277.46 -1877.96
  3:     30449.150:  3.30454  11.8277  10.0329 -11177.6  3277.46 -1877.96
  4:     30448.727:  3.64209  12.2237  10.4571 -11177.6  3277.46 -1877.96
  5:     30448.540:  3.97875  12.5637  10.8217 -11177.6  3277.46 -1877.96
  6:     30448.445:  4.31754  12.9055  11.1826 -11177.6  3277.46 -1877.96
  7:     30448.397:  4.65727  13.2480  11.5420 -11177.6  3277.46 -1877.96
  8:     30448.372:  4.99746  13.5908  11.9008 -11177.6  3277.46 -1877.96
  9:     30448.360:  5.33789  13.9337  12.2591 -11177.6  3277.46 -1877.96
 10:     30448.354:  5.67844  14.2767  12.6173 -11177.6  3277.46 -1877.96
 11:     30448.351:  6.01905  14.6197  12.9753 -11177.6  3277.46 -1877.96
 12:     30448.349:  6.35969  14.9627  13.3333 -11177.6  3277.46 -1877.96
 13:     30448.349:  6.70035  15.3058  13.6913 -11177.6  3277.46 -1877.96
 14:     30448.348:  7.04102  15.6489  14.0493 -11177.6  3277.46 -1877.96
 15:     30448.348:  7.38170  15.9919  14.4073 -11177.6  3277.46 -1877.96
 16:     30448.348:  7.72237  16.3350  14.7652 -11177.6  3277.46 -1877.96
 17:     30448.348:  8.06305  16.6781  15.1232 -11177.6  3277.46 -1877.96
 18:     30448.348:  8.40373  17.0211  15.4811 -11177.6  3277.46 -1877.96
 19:     30448.348:  8.74441  17.3642  15.8391 -11177.6  3277.46 -1877.96
 20:     30448.348:  9.08508  17.7073  16.1971 -11177.6  3277.46 -1877.96
 21:     30448.348:  9.42576  18.0503  16.5550 -11177.6  3277.46 -1877.96
  0:     30108.384:  9.42573  18.0503  16.5550 -11183.6  3279.09 -1879.43
  1:     30108.384:  9.42568  18.0503  16.5550 -11183.6  3279.09 -1879.43
  2:     30108.384:  9.42544  18.0502  16.5548 -11183.6  3279.09 -1879.43
  0:     30108.056:  9.42539  18.0502  16.5548 -11168.0  3239.13 -1879.51
  1:     30108.056:  9.42526  18.0502  16.5549 -11168.0  3239.13 -1879.51
  0:     30108.055:  9.42523  18.0502  16.5549 -11150.0  3223.70 -1879.28
  1:     30108.055:  9.42523  18.0502  16.5549 -11150.0  3223.70 -1879.28
> baseModel.naive
Nonlinear mixed-effects model fit by maximum likelihood
  Model: new_cases ~ bellcurve.model(d, mu, var, x = day) 
  Data: dat 
  Log-likelihood: -23451.37
  Fixed: list(d ~ 1, mu ~ 1, var ~ 1) 
         d         mu        var 
4290.47415   35.32178   38.44048 

Random effects:
 Formula: list(d ~ 1, mu ~ 1, var ~ 1)
 Level: Country.Region
 Structure: General positive-definite, Log-Cholesky parametrization
         StdDev       Corr   
d        1.402353e-01 d    mu
mu       2.518250e-05 0      
var      1.123208e-04 0    0 
Residual 1.738523e+03        

Number of Observations: 2641
Number of Groups: 126  

但也许这还不够。看到Corr带 0 的?似乎有些不对劲。所以我从 Roland ( https://stackoverflow.com/a/27549369/2727349 ) 那里获取了一个页面,并决定获得一个类似于您的bellcurve.model.
我还尝试将数据子集为Country.Regions ,以防出现问题。我在这里分节详细介绍了我的结果/代码,最后(在附录中)将它们放在一起,以便快速复制和粘贴。

初步和数据探索

为了探索事物,我尝试将数据限制在具有最少天数的国家/地区。我选择了 45 天(5 个国家),然后慢慢增加到 1 天(完整数据集)。我曾经data.table计算和显示相关的东西。

library(nlme)
library(nlraa)
library(data.table)
dat <- readRDS("dat.rds")
str(dat)
setDT(dat)


# Bell curve function defined by three parameters
bellcurve.model <- function(d, mu, var, x) {
  f <- d*exp(-((x - mu)^2) / (2*var))
  return(f)
}


dim(dat)
head(dat)

## how many countries?
dat[,uniqueN(Country.Region)]

## number of days/rows per country.region:
print(dat[,.N,by=Country.Region][order(N)], nrows=200)
dat[,N:=.N, by=Country.Region]


minimum_days <- 45
## model restricted to following countries/regions:
data.frame(Country.Region=(as.character(unique(dat[N>=minimum_days]$Country.Region))))

从 nls() 和数据中获取自动起始值

这就是Roland 在相关帖子中的回答发挥作用的地方。我们需要一种自动化的方式来获取知情的起始值,而这样做的一种方法是使用nls()和自启动功能。您可以使用您的自启动功能,bellcurve.model但我发现SSbell这很相似并决定使用它。我运行nls()并获得具有 (对应于)和(对应于in )SSbell的公式的起始值。对于中的起始值,我首先计算 each的方差,然后选择较小的一个,确定 20%-tile,因为这适用于和SSbellymaxdbellcurve.modelxcmubellcurve.modelvarbellcurve.modelCountry.Regionminimum_days<-45minimum_days<-1(完整数据)。

## use this approach https://stackoverflow.com/a/27549369/2727349
## but instead of SSlogis
## use nlraa::SSbell
## we can get starting values for d and mu.
plot(new_cases~day, data=dat[N>=minimum_days])
nls_starting_values <- nls(new_cases ~ SSbell(day, ymax, a, b, xc), data=dat[N>=minimum_days])
summary(nls_starting_values)

## calculate a starting value for var:  the median of Country.Region specific variances
variances <- sort(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1)
range(variances)
quantile(variances, seq(0,1,0.05))

##var.start <- median(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
##var.start <- min(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
var.start <- quantile(variances, 0.20)
var.start

再一次——我不得不玩一下——但如果你采用 20% 的经验方差,代码nlme调用bellcurve.model将与(完整数据集)minimum_days <- 45一样运行。minimum_days <- 1计算出我们的起始值并可能限制我们的数据集,我们适合两个nlme模型:一个 usingSSbell和另一个bellcurve.model。两者都将运行minimum_days<-45并且只会bellcurve.model收敛minimum_days<-1(完整数据集)。幸运的!

两个 nlme 调用:一个 with SSbell,另一个 forbellcurve.model

# NLME Model: using ssbell and nls_starting values for ymax, a, b, xc
baseModel.ssbell <- nlme(new_cases ~ SSbell(day, ymax, a, b, xc), 
                         data = dat[N>=minimum_days], 
                         fixed = list(ymax ~ 1, a ~ 1, b~1, xc~1),
                         random = ymax + a + b + xc ~ 1|Country.Region,
                         start = list(fixed = c(    coef(nls_starting_values) )),
                         na.action = na.omit,
                         control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)
# NLME Model: using bellcurve.model and nls_starting values for ymax for d;
# NLME Model: using bellcurve.model and nls_starting values for xc   for mu;
# NLME Model: using bellcurve.model and                    var.start for var
baseModel.bellcurve <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day), 
                            data = dat[N>=minimum_days], 
                            fixed = list(d ~ 1, mu ~ 1, var ~ 1),
                            random = d + mu + var ~ 1|Country.Region,
                            start = list(fixed = c(coef(nls_starting_values)[c(1,4)], var.start )),
                            na.action = na.omit,
                            control=list(maxIter=1e6, 
msMaxIter = 1e6, 
msVerbose = TRUE)
)

比较输出

baseModel.ssbell
baseModel.bellcurve

显示类似拟合的输出 minimum_days<-45(查看可能性)。

## 5 countries DATA minimum_days <- 45
> baseModel.ssbell
Nonlinear mixed-effects model fit by maximum likelihood
  Model: new_cases ~ SSbell(day, ymax, a, b, xc) 
  Data: dat[N >= minimum_days] 
  Log-likelihood: -2264.706
  Fixed: list(ymax ~ 1, a ~ 1, b ~ 1, xc ~ 1) 
         ymax             a             b            xc 
 1.026599e+03 -1.164625e-02 -1.626079e-04  1.288924e+01 

Random effects:
 Formula: list(ymax ~ 1, a ~ 1, b ~ 1, xc ~ 1)
 Level: Country.Region
 Structure: General positive-definite, Log-Cholesky parametrization
         StdDev       Corr                
ymax     1.731582e+03 ymax   a      b     
a        4.467475e-06 -0.999              
b        1.016493e-04 -0.998  0.999       
xc       3.528238e+00  1.000 -0.999 -0.999
Residual 8.045770e+02                     

Number of Observations: 278
Number of Groups: 5 
## 5 countries DATA minimum_days <- 45
> baseModel.bellcurve
Nonlinear mixed-effects model fit by maximum likelihood
  Model: new_cases ~ bellcurve.model(d, mu, var, x = day) 
  Data: dat[N >= minimum_days] 
  Log-likelihood: -2267.406
  Fixed: list(d ~ 1, mu ~ 1, var ~ 1) 
        d        mu       var 
965.81525  12.73549  58.22049 

Random effects:
 Formula: list(d ~ 1, mu ~ 1, var ~ 1)
 Level: Country.Region
 Structure: General positive-definite, Log-Cholesky parametrization
         StdDev       Corr       
d        1.633432e+03 d     mu   
mu       2.815230e+00  1.00      
var      5.582379e-03 -0.01 -0.01
Residual 8.127932e+02            

Number of Observations: 278
Number of Groups: 5 

显示baseModel.bellcurvefor minimum_days<-1(完整数据集)的输出显示了baseModel.naive我出于摆脱错误和警告的唯一目的而盲目和任意摆弄起始值的改进。

## FULL DATA minimum_days <- 1
> baseModel.bellcurve
Nonlinear mixed-effects model fit by maximum likelihood
  Model: new_cases ~ bellcurve.model(d, mu, var, x = day) 
  Data: dat[N >= minimum_days] 
  Log-likelihood: -20487.86
  Fixed: list(d ~ 1, mu ~ 1, var ~ 1) 
         d         mu        var 
1044.52101   25.00288   81.79004 

Random effects:
 Formula: list(d ~ 1, mu ~ 1, var ~ 1)
 Level: Country.Region
 Structure: General positive-definite, Log-Cholesky parametrization
         StdDev       Corr       
d        4.285702e+03 d     mu   
mu       5.423452e+00 0.545      
var      3.008379e-03 0.028 0.050
Residual 4.985698e+02            

Number of Observations: 2641
Number of Groups: 126 

Log-likelihood: -20487.86for baseModel.bellcurvevs Log-likelihood: -23451.37for baseModel.naiveCorr 矩阵baseModel.bellcurve 看起来也更好。

附录:一次抓取代码。



library(nlme)
library(nlraa)
library(data.table)
dat <- readRDS("dat.rds")
str(dat)
setDT(dat)


# Bell curve function defined by three parameters
bellcurve.model <- function(d, mu, var, x) {
  f <- d*exp(-((x - mu)^2) / (2*var))
  return(f)
}


dim(dat)
head(dat)

## how many countries?
dat[,uniqueN(Country.Region)]

## number of days/rows per country.region:
print(dat[,.N,by=Country.Region][order(N)], nrows=200)
dat[,N:=.N, by=Country.Region]


minimum_days <- 45
## model restricted to following countries/regions:
data.frame(Country.Region=(as.character(unique(dat[N>=minimum_days]$Country.Region))))



## use this approach https://stackoverflow.com/a/27549369/2727349
## but instead of SSlogis
## use nlraa::SSbell
## we can get starting values for d and mu.
plot(new_cases~day, data=dat[N>=minimum_days])
nls_starting_values <- nls(new_cases ~ SSbell(day, ymax, a, b, xc), data=dat[N>=minimum_days])
summary(nls_starting_values)

## calculate a starting value for var:  the median of Country.Region specific variances
variances <- sort(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1)
range(variances)
quantile(variances, seq(0,1,0.05))

##var.start <- median(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
##var.start <- min(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
var.start <- quantile(variances, 0.20)
var.start




# NLME Model: using ssbell and nls_starting values for ymax, a, b, xc
baseModel.ssbell <- nlme(new_cases ~ SSbell(day, ymax, a, b, xc), 
                         data = dat[N>=minimum_days], 
                         fixed = list(ymax ~ 1, a ~ 1, b~1, xc~1),
                         random = ymax + a + b + xc ~ 1|Country.Region,
                         start = list(fixed = c(    coef(nls_starting_values) )),
                         na.action = na.omit,
                         control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)


# NLME Model: using bellcurve.model and nls_starting values for ymax for d;
# NLME Model: using bellcurve.model and nls_starting values for xc   for mu;
# NLME Model: using bellcurve.model and                    var.start for var
baseModel.bellcurve <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day), 
                            data = dat[N>=minimum_days], 
                            fixed = list(d ~ 1, mu ~ 1, var ~ 1),
                            random = d + mu + var ~ 1|Country.Region,
                            start = list(fixed = c(coef(nls_starting_values)[c(1,4)], var.start )),
                            na.action = na.omit,
                            control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)


baseModel.ssbell
baseModel.bellcurve

于 2020-04-10T23:08:13.787 回答