tl;dr:这种类型的建模对起始值很敏感。我详细介绍了如何成功解决该问题。有助于此修复的三个主要事项是:
- 增加最大迭代和函数调用
- 获取知情的起始值
- 由于最初添加的国家/地区的可用信息较少,因此可能会限制/子集数据
简要概述:
我开始修补,只是盲目地改变起始值。更改c(1000, 20, 20)
为c(20000, 10, 10)
我让模型像baseModel.naive
使用Log-likelihood: -23451.37
. 使用增加的最大值进行迭代和函数调用并获取知情的起始值使模型能够通过baseModel.bellcurve
报告Log-likelihood: -20487.86
.
天真地改变起始值将摆脱错误
我调整了起始值。此外,我还展示了如何增加函数调用和迭代的最大数量以及使用详细选项。?nlme
使用和可以找到更多信息?nlmeControl
。根据我的经验,这些类型的模型可能对起始值、最大迭代次数和函数调用很敏感。
baseModel.naive <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
data = dat,
fixed = list(d ~ 1, mu ~ 1, var ~ 1),
random = d + mu + var ~ 1|Country.Region,
start = list(fixed = c(20000, 10, 10)),
na.action = na.omit,
control=list(maxIter=1e2, msMaxIter = 1e6, msVerbose = TRUE))
baseModel.naive
输出:
> baseModel.naive <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
+ data = dat,
+ fixed = list(d ~ 1, mu ~ 1, var ~ 1),
+ random = d + mu + var ~ 1|Country.Region,
+ start = list(fixed = c(20000, 10, 10)),
+ na.action = na.omit,
+ control=list(maxIter=1e2, msMaxIter = 1e6, msVerbose = TRUE))
0: 30455.774: 2.23045 10.6880 8.80935 -11177.6 3277.46 -1877.96
1: 30450.763: 2.80826 11.2726 9.37889 -11177.6 3277.46 -1877.96
2: 30449.789: 2.94771 11.5283 9.80691 -11177.6 3277.46 -1877.96
3: 30449.150: 3.30454 11.8277 10.0329 -11177.6 3277.46 -1877.96
4: 30448.727: 3.64209 12.2237 10.4571 -11177.6 3277.46 -1877.96
5: 30448.540: 3.97875 12.5637 10.8217 -11177.6 3277.46 -1877.96
6: 30448.445: 4.31754 12.9055 11.1826 -11177.6 3277.46 -1877.96
7: 30448.397: 4.65727 13.2480 11.5420 -11177.6 3277.46 -1877.96
8: 30448.372: 4.99746 13.5908 11.9008 -11177.6 3277.46 -1877.96
9: 30448.360: 5.33789 13.9337 12.2591 -11177.6 3277.46 -1877.96
10: 30448.354: 5.67844 14.2767 12.6173 -11177.6 3277.46 -1877.96
11: 30448.351: 6.01905 14.6197 12.9753 -11177.6 3277.46 -1877.96
12: 30448.349: 6.35969 14.9627 13.3333 -11177.6 3277.46 -1877.96
13: 30448.349: 6.70035 15.3058 13.6913 -11177.6 3277.46 -1877.96
14: 30448.348: 7.04102 15.6489 14.0493 -11177.6 3277.46 -1877.96
15: 30448.348: 7.38170 15.9919 14.4073 -11177.6 3277.46 -1877.96
16: 30448.348: 7.72237 16.3350 14.7652 -11177.6 3277.46 -1877.96
17: 30448.348: 8.06305 16.6781 15.1232 -11177.6 3277.46 -1877.96
18: 30448.348: 8.40373 17.0211 15.4811 -11177.6 3277.46 -1877.96
19: 30448.348: 8.74441 17.3642 15.8391 -11177.6 3277.46 -1877.96
20: 30448.348: 9.08508 17.7073 16.1971 -11177.6 3277.46 -1877.96
21: 30448.348: 9.42576 18.0503 16.5550 -11177.6 3277.46 -1877.96
0: 30108.384: 9.42573 18.0503 16.5550 -11183.6 3279.09 -1879.43
1: 30108.384: 9.42568 18.0503 16.5550 -11183.6 3279.09 -1879.43
2: 30108.384: 9.42544 18.0502 16.5548 -11183.6 3279.09 -1879.43
0: 30108.056: 9.42539 18.0502 16.5548 -11168.0 3239.13 -1879.51
1: 30108.056: 9.42526 18.0502 16.5549 -11168.0 3239.13 -1879.51
0: 30108.055: 9.42523 18.0502 16.5549 -11150.0 3223.70 -1879.28
1: 30108.055: 9.42523 18.0502 16.5549 -11150.0 3223.70 -1879.28
> baseModel.naive
Nonlinear mixed-effects model fit by maximum likelihood
Model: new_cases ~ bellcurve.model(d, mu, var, x = day)
Data: dat
Log-likelihood: -23451.37
Fixed: list(d ~ 1, mu ~ 1, var ~ 1)
d mu var
4290.47415 35.32178 38.44048
Random effects:
Formula: list(d ~ 1, mu ~ 1, var ~ 1)
Level: Country.Region
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
d 1.402353e-01 d mu
mu 2.518250e-05 0
var 1.123208e-04 0 0
Residual 1.738523e+03
Number of Observations: 2641
Number of Groups: 126
但也许这还不够。看到Corr
带 0 的?似乎有些不对劲。所以我从 Roland ( https://stackoverflow.com/a/27549369/2727349 ) 那里获取了一个页面,并决定获得一个类似于您的bellcurve.model
.
我还尝试将数据子集为Country.Region
s ,以防出现问题。我在这里分节详细介绍了我的结果/代码,最后(在附录中)将它们放在一起,以便快速复制和粘贴。
初步和数据探索
为了探索事物,我尝试将数据限制在具有最少天数的国家/地区。我选择了 45 天(5 个国家),然后慢慢增加到 1 天(完整数据集)。我曾经data.table
计算和显示相关的东西。
library(nlme)
library(nlraa)
library(data.table)
dat <- readRDS("dat.rds")
str(dat)
setDT(dat)
# Bell curve function defined by three parameters
bellcurve.model <- function(d, mu, var, x) {
f <- d*exp(-((x - mu)^2) / (2*var))
return(f)
}
dim(dat)
head(dat)
## how many countries?
dat[,uniqueN(Country.Region)]
## number of days/rows per country.region:
print(dat[,.N,by=Country.Region][order(N)], nrows=200)
dat[,N:=.N, by=Country.Region]
minimum_days <- 45
## model restricted to following countries/regions:
data.frame(Country.Region=(as.character(unique(dat[N>=minimum_days]$Country.Region))))
从 nls() 和数据中获取自动起始值
这就是Roland 在相关帖子中的回答发挥作用的地方。我们需要一种自动化的方式来获取知情的起始值,而这样做的一种方法是使用nls()
和自启动功能。您可以使用您的自启动功能,bellcurve.model
但我发现SSbell
这很相似并决定使用它。我运行nls()
并获得具有 (对应于)和(对应于in )SSbell
的公式的起始值。对于中的起始值,我首先计算 each的方差,然后选择较小的一个,确定 20%-tile,因为这适用于和SSbell
ymax
d
bellcurve.model
xc
mu
bellcurve.model
var
bellcurve.model
Country.Region
minimum_days<-45
minimum_days<-1
(完整数据)。
## use this approach https://stackoverflow.com/a/27549369/2727349
## but instead of SSlogis
## use nlraa::SSbell
## we can get starting values for d and mu.
plot(new_cases~day, data=dat[N>=minimum_days])
nls_starting_values <- nls(new_cases ~ SSbell(day, ymax, a, b, xc), data=dat[N>=minimum_days])
summary(nls_starting_values)
## calculate a starting value for var: the median of Country.Region specific variances
variances <- sort(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1)
range(variances)
quantile(variances, seq(0,1,0.05))
##var.start <- median(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
##var.start <- min(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
var.start <- quantile(variances, 0.20)
var.start
再一次——我不得不玩一下——但如果你采用 20% 的经验方差,代码nlme
调用bellcurve.model
将与(完整数据集)minimum_days <- 45
一样运行。minimum_days <- 1
计算出我们的起始值并可能限制我们的数据集,我们适合两个nlme
模型:一个 usingSSbell
和另一个bellcurve.model
。两者都将运行minimum_days<-45
并且只会bellcurve.model
收敛minimum_days<-1
(完整数据集)。幸运的!
两个 nlme 调用:一个 with SSbell
,另一个 forbellcurve.model
# NLME Model: using ssbell and nls_starting values for ymax, a, b, xc
baseModel.ssbell <- nlme(new_cases ~ SSbell(day, ymax, a, b, xc),
data = dat[N>=minimum_days],
fixed = list(ymax ~ 1, a ~ 1, b~1, xc~1),
random = ymax + a + b + xc ~ 1|Country.Region,
start = list(fixed = c( coef(nls_starting_values) )),
na.action = na.omit,
control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)
# NLME Model: using bellcurve.model and nls_starting values for ymax for d;
# NLME Model: using bellcurve.model and nls_starting values for xc for mu;
# NLME Model: using bellcurve.model and var.start for var
baseModel.bellcurve <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
data = dat[N>=minimum_days],
fixed = list(d ~ 1, mu ~ 1, var ~ 1),
random = d + mu + var ~ 1|Country.Region,
start = list(fixed = c(coef(nls_starting_values)[c(1,4)], var.start )),
na.action = na.omit,
control=list(maxIter=1e6,
msMaxIter = 1e6,
msVerbose = TRUE)
)
比较输出
baseModel.ssbell
baseModel.bellcurve
显示类似拟合的输出 minimum_days<-45
(查看可能性)。
## 5 countries DATA minimum_days <- 45
> baseModel.ssbell
Nonlinear mixed-effects model fit by maximum likelihood
Model: new_cases ~ SSbell(day, ymax, a, b, xc)
Data: dat[N >= minimum_days]
Log-likelihood: -2264.706
Fixed: list(ymax ~ 1, a ~ 1, b ~ 1, xc ~ 1)
ymax a b xc
1.026599e+03 -1.164625e-02 -1.626079e-04 1.288924e+01
Random effects:
Formula: list(ymax ~ 1, a ~ 1, b ~ 1, xc ~ 1)
Level: Country.Region
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
ymax 1.731582e+03 ymax a b
a 4.467475e-06 -0.999
b 1.016493e-04 -0.998 0.999
xc 3.528238e+00 1.000 -0.999 -0.999
Residual 8.045770e+02
Number of Observations: 278
Number of Groups: 5
## 5 countries DATA minimum_days <- 45
> baseModel.bellcurve
Nonlinear mixed-effects model fit by maximum likelihood
Model: new_cases ~ bellcurve.model(d, mu, var, x = day)
Data: dat[N >= minimum_days]
Log-likelihood: -2267.406
Fixed: list(d ~ 1, mu ~ 1, var ~ 1)
d mu var
965.81525 12.73549 58.22049
Random effects:
Formula: list(d ~ 1, mu ~ 1, var ~ 1)
Level: Country.Region
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
d 1.633432e+03 d mu
mu 2.815230e+00 1.00
var 5.582379e-03 -0.01 -0.01
Residual 8.127932e+02
Number of Observations: 278
Number of Groups: 5
显示baseModel.bellcurve
for minimum_days<-1
(完整数据集)的输出显示了baseModel.naive
我出于摆脱错误和警告的唯一目的而盲目和任意摆弄起始值的改进。
## FULL DATA minimum_days <- 1
> baseModel.bellcurve
Nonlinear mixed-effects model fit by maximum likelihood
Model: new_cases ~ bellcurve.model(d, mu, var, x = day)
Data: dat[N >= minimum_days]
Log-likelihood: -20487.86
Fixed: list(d ~ 1, mu ~ 1, var ~ 1)
d mu var
1044.52101 25.00288 81.79004
Random effects:
Formula: list(d ~ 1, mu ~ 1, var ~ 1)
Level: Country.Region
Structure: General positive-definite, Log-Cholesky parametrization
StdDev Corr
d 4.285702e+03 d mu
mu 5.423452e+00 0.545
var 3.008379e-03 0.028 0.050
Residual 4.985698e+02
Number of Observations: 2641
Number of Groups: 126
Log-likelihood: -20487.86
for baseModel.bellcurve
vs Log-likelihood: -23451.37
for baseModel.naive
Corr 矩阵baseModel.bellcurve
看起来也更好。
附录:一次抓取代码。
library(nlme)
library(nlraa)
library(data.table)
dat <- readRDS("dat.rds")
str(dat)
setDT(dat)
# Bell curve function defined by three parameters
bellcurve.model <- function(d, mu, var, x) {
f <- d*exp(-((x - mu)^2) / (2*var))
return(f)
}
dim(dat)
head(dat)
## how many countries?
dat[,uniqueN(Country.Region)]
## number of days/rows per country.region:
print(dat[,.N,by=Country.Region][order(N)], nrows=200)
dat[,N:=.N, by=Country.Region]
minimum_days <- 45
## model restricted to following countries/regions:
data.frame(Country.Region=(as.character(unique(dat[N>=minimum_days]$Country.Region))))
## use this approach https://stackoverflow.com/a/27549369/2727349
## but instead of SSlogis
## use nlraa::SSbell
## we can get starting values for d and mu.
plot(new_cases~day, data=dat[N>=minimum_days])
nls_starting_values <- nls(new_cases ~ SSbell(day, ymax, a, b, xc), data=dat[N>=minimum_days])
summary(nls_starting_values)
## calculate a starting value for var: the median of Country.Region specific variances
variances <- sort(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1)
range(variances)
quantile(variances, seq(0,1,0.05))
##var.start <- median(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
##var.start <- min(dat[N>=minimum_days, var(new_cases, na.rm=T), by=Country.Region]$V1, na.rm=T)
var.start <- quantile(variances, 0.20)
var.start
# NLME Model: using ssbell and nls_starting values for ymax, a, b, xc
baseModel.ssbell <- nlme(new_cases ~ SSbell(day, ymax, a, b, xc),
data = dat[N>=minimum_days],
fixed = list(ymax ~ 1, a ~ 1, b~1, xc~1),
random = ymax + a + b + xc ~ 1|Country.Region,
start = list(fixed = c( coef(nls_starting_values) )),
na.action = na.omit,
control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)
# NLME Model: using bellcurve.model and nls_starting values for ymax for d;
# NLME Model: using bellcurve.model and nls_starting values for xc for mu;
# NLME Model: using bellcurve.model and var.start for var
baseModel.bellcurve <- nlme(new_cases ~ bellcurve.model(d, mu, var, x = day),
data = dat[N>=minimum_days],
fixed = list(d ~ 1, mu ~ 1, var ~ 1),
random = d + mu + var ~ 1|Country.Region,
start = list(fixed = c(coef(nls_starting_values)[c(1,4)], var.start )),
na.action = na.omit,
control=list(maxIter=1e6, msMaxIter = 1e6, msVerbose = TRUE)
)
baseModel.ssbell
baseModel.bellcurve