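One of the problems with interaction() is that it changes the model matrix, which means that some of the variables contained in the model's data are altered: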
m <- mgcv::gam(body_mass_g ~ s(flipper_length_mm, by = interaction(species, sex)), data = palmerpenguins::penguins)
head(insight::get_data(m))
#> body_mass_g flipper_length_mm species sex
#> 1 3750 181 Adelie.male male
#> 2 3800 186 Adelie.female female
#> 3 3250 195 Adelie.female female
#> 5 3450 193 Adelie.female female
#> 6 3650 190 Adelie.male male
#> 7 3625 181 Adelie.female female
Created on 2021-08-06 by the reprex package (v2.0.1)
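This can cause problems when using post-processing functions, for example for visualization.
However, following the answers by Gavin and IRTFM, this can easily be solved by adding the variables to the model as fixed effects.
Here is a demonstration, which also illustrates the difference between two separate smooths and an interaction: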
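Applied to the penguins model, that fix might look like the sketch below. It assumes the same packages as the rest of this answer (mgcv, insight, palmerpenguins); the only change is the added `species * sex` parametric term. Smooths built with a factor `by` variable are each centred, so the group means have to come from the parametric part anyway.

```r
# Sketch: same smooth as in the first model, but with the factors also
# entered as fixed effects (main effects plus their interaction).
m <- mgcv::gam(
  body_mass_g ~ species * sex +
    s(flipper_length_mm, by = interaction(species, sex)),
  data = palmerpenguins::penguins
)

# Inspect the data recovered from the model; the original `species` and
# `sex` columns are expected to be kept as they were.
head(insight::get_data(m))
```

The smooth part is unchanged, so there is still one smooth per species and sex combination.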
library(ggplot2)
#> Warning: package 'ggplot2' was built under R version 4.0.5
set.seed(1)
# Create data
data <- data.frame(x = rep(seq(-10, 10, length.out = 500), 2),
                   fac1 = as.factor(rep(c("A", "B", "C"), length.out = 1000)),
                   fac2 = as.factor(rep(c("X", "Y"), each = 500)))
data$y <- data$x^2 + rnorm(nrow(data), sd = 5)
data$y[data$fac1 == "A"] <- sign(data$x[data$fac1 == "A"]) * data$y[data$fac1 == "A"] + 50
data$y[data$fac1 == "B"] <- datawizard::change_scale(data$y[data$fac1 == "B"]^3, c(-50, 100))
data$y[data$fac2 == "X" & data$fac1 == "C"] <- data$y[data$fac2 == "X" & data$fac1 == "C"] - 100
data$y[data$fac2 == "X" & data$fac1 == "B"] <- datawizard::change_scale(data$y[data$fac2 == "X" & data$fac1 == "B"] ^ 2, c(-50, 100))
data$y[data$fac2 == "X" & data$fac1 == "A"] <- datawizard::change_scale(data$y[data$fac2 == "X" & data$fac1 == "A"] * -3, c(0, 100))
# Real trends
ggplot(data, aes(x = x, y = y, color = fac1, shape = fac2)) +
  geom_point()
# Two smooths
m <- mgcv::gam(y ~ fac1 * fac2 + s(x, by = fac1) + s(x, by = fac2), data = data)
plot(modelbased::estimate_relation(m, length = 100, preserve_range = FALSE))
# Interaction
m <- mgcv::gam(y ~ fac1 * fac2 + s(x, by = interaction(fac1, fac2)), data = data)
plot(modelbased::estimate_relation(m, length = 100, preserve_range = FALSE))
The last model manages to recover the trend for each combination of the factors.