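A data frame whose column names contain invalid characters causes rlm() to fail.
Digging in, within rlm() the variable xvars appears to hold the names of the formula's explanatory variables, but with backticks added around the offending names. When xvars is then used to index the data frame, i.e. mf[xvars], it causes the following error: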
Error in `[.data.frame`(mf, xvars) : undefined columns selected
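Is this the expected behavior? (I realize the key phrase here is "invalid characters".) Oddly, calling lm() on the same model and data frame causes no problem.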
# SAMPLE DATA
library(MASS)  # provides rlm()
mydf <- data.frame(matrix(rnorm(36),ncol=6))
colnames(mydf) <- c("y", "x1", "x2", "x1^2", "x2^2", "x1:x2")
rlm(y~., data=mydf) # Error
lm(y~., data=mydf) # No Problem
# Clean up column names
colnames(mydf) <- make.names(colnames(mydf))
rlm(y~., data=mydf) # No Problem
Looking at MASS:::rlm.formula, the error appears to be caused by mf[xvars] in the following lines:
xlev <- if (length(xvars) > 0L) {
xlev <- lapply(mf[xvars], levels)
xlev[!sapply(xlev, is.null)]
}
Any ideas why the backticks are added, only to cause an error afterwards?
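One plausible explanation, sketched here as an assumption rather than a confirmed reading of the MASS source: if xvars is built by coercing the whole "variables" call of the terms object with as.character(), each element is deparsed and non-syntactic names may come back wrapped in backticks, whereas extracting the names symbol by symbol (or with all.vars()) returns them unquoted. The formula and data frame below are made up for illustration.

```r
# Illustrative formula with one non-syntactic variable name.
f  <- y ~ x1 + `x1^2`
mt <- terms(f)
vars <- attr(mt, "variables")   # a call of the form list(y, x1, `x1^2`)

# Coercing the whole call: elements are deparsed, so non-syntactic
# names may be returned with backticks (this is the assumed cause).
quoted <- as.character(vars)[-1L]
print(quoted)

# Coercing each symbol on its own returns the bare names.
plain <- vapply(as.list(vars)[-1L], as.character, "")
print(plain)

# all.vars() also returns bare names.
print(all.vars(f))

# A small data frame with matching (non-syntactic) column names.
df <- data.frame(y = 1:3, x1 = 4:6, check.names = FALSE)
df[["x1^2"]] <- df$x1^2

# Indexing by the bare names works.
ok <- df[plain[-1L]]

# Indexing by the deparsed names is wrapped in tryCatch, since it
# fails whenever the names carry backticks.
res <- tryCatch(df[quoted[-1L]], error = function(e) conditionMessage(e))
print(res)
```

If the two vectors differ on your installation, that would be consistent with the backticks originating in how the names are extracted rather than in the data frame itself.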
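Additional info
I made a copy of the rlm() function, added dput(mf) and dput(xvars), and got the following values. Note that the values of xvars differ from the names assigned above (i.e. backticks were added). The names of mf, on the other hand, are the same as those given above.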
# dput yielded
mf <- structure(list(y = c(-0.242914027018629, 0.724255425682537, -0.0578467214604185, -0.274193999595702, -0.38985000750839, 0.406046200943395), x1 = c(1.53071709960635, -1.87493297716611, 1.0936519723035, -0.977011182431237, -0.510890461021046, 1.20136627562427), x2 = c(-0.801995963036553, 1.30590232081605, 0.635922235436178, -1.86824341731708, -2.76797814532917, -0.497992681627495), `x1^2` = c(0.914146279518207, 0.103458073891876, -1.29818230391818, -0.629048606358592, 1.71534374557621, 0.922690967521984), `x2^2` = c(-0.0879726513660469, 1.05299413769867, 1.01955640371072, 0.546413685721721, 0.947757793667223, -0.0998700630220064), `x1:x2` = c(-0.757490494166813, 1.31307393014016, 1.90233916482184, 0.68844011701049, -1.28717997826724, -0.581800325341162)), .Names = c("y", "x1", "x2", "x1^2", "x2^2", "x1:x2"), terms = y ~ x1 + x2 + `x1^2` + `x2^2` + `x1:x2`, row.names = c(NA, 6L), class = "data.frame")
xvars <- c("x1", "x2", "`x1^2`", "`x2^2`", "`x1:x2`")
mf[xvars]
# Error in `[.data.frame`(mf, xvars) : undefined columns selected
# Removing the backticks from xvars eliminates the error.
xvars2 <- gsub("`", "", xvars, fixed = TRUE)  # gsub() is vectorized; store result as xvars2
mf[xvars2] # No Error
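As a stopgap, the make.names() clean-up shown earlier can be combined with a lookup table so the original column names are not lost. This is only a sketch: name_map and safe_df are hypothetical names introduced here, the data use 30 rows instead of 6 so the fit is not saturated, and the rlm() call is guarded in case MASS is not installed.

```r
set.seed(1)
mydf <- data.frame(matrix(rnorm(180), ncol = 6))
colnames(mydf) <- c("y", "x1", "x2", "x1^2", "x2^2", "x1:x2")

# Syntactic versions of the column names.
safe <- make.names(colnames(mydf), unique = TRUE)

# Hypothetical lookup table: syntactic name -> original name.
name_map <- setNames(colnames(mydf), safe)

# Copy of the data with syntactic column names.
safe_df <- setNames(mydf, safe)

if (requireNamespace("MASS", quietly = TRUE)) {
  fit <- MASS::rlm(y ~ ., data = safe_df)
  est <- coef(fit)

  # Relabel coefficients with the original names where a mapping exists;
  # "(Intercept)" has no entry in name_map and is left unchanged.
  idx <- match(names(est), names(name_map))
  hit <- !is.na(idx)
  names(est)[hit] <- unname(name_map[idx[hit]])
  print(est)
}
```

The mapping keeps the renaming reversible, which matters when the original names carry meaning (such as "x1^2") that make.names() flattens into "x1.2".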