r - 修改公式中的函数调用

Question

假设您正在研究一个回归模型，并且至少有一个预测变量是通过样条估计的，例如，

library(splines)
data(diamonds, package = "ggplot2")

fit <- lm(price ~ bs(depth, degree = 5) + bs(carat, knots = c(2, 3)) * color, 
          data = diamonds)

上述拟合仅用于说明目的，没有任何有意义的理由。

现在，让我们保持相同的基本公式，但更改深度和克拉的结位置。更新需要以动态方式进行，以便它可以成为更大 MCMC 方法的一部分（结的数量和结位置由可逆跳跃或生/死步骤确定）。

我很清楚updateandupdate.formula调用，但我不相信这些工具会有所帮助。下面的伪代码应该说明我计划开发的函数的行为。

foo <- function(formula, data) { 

  # Original Model matrix, the formula will be of the form:
  Xmat_orig <- model.matrix(formula, data)

  # some fancy method for selecting new knot locations here
  # lots of cool R code....

  # pseudo code for the 'new knots'.  In the example formula above var1 would be
  # depth and var2 would be carat.  The number of elements in this list would be
  # dependent on the formula passed into foo.
  new_knots <- list(k1 = knot_locations_for_var1, 
                    k2 = knot_locations_for_var2)

  # updated model matrix: 
  # pseudo code for that the new model matrix call would look like.
  Xmat_new <- 
    model.matrix(y ~ bs(var1, degree = 5, knots = new_knots$k1) + bs(var2, knots = new_knots$k2) * color, 
                 data = data)

  return(Xmat_new) 
}

有人可以建议一种knots在 abs或 ns动态调用中修改调用的方法吗？

score 1 · Accepted Answer

您可以substitute在 R 中使用函数，其中：

替换(expr, env) 替换返回（未计算的）表达式 expr 的解析树，替换任何绑定在 env 中的变量。

例如：

> rm(list=ls())
> x <- 1
> x + y
Error: object 'y' not found

因为y没有定义。现在使用substitute：

> (expr <- substitute(x + y, list(y=2)))
x + 2
> eval(expr)
[1] 3
> z <- 2
> (expr <- substitute(x + y, list(y=z)))
x + 2
> eval(expr)
[1] 3

在您的示例中：

f1 <- eval(substitute(price ~ bs(depth, degree = deg) + bs(carat, knots = knts) * color, 
                       list(deg=5, knts=c(2, 3))))
f2 <- eval(substitute(price ~ bs(depth, degree = deg) + bs(carat, knots = knts) * color,
                       list(deg=6, knts=c(3, 4))))

fit1 <- lm(f1, data=diamonds)
fit2 <- lm(f2, data=diamonds)

通常，您可以编写一个包装substitute调用的函数，例如：

formula.with.knots <- function(degree, knots) {
  f.expr <- substitute(price ~ bs(depth, degree = deg) + bs(carat, knots = knts) * color, 
                        list(deg=degree, knts=knots))

  eval(f.expr)
}

f <- formula.with.knots(5, c(2, 3))
fit <- lm(f, data = diamonds)
summary(fit)

score 1 · Accepted Answer

这是另一种对函数输入内容不那么挑剔的可能性。考虑这个

newknots <- function(form, data, calls=c("bs","ns")) {
    nk <- function(x) { 
        sort(runif(sample(1:5, 1), min = min(data[[x]]), max = max(data[[x]])))
    }
    rr <- function(x, nk, calls) {
        if(is.call(x) && deparse(x[[1]]) %in% calls) {
            x$knots = nk(deparse(x[[2]]))
            x
        } else if (is.recursive(x)) {
            as.call(lapply(as.list(x), rr, nk, calls))
        } else {
            x
        }
    }
    z <- lapply(as.list(form), rr, nk, calls)   
    z <- eval(as.call(z))
    environment(z) <- environment(form)
    z
}

所以这并不是一个微不足道的功能，但希望它不会太糟糕。这个想法是我们可以将公式转换为我们可以递归调查的列表对象。这就是内部rr函数正在做的事情。它需要一个列表，然后查看每个元素。它查找对bsor的调用ns，当它找到它们时，它会替换knots=参数。

在这里，我们使用该kn函数为作为字符串传入的给定变量名称创建一组新的结。我们只需要返回适合该变量的值列表。

最后，我需要将列表重新转换为公式，并确保我们的新对象与原始公式具有相同的环境。所以这实际上确实创建了一个新的公式对象，使原始值保持不变（如果您愿意，可以替换原始值）。

这是一个如何调用/使用此函数的示例。

f <- price ~ ns(carat, knots=c(1,3)) * color + bs(depth, degree = 5) + clarity
newknots(f, diamonds)

# price ~ ns(carat, knots = c(2.09726121873362, 3.94607368792873
# )) * color + bs(depth, degree = 5, knots = c(44.047089480795, 
# 47.8856966942549, 49.7632855847478, 70.9297015387565)) + clarity

所以你可以看到根据我们的新规则添加和替换了结。我不确定您可能还需要哪些其他功能，但希望这将为您提供一个良好的起点。

score 0 · Accepted Answer

公式都绑定到环境。因此，一种选择是使用您可能想要更改的参数的变量单独创建公式，并在函数环境中分配这些变量值。

f <- as.formula("price ~ bs(depth, knots=d_knots) + bs(carat, knots=c_knots) * color", 
                list2env(list(d_knots=c(2,3), c_knots=c(3,2))))

我为d_knots和定义了两个默认值c_knots。然后修改这些值：

environment(f)$d_knots <- c(2,3)
environment(f)$c_knots <- c(3, 2)

然后，您可以将公式提供给建模函数

fit <- lm(f, data=diamonds)

score 0 · Accepted Answer

编辑：

谢谢@MrFlick，您的解决方案正是我想要的。

#原创帖子

感谢@MrFlick 和@hadley，他们在 SO 和 Twitter 上的回复帮助我找到了可行的解决方案。这种方法需要改进，但似乎可以满足我的迫切需求。

下面定义的函数with_new_knots将解析 aformula并通过修改元素terms。（我还要感谢survival包的作者 Terry Therneau，因为我在挖掘该代码以了解当公式strata中包含函数时如何操作公式。）我已经可以想出这个函数会失败的用例，但是重要的部分是该方法的大纲存在，我可以在以后扩展和改进它。

library(ggplot2)
library(reshape2)
library(dplyr)
library(magrittr)
library(splines)
set.seed(42)

with_new_knots <- function(frm, data, iterations = 5L) { 
  # extract the original formula
  old_terms   <- terms(frm, specials = c("bs", "ns"))

  # reconstruct the rhs of the formula with any interaction terms expanded
  cln     <- colnames(attr(old_terms, "factors")) 
  old_rhs <- paste(cln, collapse = " + ")

  # Extract the spline terms from the old_formula 
  idx              <- attr(old_terms, "specials") %>% unlist   %>% sort
  old_spline_terms <- attr(old_terms, "factors")  %>% rownames %>% extract(idx)

  # grab the variable names which splines are built on
  vars <- all.vars(frm)[idx]

  # define the range for each variable in vars
  rngs <- lapply(vars, function(x) { range(data[, x]) })

  # for each of the spline terms, randomly generate new knots
  # This is a silly example, something clever will replace it. 

  out <- replicate(iterations, 
                   {
                     new_knots <- lapply(rngs, function(r) { 
                                         kts <- sort(runif(sample(1:5, 1), min = r[1], max = r[2]))
                                         paste0("c(", paste(kts, collapse = ", "), ")")
                             })

                     new_spline_terms <- 
                       mapply(FUN = function(s, k) { sub(")$", paste0(", knots = ", k, ")"), s) },
                              s = old_spline_terms,
                              k = new_knots)

                     rhs <- old_rhs
                     for(i in 1:length(old_spline_terms)) { 
                       rhs <- gsub(old_spline_terms[i], new_spline_terms[i], rhs, fixed = TRUE)
                     }

                     f <- as.formula(paste(rownames(attr(old_terms, "factors"))[1], "~", rhs))
                     environment(f) <- environment(frm)
                     return(f)
                   }, 
                   simplify = FALSE) 
  return(out) 
}

示例使用：

这里提出并修改了一个统计上无意义的模型with_new_knots以说明结果，更新了一个formula对象，以便更新spline公式中的调用。

f <- price ~ ns(carat) * color + bs(depth, degree = 5) + clarity
with_new_knots(f, diamonds)


orig_fit <- predict(lm(f, data = diamonds))
new_fits <- with_new_knots(f, diamonds) %>%
            lapply(., function(frm) { predict(lm(frm, data = diamonds)) })

dat <- data.frame(orig_fit, new_fits)
names(dat)[2:6] <- paste("new knots", 1:5)
dat <- melt(dat, id.vars = NULL)
dat <- cbind(dat, diamonds)

ggplot(dat) + 
aes(x = carat, y = value, color = color, shape = clarity) + 
geom_line() + 
geom_point(aes(y = price), alpha = 0.1) + 
facet_wrap( ~ variable, scale = "free")

不同结的不同模型的插图

r - 修改公式中的函数调用

4 回答 4

编辑：

#原创帖子

示例使用：

Related

Reference