r - R: How to perform more complex calculations on combinations from a dataset?

Question

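Right now, I have combinations of the columns of the built-in dataset iris. So far, I have been pointed toward a way to get the lm() coefficient for each pair of columns.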

myPairs <- combn(names(iris[1:4]), 2)

formula <- apply(myPairs, MARGIN=2, FUN=paste, collapse="~")

model <- lapply(formula, function(x) lm(formula=x, data=iris)$coefficients[2])

model
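To make the intermediate objects concrete, here is a small base-R sketch that rebuilds myPairs and shows its shape. It uses pairFormulas, a hypothetical name chosen instead of formula so that stats::formula is not masked.

```r
# Every pair of the four numeric iris columns: a 2 x 6 matrix, one pair per column
myPairs <- combn(names(iris[1:4]), 2)
print(dim(myPairs))    # 2 6
print(myPairs[, 1])    # "Sepal.Length" "Sepal.Width"

# Collapse each column into a formula string of the form "response~predictor"
pairFormulas <- apply(myPairs, MARGIN = 2, FUN = paste, collapse = "~")
print(pairFormulas[1]) # "Sepal.Length~Sepal.Width"
```

Row 1 of each column ends up as the response and row 2 as the predictor, which is why the second coefficient of each model is the slope on the row-2 column.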

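However, I would like to go a step further and use the coefficients from lm() in further calculations. I would like to do something like this: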

Coefficient <- lm(formula=x, data=iris)$coefficients[2]
Spread <- myPairs[1] - Coefficient*myPairs[2]
library(tseries)
adf.test(Spread)

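The procedure itself is simple, but I can't find a way to do it for every combination in the dataset. (As a side note, adf.test would not be applied to this kind of data; I am only using the iris dataset for demonstration.) Would it be better to write a loop for a procedure like this?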
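The pseudocode above subtracts the column names stored in myPairs rather than the columns themselves. A minimal sketch for a single pair, assuming the columns are looked up by name with iris[[...]]; pair, fit, slope and spread are illustrative names, and the adf.test() call is left as a comment because it needs the tseries package:

```r
# One pair of column names: response first, predictor second
pair <- c("Sepal.Length", "Sepal.Width")

# Fit response ~ predictor and keep the slope (the second coefficient)
fit   <- lm(paste(pair, collapse = "~"), data = iris)
slope <- unname(coef(fit)[2])

# Use the column values, looked up by name, not the names themselves
spread <- iris[[pair[1]]] - slope * iris[[pair[2]]]

# library(tseries); adf.test(spread)
```

Once this works for one pair, the remaining question is only how to repeat it over every column of myPairs.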

score 2 · Accepted Answer

It sounds like you want to write your own function and call it in a loop (apply) over myPairs:

yourfun <- function(pair){
  fm <- paste(pair, collapse='~')
  coef <- lm(formula=fm, data=iris)$coefficients[2]
  Spread <- iris[,pair[1]] - coef*iris[,pair[2]] 
  return(Spread)
}

Then you can call this function:

model <- apply(myPairs, 2, yourfun)

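I think this is the cleanest way. However, I don't know exactly what you want to do, so I'm just computing Spread as an example. Note that in my example you will get warning messages because the column Species is a factor. (This only happens if Species is among the pairs, e.g. with combn(names(iris), 2); a myPairs built from names(iris[1:4]) contains only the four numeric columns.)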
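A possible follow-up, assuming you want one test per pair: apply() returns a 150 x 6 matrix with one spread per column, so you can label the columns with their formulas and loop over them. The spreads and tests names are illustrative, and the adf.test() step assumes the tseries package is installed (it is skipped otherwise).

```r
# yourfun from the answer above: the spread for one pair of iris columns
yourfun <- function(pair){
  fm <- paste(pair, collapse = '~')
  coef <- lm(formula = fm, data = iris)$coefficients[2]
  iris[, pair[1]] - coef * iris[, pair[2]]
}

myPairs <- combn(names(iris[1:4]), 2)
spreads <- apply(myPairs, 2, yourfun)   # 150 x 6 matrix, one column per pair

# Label each column with its formula so results stay identifiable
colnames(spreads) <- apply(myPairs, 2, paste, collapse = "~")

# Run the test on every column, only if tseries is available
if (requireNamespace("tseries", quietly = TRUE)) {
  tests <- lapply(seq_len(ncol(spreads)), function(j) tseries::adf.test(spreads[, j]))
  names(tests) <- colnames(spreads)
}
```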

score 2 · Accepted Answer

You can pass the function directly to combn (its FUN argument).

If you just want to run the regression for all combinations and extract the second coefficient, you can do

fun <- function(x) coef(lm(paste(x, collapse="~"), data=iris))[2]
combn(names(iris[1:4]), 2, fun)
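combn() applies FUN to each column of combinations and simplifies the results, so this call yields one slope per pair. A quick sanity check, assuming you compare against the model written with an explicit formula for the first pair (slopes and manual are illustrative names; as.numeric() strips names and dimensions so only the values are compared):

```r
fun <- function(x) coef(lm(paste(x, collapse = "~"), data = iris))[2]
slopes <- as.numeric(combn(names(iris[1:4]), 2, fun))   # one slope per pair

# The first slope should equal the one from an explicitly written formula
manual <- as.numeric(coef(lm(Sepal.Length ~ Sepal.Width, data = iris))[2])
isTRUE(all.equal(slopes[1], manual))   # TRUE
```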

You can then extend the function to calculate the spread

library(tseries)  # provides adf.test()

fun <- function(x) {
  est <- coef(lm(paste(x, collapse="~"), data=iris))[2]
  spread <- iris[,x[1]] - est*iris[,x[2]]
  adf.test(spread)
}

out <- combn(names(iris[1:4]), 2, fun, simplify=FALSE)
out[[1]]

#   Augmented Dickey-Fuller Test

#data:  spread
#Dickey-Fuller = -3.879, Lag order = 5, p-value = 0.01707
#alternative hypothesis: stationary

Compare the result with running the first combination manually

est <- coef(lm(Sepal.Length ~ Sepal.Width, data=iris))[2]
spread <- iris[,"Sepal.Length"] - est*iris[,"Sepal.Width"]
adf.test(spread)

#   Augmented Dickey-Fuller Test

# data:  spread
# Dickey-Fuller = -3.879, Lag order = 5, p-value = 0.01707
# alternative hypothesis: stationary
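The simplify = FALSE argument matters here. The sketch below uses spreadFun, a hypothetical variant of fun that returns the spread instead of the test result, to show what combn() does with and without simplification:

```r
# Hypothetical variant of fun that returns the spread rather than a test
spreadFun <- function(x) {
  est <- coef(lm(paste(x, collapse = "~"), data = iris))[2]
  iris[, x[1]] - est * iris[, x[2]]
}

# Default simplify = TRUE: the six length-150 spreads become a 150 x 6 matrix
spreadMat <- combn(names(iris[1:4]), 2, spreadFun)
dim(spreadMat)          # 150 6

# simplify = FALSE keeps one list element per pair, which is what you want
# when FUN returns a structured object such as the htest from adf.test()
spreadList <- combn(names(iris[1:4]), 2, spreadFun, simplify = FALSE)
length(spreadList)      # 6
```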

score 1 · Accepted Answer

A few tips: I wouldn't give your objects the same names as built-in functions (model and formula in your original version come to mind).

Also, you can simplify what you're doing with paste; see below.

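Finally, a more general point: don't feel that everything needs to be done with some *apply function. Sometimes concise, short code is actually harder to understand and remember, and these *apply functions offer at most a marginal speed gain over a simple for loop. (This was not always the case in R, but it is at this point.)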
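In that spirit, a plain for-loop version is sketched below. It preallocates a list and stores the formula, slope and spread for each pair; results, response, predictor, fit and slope are illustrative names, not part of the original answers.

```r
myPairs <- combn(names(iris[1:4]), 2)
nPairs  <- ncol(myPairs)

# Preallocate one list slot per pair
results <- vector("list", nPairs)

for (i in seq_len(nPairs)) {
  response  <- myPairs[1, i]
  predictor <- myPairs[2, i]
  fm    <- paste(response, predictor, sep = "~")
  fit   <- lm(fm, data = iris)
  slope <- unname(coef(fit)[2])
  results[[i]] <- list(
    formula = fm,
    slope   = slope,
    spread  = iris[[response]] - slope * iris[[predictor]]
  )
}
```

Each step is spelled out in order, so adding further calculations (such as a test on spread) is a matter of adding lines inside the loop body.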

# Get pairs
myPairs <- combn(x = names(x = iris[1:4]),m = 2)

# Just directly use paste() here
myFormulas <- paste(myPairs[1,],myPairs[2,],sep = "~")

# Store the models themselves into a list
# This lets you go back to the models later if you need something else
myModels <- lapply(X = myFormulas,FUN = lm,data = iris)

# If you use sapply() and this simple function, you get back a named vector
# This seems like it could be useful to what you want to do
myCoeffs <- sapply(X = myModels,FUN = function (x) {return(x$coefficients[2])})

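# Now, you can do this using vectorized operations.
# Index myCoeffs by position, not by name: several models share the same
# predictor, so its names are duplicated. sweep() scales each predictor
# column by its own coefficient.
as.matrix(iris[myPairs[1,]]) - sweep(as.matrix(iris[myPairs[2,]]), MARGIN = 2, STATS = myCoeffs, FUN = "*")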

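If I understand correctly, I believe the above will work. Note that as it stands the names on the output will be meaningless, and you will need to replace them with something of your own design (perhaps the values of myFormulas).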
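Following that suggestion, here is a self-contained sketch that computes all spreads at once and labels each column with its formula. It assumes the coefficients are matched to pairs by position (several models share a predictor, so their names are not unique) and uses sweep() to multiply each predictor column by its own coefficient; responses, predictors and mySpreads are illustrative names.

```r
myPairs    <- combn(names(iris[1:4]), 2)
myFormulas <- paste(myPairs[1, ], myPairs[2, ], sep = "~")
myModels   <- lapply(myFormulas, function(f) lm(f, data = iris))

# Slope of each model, by position; unname() drops the duplicated predictor names
myCoeffs <- unname(sapply(myModels, function(m) coef(m)[2]))

responses  <- as.matrix(iris[myPairs[1, ]])
predictors <- as.matrix(iris[myPairs[2, ]])

# Column j of predictors is multiplied by myCoeffs[j]
mySpreads <- responses - sweep(predictors, MARGIN = 2, STATS = myCoeffs, FUN = "*")
colnames(mySpreads) <- myFormulas
```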
