r - Splitting a stacked dataset into training and test sets after MI, for MI Lasso / Elastic Net with the miselect package

Question

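I am new to R and need to run an MI Lasso/Elastic Net regression. For multiple imputation (MI) I use the "mice" package. The "miselect" package needs the imputed data in stacked format to fit the ML models. After MI, I get the stacked dataset like this: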

imputed_long <- complete(imputed, action = "long", include = FALSE)

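To see how well my model fits the data, I need to split the dataset into training and test sets.

I have two questions:

Is it better to split the data before MI, and then run MI and the ML model separately on the training and test sets? Or should I split the dataset after running MI? And how do I split the stacked dataset into training and test sets (80/20 would be great)?
How do I get the predictive performance of the ML model in "miselect"? I could not find any example in the miselect package documentation. With my code I can cross-validate alpha and lambda, but I don't know how to continue from there.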
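
One possible way to do the 80/20 split on the stacked data is sketched below. It assumes the `.imp` (imputation number) and `.id` (row identifier) columns that mice adds in long format. The toy `imputed_long` built here is only a stand-in for the real one, and the names `train_ids`, `dfs_train` and `dfs_test` are made up for illustration. The row ids are sampled once and the same split is applied to every imputation, so a given row never ends up in both sets. This does not settle whether imputing before or after the split is preferable.

```r
# Toy stand-in for mice::complete(imputed, action = "long"):
# m stacked copies of n rows, identified by .imp and .id.
set.seed(42)
n <- 50
m <- 3
imputed_long <- data.frame(
  .imp    = rep(seq_len(m), each = n),
  .id     = rep(seq_len(n), times = m),
  x1      = rnorm(n * m),
  outcome = rbinom(n * m, 1, 0.5)
)

# Sample 80% of the row ids once ...
ids       <- unique(imputed_long$.id)
train_ids <- sample(ids, size = floor(0.8 * length(ids)))

# ... and apply the same split to every imputation.
in_train   <- imputed_long$.id %in% train_ids
train_long <- imputed_long[in_train, ]
test_long  <- imputed_long[!in_train, ]

# One data frame per imputation, i.e. the list format used to build x and y.
dfs_train <- split(train_long, train_long$.imp)
dfs_test  <- split(test_long, test_long$.imp)
```

With the real data, `dfs_train` would take the place of `dfs` when building the `x` and `y` lists, and `dfs_test` would be held back for evaluation.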

```r
library(miselect)

# dfs: list of the 15 completed (imputed) data frames, one per imputation
dim(dfs[[1]])  # 12 columns
vars  <- names(dfs[[1]])
xvars <- setdiff(vars, "outcome")

# Generate list of imputed design matrices and imputed responses
x <- list()
y <- list()
for (i in seq_along(dfs)) {  # 15 imputed data sets
  x[[i]] <- as.matrix(dfs[[i]][, xvars])
  y[[i]] <- dfs[[i]]$outcome
}

dim(x[[1]])     # 1348 x 11
length(y[[1]])  # 1348

# Set seed to ensure reproducible results
set.seed(42)

pf       <- rep(1, ncol(x[[1]]))  # Penalty factor; can be used to penalize certain variables differentially
adWeight <- rep(1, ncol(x[[1]]))  # Numeric vector of length p with the adaptive weights for the L1 penalty

# Simulations demonstrate that the "stacked" objective function approaches tend to be
# more computationally efficient and have better estimation and selection properties.
# Stacked, elastic net
# weights: numeric vector of length n with the proportion observed (non-missing)
# for each row of the un-imputed data (newdata)
weights <- 1 - rowMeans(is.na(newdata))
alpha   <- c(.5, 1)  # elastic net

fit.stacked <- cv.saenet(x, y, pf, adWeight, weights, family = "binomial",
                         alpha = alpha, nfolds = 5)

# Get selected variables from the 1 standard error rule
coef(fit.stacked, lambda = fit.stacked$lambda.1se, alpha = fit.stacked$alpha.1se)
print(fit.stacked)

coefficients <- coef(fit.stacked)
coefficients
```
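
For the predictive-performance part, one option is to apply the selected coefficients to the held-out data by hand. The sketch below is base R only and rests on assumptions: `beta` is a hypothetical coefficient vector with the intercept first (the layout assumed here for the stacked `coef()` output), `x_test` is a list of imputed test design matrices, and `y_test` is the observed test outcome. All data are simulated. Predicted probabilities are computed per imputation and averaged per row, then summarised with AUC and the Brier score.

```r
set.seed(1)
m      <- 3   # number of imputations
n_test <- 40  # number of test rows
p      <- 2   # number of predictors

# Hypothetical coefficients: intercept first, then one slope per predictor.
beta <- c(-0.5, 1.0, -2.0)

# Toy test data: one outcome vector and m slightly different imputed matrices.
x_base <- matrix(rnorm(n_test * p), n_test, p)
y_test <- rbinom(n_test, 1, plogis(beta[1] + drop(x_base %*% beta[-1])))
x_test <- lapply(seq_len(m), function(i) {
  x_base + matrix(rnorm(n_test * p, sd = 0.1), n_test, p)
})

# Predicted probability per imputation (one column each), averaged per row.
probs <- sapply(x_test, function(x) plogis(beta[1] + drop(x %*% beta[-1])))
p_hat <- rowMeans(probs)

# AUC from the rank (Mann-Whitney) formula.
auc <- function(y, p) {
  r  <- rank(p)
  n1 <- sum(y == 1)
  n0 <- sum(y == 0)
  (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
}

# Brier score: mean squared difference between probability and outcome.
brier <- mean((p_hat - y_test)^2)

auc(y_test, p_hat)
brier
```

Averaging the probabilities over imputations is one pooling choice among several; evaluating each imputed test set separately and averaging the metrics would be another.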

Thank you so much in advance!
