r - Recode sequentially named variables based on the value of the answer

Question

I am struggling to recode values concisely using lapply.

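Suppose I have 10 survey questions, each with 4 answer options, where every answer is either right or wrong. The questions are labelled q_1 through q_10, and my data frame is called df. I want to create new variables with the same sequential labels that simply code each question as "right" (1) or "wrong" (0).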

If I were to list the right answers, they would be:

right_answers<-c(1,2,3,4,2,3,4,1,2,4)

Then I am trying to write a function that simply recodes all the variables into new variables while using the same sequential identifier, such as

lapply(1:10, function(fx) {
  df$know_[fx]<-ifelse(df$q_[fx]==right_answers[fx],1,0)
})

In a hypothetical universe where this code is remotely correct, I would get a result like this:

id   q_1    know_1   q_2   know_2
1    1      1        2     1
2    4      0        3     0
3    3      0        2     1
4    4      0        1     0

Thanks so much for your help!

score 1 · Accepted Answer

For the same matrix output as the other answers, I suggest:

q_names <- paste0("q_", seq_along(right_answers))
answers <- df[q_names]
correct <- mapply(`==`, answers, right_answers)
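The logical matrix returned by mapply can be converted to 0/1 and attached to the data frame as know_ columns. The following is only a sketch: the example data is invented for illustration, and the know_names vector is a helper name that does not appear in the answer.

```r
# Hypothetical example data: 4 respondents, 10 questions (values invented for illustration)
df <- data.frame(
  id = 1:4,
  q_1 = c(1, 4, 3, 4), q_2 = c(2, 3, 2, 1), q_3 = rep(1, 4), q_4 = rep(2, 4),
  q_5 = rep(3, 4), q_6 = rep(4, 4), q_7 = c(1, 4, 3, 4), q_8 = c(2, 3, 2, 1),
  q_9 = rep(1, 4), q_10 = rep(2, 4)
)
right_answers <- c(1, 2, 3, 4, 2, 3, 4, 1, 2, 4)

q_names <- paste0("q_", seq_along(right_answers))
know_names <- paste0("know_", seq_along(right_answers))  # assumed naming helper

# Compare each question column with its own right answer (one column per question)
correct <- mapply(`==`, df[q_names], right_answers)

# Convert TRUE/FALSE to 1/0 and attach the result as new columns
df[know_names] <- as.data.frame(correct * 1)
```

This relies on the columns selected by q_names being in the same order as right_answers.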

score 0 · Accepted Answer

You probably have a problem with this part of the code: df$q_[fx]. You can use paste. For example:

df = read.table(text = "
id   q_1   q_2
1    1     2
2    4     3
3    3     2
4    4     1", header = TRUE)

right_answers = c(1,2,3,4,2,3,4,1,2,4)

dat2 = sapply(1:2, function(fx) {
  # [[ ]] extracts the column as a vector, so ifelse returns a plain vector
  ifelse(df[[paste("q", fx, sep = "_")]] == right_answers[fx], 1, 0)
})

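This does not add columns to your data.frame; instead it creates a new matrix, as in @SenorO's answer. You can name the columns of the matrix and then add them to the original data.frame, as shown below.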

colnames(dat2) = paste("know", 1:2, sep = "_")

data.frame(df, dat2)
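If you would rather add the columns to df directly, a plain for loop with [[ ]] may be the simplest route. The lapply attempt in the question cannot work for two reasons: `$` takes the column name literally rather than building it from the index, and an assignment to df inside a function only changes a local copy. A minimal sketch, with the data shortened to two questions and the variable names q_col and know_col chosen for illustration:

```r
# Small example data in the question's layout (only two questions shown)
df <- data.frame(id = 1:4, q_1 = c(1, 4, 3, 4), q_2 = c(2, 3, 2, 1))
right_answers <- c(1, 2)  # shortened to match the two example columns

for (i in seq_along(right_answers)) {
  q_col <- paste0("q_", i)        # name of the existing question column
  know_col <- paste0("know_", i)  # name of the new 0/1 column
  # [[ ]] accepts a column name built at run time; $ does not
  df[[know_col]] <- as.numeric(df[[q_col]] == right_answers[i])
}
```

With the full data, the same loop runs over all 10 entries of right_answers unchanged.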

score 0 · Accepted Answer

This should give you a matrix of whether each answer is correct:

t(apply(df[, grep("^q_", names(df))], 1, function(X) X == right_answers))

score 0 · Accepted Answer

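I'd like to suggest a different approach to your problem, using the reshape2 package. In my opinion this has the following advantages: 1) more idiomatic R (for what it's worth), 2) more readable code, 3) less error-prone, particularly if you want to add analyses in the future. In this approach everything is done within data frames, which I think is preferable where possible: it is easier to keep all the values for a single record (here, an id) together, and easier to use the power of R's tools.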

# Creating a dataframe with the form you describe
df <- data.frame(id=c('1','2','3','4'), q_1 = c(1,4,3,4), q_2 = c(2,3,2,1), q_3 = rep(1, 4), q_4 = rep(2, 4), q_5 = rep(3, 4),
                 q_6 = rep(4, 4), q_7 = c(1,4,3,4), q_8 = c(2,3,2,1), q_9 = rep(1, 4), q_10 = rep(2, 4))

right_answers<-c(1,2,3,4,2,3,4,1,2,4)

# Associating the right answers explicitly with the corresponding question labels in a data frame
answer_df <- data.frame(questions=paste('q', 1:10, sep='_'), right_answers)

library(reshape2)

# "Melting" the dataframe from "wide" to "long" form -- now questions labels are in variable values rather than in column names
melt_df <- melt(df, id.vars='id') # melt function is from reshape2 package; id column named explicitly

# Now merging the correct answers into the data frame containing the observed answers
merge_df <- merge(melt_df, answer_df, by.x='variable', by.y='questions')

# At this point comparing the observed to correct answers is trivial (using as.numeric to convert from logical to 0/1 as you request, though keeping as TRUE/FALSE may be clearer)
merge_df$correct <- as.numeric(merge_df$value==merge_df$right_answers)

# If desirable (not sure it is), put back into "wide" dataframe form
cast_obs_df <- dcast(merge_df, id ~ variable, value.var='value') # dcast function is from reshape2 package
cast_cor_df <- dcast(merge_df, id ~ variable, value.var='correct')
names(cast_cor_df) <- gsub('q_', 'know_', names(cast_cor_df))
final_df <- merge(cast_obs_df, cast_cor_df)

The newer tidyr package might be better than reshape2 here.
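For comparison, here is a rough sketch of the same melt, merge and cast steps using tidyr's pivot_longer and pivot_wider. It assumes tidyr 1.0 or later is installed; the data is cut down to two questions, and the column names question, value, right_answer and correct are chosen for illustration rather than taken from the answer.

```r
library(tidyr)

# Example data cut down to two questions (values taken from the question's table)
df <- data.frame(id = 1:4, q_1 = c(1, 4, 3, 4), q_2 = c(2, 3, 2, 1))
answer_df <- data.frame(question = c("q_1", "q_2"), right_answer = c(1, 2))

# Wide to long: one row per respondent and question
long_df <- pivot_longer(df, cols = starts_with("q_"),
                        names_to = "question", values_to = "value")

# Attach the right answer for each question and score it as 0/1
long_df <- merge(long_df, answer_df, by = "question")
long_df$correct <- as.numeric(long_df$value == long_df$right_answer)

# Long back to wide: one know_ column per question
know_df <- pivot_wider(long_df[c("id", "question", "correct")],
                       names_from = question, values_from = correct)
names(know_df) <- sub("^q_", "know_", names(know_df))

# Combine the observed answers with the 0/1 scores
final_df <- merge(df, know_df, by = "id")
```

As in the reshape2 version, the right answers are tied to question labels in a data frame, so the result does not depend on column order.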
