r - A more efficient way to recode a column in a data.frame conditional on entries in other columns

Question

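I'm looking for a more efficient way to recode the entries of a column in a data frame, where the recoding depends on entries in other columns.

Take this simple example, which demonstrates my current process: create a new column for the recoded data, convert it to character, then recode the data using square-bracket subsetting (is there a formal name for this process?).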

## example data frame
df = data.frame( id = seq( 1 , 100 , by=1 ) ,
                 x = rep( c("W", "Z") , each=50),
                 y = c( rep( c("A","B","C","D") , 25 ) ) )

# add a new column based on column y; convert to character 
df$newY = as.character( df$y ) 

# change newY entries to numbers based on conditions in other columns
df$newY[ df$x == "W" & df$newY == "B" ] <- 1
df$newY[ df$x == "Z" & df$newY == "D" ] <- 3

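This process works for recoding a variable with a small number of conditions, but it becomes cumbersome with a large number of conditions or when there are many different variables to recode.

Can anyone help me find a more efficient way to do this?

Thanks!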

score 1 · Accepted Answer

A few approaches to this:

df <- data.frame(id = seq( 1 , 100 , by=1 ) ,
                 x = rep( c("W", "Z") , each=50),
                 y = c( rep( c("A","B","C","D") , 25)))

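# Take the product (my preference). The columns are converted to factors
# first because data.frame() no longer does so by default since R 4.0.0.
# Note that products are not unique per combination (W/B and Z/A both give 2).
as.numeric(as.factor(df$x)) * as.numeric(as.factor(df$y))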

# Create new factor based on x and y and convert to numeric
as.numeric(as.factor(paste0(df$x, df$y)))
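If the goal is to keep the specific codes from the question (1 for W/B, 3 for Z/D) rather than generate arbitrary numeric codes, one possible approach is a lookup table combined with match(). This is only a sketch and not part of the original answer: the names recode_map, key_df, key_map and idx are made up for illustration, and the "_" separator assumes that character does not appear in the x or y values.

```r
# Example data, same shape as in the question
df <- data.frame(id = seq(1, 100, by = 1),
                 x  = rep(c("W", "Z"), each = 50),
                 y  = rep(c("A", "B", "C", "D"), 25),
                 stringsAsFactors = FALSE)

# Hypothetical lookup table: one row per (x, y) combination to recode.
# Codes are stored as character because newY mixes codes and letters.
recode_map <- data.frame(x    = c("W", "Z"),
                         y    = c("B", "D"),
                         code = c("1", "3"),
                         stringsAsFactors = FALSE)

# Build a combined key and find each row's position in the lookup table
# (NA where no rule applies)
key_df  <- paste(df$x, df$y, sep = "_")
key_map <- paste(recode_map$x, recode_map$y, sep = "_")
idx     <- match(key_df, key_map)

# Use the mapped code where a rule matched, otherwise keep the original y
df$newY <- ifelse(is.na(idx), df$y, recode_map$code[idx])
```

Adding another condition then means adding a row to recode_map instead of writing another subsetting line.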
