r - Create a new variable in a data frame based on multiple criteria in R

Question

I have a data set:

COl1 COl2 Col3   
1     0     0 
0     1     0
0     0     1 
1     0     0

Based on these three columns, I need to add a new variable to the same table.

Expected output:

COl1 COl2 Col3  New_variable   
1     0     0     c1
0     1     0     c2
0     0     1     c3
1     0     0     c1

score 3 · Accepted Answer

If we want to assign the variable based on the position of the 1 in each row, we can use max.col.

df$New_variable <- paste0('c', max.col(df))
df
#  COl1 COl2 Col3 New_variable
#1    1    0    0           c1
#2    0    1    0           c2
#3    0    0    1           c3
#4    1    0    0           c1

If a row contains more than one 1, check ?max.col for the different ties.method options (the default, "random", breaks ties at random).
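As a small sketch of how ties.method changes the result (the matrix m below is made-up illustration data, not taken from the question):

```r
# Hypothetical matrix in which some rows contain more than one 1
m <- matrix(c(1, 1, 0,
              0, 1, 1,
              0, 0, 1), nrow = 3, byrow = TRUE)

first_hit <- max.col(m, ties.method = "first")  # leftmost 1 in each row
last_hit  <- max.col(m, ties.method = "last")   # rightmost 1 in each row

paste0("c", first_hit)
paste0("c", last_hit)
```

With "first" the rows are labelled c1, c2, c3; with "last" they are labelled c2, c3, c3. A row of all zeros is also a tie across every column, so max.col still returns a column index for it.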

If instead we need to assign a unique ID to each distinct row pattern, we can paste the values row-wise and then use match to assign the ID.

vals <- do.call(paste, c(df, sep = "-"))
df$New_variable <- paste0('c', match(vals, unique(vals)))
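To make the difference from max.col concrete, here is a sketch on made-up data (df2 and ids are hypothetical names). The IDs follow the order in which each row pattern first appears, not the position of the 1:

```r
# Hypothetical data: the first row has its 1 in the third column
df2 <- data.frame(COl1 = c(0L, 1L, 0L),
                  COl2 = c(0L, 0L, 0L),
                  Col3 = c(1L, 0L, 1L))

# One string per row, e.g. "0-0-1"
vals <- do.call(paste, c(df2, sep = "-"))

# Number the patterns in order of first appearance
ids <- paste0("c", match(vals, unique(vals)))
ids
```

Here the first and third rows get c1 even though their 1 sits in Col3, because "0-0-1" is the first pattern encountered.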

score 0 · Accepted Answer

Another base option:

df$New_variable <- paste0('c', apply(df, 1, function(x) which(x != 0)))

Output:

  COl1 COl2 Col3 New_variable
1    1    0    0           c1
2    0    1    0           c2
3    0    0    1           c3
4    1    0    0           c1
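The apply/which approach relies on every row having exactly one non-zero value. A minimal sketch of one way to guard against other rows, assuming you would rather get NA than a misleading label (label_row and the sample data are hypothetical, not part of the original answer):

```r
# Hypothetical data: row 2 has no 1, row 3 has two
df_check <- data.frame(COl1 = c(1L, 0L, 1L),
                       COl2 = c(0L, 0L, 1L),
                       Col3 = c(0L, 0L, 0L))

# Return "c<position>" only when the row has exactly one non-zero value
label_row <- function(x) {
  hits <- which(x != 0)
  if (length(hits) == 1L) paste0("c", hits) else NA_character_
}

labels <- unname(apply(df_check, 1, label_row))
labels
```

Because label_row always returns a single string, apply gives back a plain character vector instead of a list.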

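Since the question's tags make an ambiguous reference to dplyr, you could also do this with purrr, although it is clearly overkill compared with the variety of solutions available in base (as all the answers show):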

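library(dplyr)

df %>%
  mutate(
    # pmap_chr() returns a character vector; plain pmap() would create a list column.
    # Assumes exactly one non-zero value per row.
    New_variable = purrr::pmap_chr(select(., 1:3), ~ paste0('c', which(c(...) != 0)))
    )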

In the select(., 1:3) call you can choose which columns to use (here we use all 3 columns, so you could pass just . instead of the whole select call, with the same result).
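The same idea of restricting the computation to chosen columns carries over to the base approaches, which matters as soon as the data frame holds other columns. A sketch in base R, assuming an extra id column that is not in the question (df_extra and flag_cols are hypothetical names):

```r
# Hypothetical data frame with a non-indicator column alongside the three flags
df_extra <- data.frame(id   = c("a", "b", "c", "d"),
                       COl1 = c(1L, 0L, 0L, 1L),
                       COl2 = c(0L, 1L, 0L, 0L),
                       Col3 = c(0L, 0L, 1L, 0L))

# Only these columns take part in the lookup
flag_cols <- c("COl1", "COl2", "Col3")

df_extra$New_variable <- paste0("c", max.col(df_extra[flag_cols], ties.method = "first"))
df_extra
```

Selecting by name also keeps the code safe to re-run after New_variable has been added.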

score 0 · Accepted Answer

Here are some base R solutions:

df$New_variable <- paste0("c", seq(df) %*% t(df))

or

df$New_variable <- paste0("c", rowSums(df * col(df)))

or

df$New_variable <- paste0("c", which(t(df) == 1, arr.ind = TRUE)[, "row"])

which gives

> df
  COl1 COl2 Col3 New_variable
1    1    0    0           c1
2    0    1    0           c2
3    0    0    1           c3
4    1    0    0           c1
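To see why these three expressions agree, here is a sketch that computes the column index each one produces on the question's data (m and the idx_* names are introduced only for illustration):

```r
df_demo <- data.frame(COl1 = c(1L, 0L, 0L, 1L),
                      COl2 = c(0L, 1L, 0L, 0L),
                      Col3 = c(0L, 0L, 1L, 0L))
m <- as.matrix(df_demo)

# Dot product of each row with 1:3 picks out the position of the single 1
idx_matmul <- as.vector(m %*% seq_len(ncol(m)))

# col(m) holds the column number of every cell; multiplying by m zeroes out the rest
idx_rowsum <- unname(rowSums(m * col(m)))

# After transposing, the "row" index of each 1 is the original column number
idx_which <- unname(which(t(m) == 1, arr.ind = TRUE)[, "row"])

cbind(idx_matmul, idx_rowsum, idx_which)
```

All three give 1, 2, 3, 1 here. They only agree when each row holds exactly one 1: with a row such as 1 1 0, the first two would sum the positions to 3, and the which version would return more indices than there are rows.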

Data

df <- structure(list(COl1 = c(1L, 0L, 0L, 1L), COl2 = c(0L, 1L, 0L, 
0L), Col3 = c(0L, 0L, 1L, 0L)), class = "data.frame", row.names = c(NA, 
-4L))
