r - How to merge rows based on conditions on character values? (household data)

Question

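I have a dataframe in which the first column gives the job (manager, employee or worker), the second column says whether the person works at night, and the last one is the household code (if two people share the same code, they live in the same house).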

# Here is the reproducible data:
PCS <- c("worker", "manager","employee","employee","worker","worker","manager","employee","manager","employee")
work_night <- c("Yes","Yes","No", "No","No","Yes","No","Yes","No","Yes")
HHnum <- c(1,1,2,2,3,3,4,4,5,5)
df <- data.frame(PCS,work_night,HHnum)

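My problem is that I want a new dataframe with one row per household instead of one row per individual. I want to group individuals by HHnum and then combine their answers.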

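For the variable "PCS", I have new categories based on the combination of answers: manager+worker = "I", manager+employee = "II", employee+employee = "VI", worker+worker = "III", etc.
For the variable "work_night", I want to apply a score: if both answered yes, score = 2; if one answered yes, score = 1; if both answered no, score = 0.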
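The two rules above can be written down as small R helpers. This is a minimal sketch: `pcs_labels`, `pair_key` and `night_score` are hypothetical names, and only the four combinations spelled out in the question are included (the "etc." cases are left out rather than guessed).

```r
# Hypothetical lookup for the PCS rule, keyed by the two jobs sorted
# alphabetically so that the order of the two people does not matter.
# Only the combinations listed in the question are included.
pcs_labels <- c(
  "manager+worker"    = "I",
  "employee+manager"  = "II",
  "employee+employee" = "VI",
  "worker+worker"     = "III"
)

# Order-independent key, e.g. pair_key("worker", "manager") -> "manager+worker"
pair_key <- function(a, b) paste(sort(c(a, b)), collapse = "+")

# work_night rule: the score is simply the number of "Yes" answers (0, 1 or 2)
night_score <- function(answers) sum(answers == "Yes")

pcs_labels[[pair_key("worker", "manager")]]   # "I"
night_score(c("No", "Yes"))                   # 1
```

Counting the "Yes" answers gives exactly the 2/1/0 score described, so no explicit if/else is needed.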

To be clear, I would like my dataframe to look like this:

HHnum      PCS      work_night
1          "I"           2
2          "VI"          0
3          "III"         1
4          "II"          1
5          "II"          1

How can I do this in R with dplyr? I know I need group_by(), but I don't know what to use after it.

Best, Victor

score 0 · Accepted Answer

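Here is one way to do it (although I admit it is very verbose). I created a reference dataframe (i.e. combos) in case you have more than 3 categories, then joined it to the main dataframe (i.e. df_new) to bring in the PCS Roman numerals.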

library(dplyr)
library(tidyr)

# Create a dataframe with all of the unordered combinations of PCS.
# Each row is sorted so that e.g. worker/manager and manager/worker collapse
# into one combination, which is then numbered with a Roman numeral.
# Note: the numerals follow the order in which the sorted pairs are generated,
# so they do not match the I/II/III... scheme in the question; edit combos$PCS
# if you need a specific labelling.
combos <- expand.grid(unique(df$PCS), unique(df$PCS))
combos <- unique(t(apply(combos, 1, sort))) %>% 
  as.data.frame() %>% 
  dplyr::mutate(PCS = as.roman(row_number()))
# Create another dataframe with the columns reversed (makes it easier to join to the main dataframe).
combos2 <- data.frame(V1 = c(combos$V2), V2 = c(combos$V1), PCS = c(combos$PCS)) %>% 
  dplyr::mutate(PCS = as.roman(PCS))
combos <- rbind(combos, combos2)

# Get the count of "Yes" for each HHnum group and number the two members 1 and 2.
# Then, spread the PCS of the two members into columns V1 and V2 to join with the "combos" df.
df_new <- df %>% 
  dplyr::group_by(HHnum) %>% 
  dplyr::mutate(work_night = sum(work_night == "Yes"),
                grp = dplyr::row_number()) %>%
  dplyr::ungroup() %>%
  tidyr::pivot_wider(names_from = grp, values_from = PCS, names_prefix = "V") %>%
  dplyr::left_join(combos, by = c("V1", "V2")) %>% 
  # Symmetric pairs (e.g. employee/employee) appear twice in combos, so drop duplicate rows.
  unique() %>% 
  dplyr::select(HHnum, PCS, work_night)
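A shorter alternative (not part of the accepted answer) is to summarise each household directly and look the label up in a named vector keyed by the alphabetically sorted pair of jobs. This is a sketch under the assumption that the labels are exactly the ones listed in the question; `pcs_labels` is a hypothetical name, and any combination missing from it comes out as NA.

```r
library(dplyr)

# Reproducible data from the question.
df <- data.frame(
  PCS        = c("worker", "manager", "employee", "employee", "worker",
                 "worker", "manager", "employee", "manager", "employee"),
  work_night = c("Yes", "Yes", "No", "No", "No", "Yes", "No", "Yes", "No", "Yes"),
  HHnum      = c(1, 1, 2, 2, 3, 3, 4, 4, 5, 5)
)

# Hypothetical lookup keyed by the sorted pair of jobs; only the
# combinations named in the question are listed.
pcs_labels <- c(
  "manager+worker"    = "I",
  "employee+manager"  = "II",
  "employee+employee" = "VI",
  "worker+worker"     = "III"
)

df_new <- df %>%
  group_by(HHnum) %>%
  summarise(
    # order-independent key such as "manager+worker", mapped to its label
    PCS        = unname(pcs_labels[paste(sort(PCS), collapse = "+")]),
    # 2 = both work at night, 1 = one does, 0 = neither
    work_night = sum(work_night == "Yes")
  )

df_new
```

Because the key is built from the sorted pair, manager/employee and employee/manager map to the same label, so the reversed combos2 table and the final unique() step are not needed.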
