r - How to count the unique values in a row in R

Question

I have the following panel dataset in R. It contains an ID variable and shows the last login details for each ID.

id name address last_log_june1 last_log_june2 last_log_june3 last_log_june4 last_log_june"n"
1  A            2020-06-01     2020-06-01     2020-06-03
2  B            2020-06-01     2020-06-01     2020-06-01
3  C            2020-06-01     2020-06-02     2020-06-03

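In the dataset above, I want to count the number of unique dates on which A, B and C logged in. How can I do this in R so that only the "last_log_date" variables are selected and R counts the unique dates among them? I would also like to add this count as a new column in the dataset.

Looking forward to a solution!

Thanks, Rachita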

score 0 · Accepted Answer

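The dplyr package (version 1.0.0) has some functions that may help.

Suppose your data is called df, with the columns ID, name, address and a series of columns whose names start with last_log_june, and that these columns may contain some NA values.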

library(dplyr)  ## c_across() requires dplyr >= 1.0.0

new_df <- df %>%
  rowwise() %>%  ## indicate you want to apply functions on rows
  mutate(
    ## intermediate variable: 1 if any of the columns in this row is NA, otherwise 0
    na_exists = ifelse(any(is.na(c_across(starts_with("last_log_june")))), 1, 0),
    ## unique() counts NA as a unique value of its own (it has no na.rm argument)
    unique_with_NA = length(unique(c_across(starts_with("last_log_june")))),
    ## if you don't want NA counted as a unique value, subtract it from the result
    unique_withno_NA = unique_with_NA - na_exists
  ) %>%
  ungroup() %>%  ## drop the rowwise grouping
  select(-na_exists, -unique_with_NA)  ## remove the intermediate variables

Using c_across(starts_with("last_log_june")) means that only the columns whose names start with last_log_june are considered.
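As a more compact variant of the same idea, the sketch below uses dplyr's n_distinct() with na.rm = TRUE, which skips missing values without an intermediate column. The sample data and the column name unique_logins are made up for illustration (modelled on the table in the question, with one NA added); this is one possible approach, not the answer's original code.

```r
library(dplyr)  # assumes dplyr >= 1.0.0, needed for c_across()

# Hypothetical sample data modelled on the question, with one NA added
df <- data.frame(
  id = 1:3,
  name = c("A", "B", "C"),
  last_log_june1 = c("2020-06-01", "2020-06-01", "2020-06-01"),
  last_log_june2 = c("2020-06-01", "2020-06-01", "2020-06-02"),
  last_log_june3 = c("2020-06-03", NA, "2020-06-03"),
  stringsAsFactors = FALSE
)

new_df <- df %>%
  rowwise() %>%  # evaluate the mutate() expression once per row
  mutate(
    # count distinct non-missing values across the last_log_june* columns
    unique_logins = n_distinct(c_across(starts_with("last_log_june")), na.rm = TRUE)
  ) %>%
  ungroup()      # drop the rowwise grouping again

new_df$unique_logins
```

For this sample the counts should be 2, 1 and 3: the NA in row B is ignored rather than counted as a value.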

score 0 · Accepted Answer

You need the unique function, applied to each row.

df <- data.frame(id = 1:3, name = LETTERS[1:3], 
                 last_log_june1 = c("2020-06-01", "2020-06-01", "2020-06-01"), 
                 last_log_june2 = c("2020-06-01", "2020-06-01", "2020-06-02"),  
                 last_log_june3 = c("2020-06-01", "2020-06-02", "2020-06-03"), 
                 stringsAsFactors = FALSE)

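n <- 3 # number of "last_log_june" columns
cols <- paste0("last_log_june", 1:n)
# Count the unique values in each row. Returning the count directly is safer than
# returning unique(x): apply() simplifies equal-length results into a matrix.
counts <- apply(df[, cols, drop = FALSE], 1, function(x) length(unique(x)))
counts # shows a vector with the number of unique values
df$count <- counts # new column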

Is that what you need?
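Since the question mentions an open-ended number of last_log_june columns and the data may contain NA, here is a base R sketch that selects the columns by name prefix instead of hard-coding n, and drops missing values before counting. The sample data is hypothetical, and log_cols is a helper name introduced only for this example.

```r
# Hypothetical sample data modelled on the question, with one NA added
df <- data.frame(
  id = 1:3,
  name = c("A", "B", "C"),
  last_log_june1 = c("2020-06-01", "2020-06-01", "2020-06-01"),
  last_log_june2 = c("2020-06-01", "2020-06-01", "2020-06-02"),
  last_log_june3 = c("2020-06-03", NA, "2020-06-03"),
  stringsAsFactors = FALSE
)

# Logical index of every column whose name starts with the prefix
log_cols <- startsWith(names(df), "last_log_june")

# For each row, drop NA and count the distinct values that remain
df$count <- apply(
  df[, log_cols, drop = FALSE], 1,
  function(x) length(unique(x[!is.na(x)]))
)

df$count
```

Keeping drop = FALSE means the selection stays a data frame even if only one column matches, so apply() still iterates over rows.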
