I have the following panel dataset in R. It contains an ID variable and shows the last login details for each ID.

id name address last_log_june1 last_log_june2 last_log_june3 last_log_june4 last_log_june"n"
1    A           2020-06-01     2020-06-01    2020-06-03
2    B           2020-06-01      2020-06-01   2020-06-01
3    C           2020-06-01     2020-06-02    2020-06-03

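In the dataset above, I want to count the number of unique dates on which A, B, and C logged in. How can I do this in R so that I select only the "last_log_date" variables and have R count the unique dates among them? I would also like to add this count as a column to the dataset.

Looking forward to solving this!

Thanks, Rachita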

2 Answers

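The dplyr package (version 1.0.0) has some functions that may help.

Assume your data frame is called df, with the columns ID, name, and address plus a series of columns whose names start with last_log_june, and that some of those columns may contain NA values.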

library(dplyr)  # c_across() requires dplyr >= 1.0.0

new_df <- df %>%
  rowwise() %>%  # apply the following functions row by row
  mutate(
    # intermediate variable: 1 if any of the columns is NA in this row, otherwise 0
    na_exists = ifelse(sum(is.na(c_across(starts_with("last_log_june")))) > 0, 1, 0),
    # unique() keeps NA, so NA is counted as a unique value here
    unique_with_NA = length(unique(c_across(starts_with("last_log_june")))),
    # subtract the NA indicator so that NA is not counted as a unique value
    unique_withno_NA = unique_with_NA - na_exists
  ) %>%
  ungroup() %>%  # drop the rowwise grouping
  select(-na_exists, -unique_with_NA)  # remove the intermediate variables


Using c_across(starts_with("last_log_june")) means that only the columns whose names start with last_log_june are considered.
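
As an illustrative sketch (not part of the answer above), the same row-wise count can also be written with dplyr's n_distinct(), whose na.rm argument drops NA before counting, so no intermediate columns are needed. The sample data frame below, including the NA in the second row, is made up for illustration, and the column name n_unique is hypothetical.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for rowwise() and c_across()

# Made-up sample data; the NA is included only to show how missing values are handled
df <- data.frame(
  id = 1:3,
  name = c("A", "B", "C"),
  last_log_june1 = c("2020-06-01", "2020-06-01", "2020-06-01"),
  last_log_june2 = c("2020-06-01", NA, "2020-06-02"),
  last_log_june3 = c("2020-06-03", "2020-06-01", "2020-06-03"),
  stringsAsFactors = FALSE
)

new_df <- df %>%
  rowwise() %>%  # evaluate mutate() one row at a time
  mutate(
    # count the distinct non-NA values across the last_log_june* columns
    n_unique = n_distinct(c_across(starts_with("last_log_june")), na.rm = TRUE)
  ) %>%
  ungroup()  # drop the rowwise grouping

new_df$n_unique
```

If the date columns are stored as Date rather than character, the same call should work, since n_distinct() only compares values for equality.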

answered 2020-06-19T15:32:38.757

You need the unique function, applied to each row.

df <- data.frame(id = 1:3, name = LETTERS[1:3], 
                 last_log_june1 = c("2020-06-01", "2020-06-01", "2020-06-01"), 
                 last_log_june2 = c("2020-06-01", "2020-06-01", "2020-06-02"),  
                 last_log_june3 = c("2020-06-01", "2020-06-02", "2020-06-03"), 
                 stringsAsFactors = FALSE)

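n <- 3 # number of "last_log_june" columns
# count inside the function so apply() always returns one number per row
# (returning the unique values themselves yields a matrix when every row has the same count)
counts <- apply(df[, paste0("last_log_june", 1:n)], 1, function(x) length(unique(x)))
counts # shows a vector with the number of unique values per row
df$count <- counts # new column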

Is that what you need?
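
A possible variation on the code above, offered as a sketch under assumptions: instead of hard-coding n, the columns can be selected by their name prefix with grep(), and NA values can be dropped before counting. The variable name log_cols and the NA in the sample data are made up for illustration.

```r
# Made-up sample data; the NA shows how missing values are excluded from the count
df <- data.frame(
  id = 1:3,
  name = LETTERS[1:3],
  last_log_june1 = c("2020-06-01", "2020-06-01", "2020-06-01"),
  last_log_june2 = c("2020-06-01", NA, "2020-06-02"),
  last_log_june3 = c("2020-06-03", "2020-06-01", "2020-06-03"),
  stringsAsFactors = FALSE
)

# Select every column whose name starts with "last_log_june" (hypothetical helper variable)
log_cols <- grep("^last_log_june", names(df), value = TRUE)

# For each row, drop NA and count the remaining distinct values;
# drop = FALSE keeps a data frame even if only one column matches
df$count <- apply(
  df[, log_cols, drop = FALSE], 1,
  function(x) length(unique(x[!is.na(x)]))
)

df$count
```

Matching on the prefix means the code does not need to change when more last_log_june columns are added.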

answered 2020-06-19T14:35:40.163