r - Set difference between successive vectors in a tibble (cumulatively)

Asked 2019-06-21T00:22:37.500

Viewed 46 times

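I am looking for a shorter way to compute successive set differences (elements of A that are not in B) between grouped vectors in a tibble, where each difference is taken between the vector of group "x" and the concatenation of the vectors of all previous groups.

I found a solution that uses anti_join inside a for loop, but I would like to know whether there is a more concise way.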

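library(magrittr)  # provides the %<>% assignment pipe
library(tidyverse)

tibble(stage = rep(1:4, each = 2),
       string = c("a", "b", "a", "c", "b", "d", "f", "e")) %>%
  (function(df) {
    # Start with the first stage, which has no predecessors.
    df_filtered <- df %>%
      filter(stage == 1)

    # For every later stage, keep only the strings that do not
    # appear in any earlier stage, and append them to the result.
    for (stage_sub in unique(df$stage)[-1]) {
      df_filtered %<>%
        rbind(
          df %>%
            filter(stage == stage_sub) %>%
            anti_join(df %>% filter(stage < stage_sub), by = "string")
        )
    }

    df_filtered
  })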
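One possibly shorter alternative to the loop above, offered only as a sketch: if the rows are sorted by stage, keeping the first occurrence of each string gives the same rows as the cumulative anti_join for this example. The variable names `df` and `result` are introduced here for illustration. Note the assumption that a string does not repeat within a single stage; if it did, `distinct()` would keep one copy where the loop keeps all of them.

```r
library(dplyr)
library(tibble)

df <- tibble(stage = rep(1:4, each = 2),
             string = c("a", "b", "a", "c", "b", "d", "f", "e"))

# Sort by stage so that the first occurrence of each string is the
# one from the earliest stage, then drop all later occurrences.
result <- df %>%
  arrange(stage) %>%
  distinct(string, .keep_all = TRUE)

print(result)
```

For the sample data this keeps "a" and "b" in stage 1, "c" in stage 2, "d" in stage 3, and "f" and "e" in stage 4.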

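In other words, if:

Group 1: "a", "b"

Group 2: "a", "c"

Group 3: "b", "d"

then when I compute the cumulative successive difference between group 3 and groups 1:2, I should get:

Group 3: "d"

because "d" is the only element that is not contained in any of the previous groups.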
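The three-group example can be reproduced on plain character vectors with base R, as a minimal sketch. The names `groups`, `seen`, and `new_elements` are hypothetical and not part of the original code.

```r
groups <- list(c("a", "b"), c("a", "c"), c("b", "d"))

# Running union: element i holds everything seen in groups 1..i.
seen <- Reduce(union, groups, accumulate = TRUE)

# The first group has no predecessors, so it is kept as is.
# Each later group i is reduced by everything seen in groups 1..(i - 1).
new_elements <- c(
  groups[1],
  Map(setdiff, groups[-1], seen[-length(seen)])
)

print(new_elements[[3]])
```

Here the third element is "d", matching the expected result for group 3, and the second element is "c".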
