r - Counting unique values across variables (columns) in R

Question

I have a large dataset with repeated measurements over 5 time periods.

   2012  2009  2006  2003  2000
    3     1     4     4     1
    5     3     2     2     3
    6     7     3     5     6

I would like to add a new column holding the number of unique values in each row across 2000 to 2012. For example:

   2012  2009  2006  2003  2000  nunique
    3     1     4     4     1      3
    5     3     2     2     3      3
    6     7     3     5     6      4

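I am working in R and, if it helps, the measurement in each time period can take only 14 distinct values.

I found this page, "Count occurrences of value in a set of variables in R (per row)", and tried the various solutions it offers. However, they give me the count of each value rather than the number of unique values. Other similar questions here seem to ask about counting unique values within a variable/column, not per row. Any suggestions would be greatly appreciated.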

score 2 · Accepted Answer

Here is another option:

> df$nunique <- apply(df, 1, function(x) length(unique(x)))
> df
  2012 2009 2006 2003 2000 nunique
1    3    1    4    4    1       3
2    5    3    2    2    3       3
3    6    7    3    5    6       4

score 1 · Accepted Answer

If you have a large dataset, you may want to avoid looping over the rows and use a faster framework instead, such as S4Vectors:

df <- data.frame('2012'=c(3,5,6),
             '2009'=c(1,3,7),
             '2006'=c(4,2,3),
             '2003'=c(4,2,5),
             '2000'=c(1,3,6))

dup <- S4Vectors:::duplicatedIntegerPairs(as.integer(as.matrix(df)), row(df))
dim(dup) <- dim(df)
rowSums(!dup)

Or the matrixStats package:

m <- as.matrix(df)
mode(m) <- "integer"
rowSums(matrixStats::rowTabulates(m) > 0)
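The same tabulation idea can be sketched in base R when neither package is available. This is only an illustration: it assumes the 14 possible values are the integers 1 to 14, and the name `possible_values` and the `check.names = FALSE` argument are additions that do not appear in the answer above.

```r
# Example data from the question; check.names = FALSE keeps the year names
df <- data.frame("2012" = c(3, 5, 6), "2009" = c(1, 3, 7), "2006" = c(4, 2, 3),
                 "2003" = c(4, 2, 5), "2000" = c(1, 3, 6), check.names = FALSE)
m <- as.matrix(df)

# Assumption: the measurements can only take the integer values 1 to 14
possible_values <- 1:14

# For each possible value, flag the rows that contain it at least once,
# then add the 14 logical vectors together to get the distinct count per row
nunique <- Reduce(`+`, lapply(possible_values, function(v) rowSums(m == v) > 0))
nunique
```

This makes 14 vectorized passes over the matrix instead of one function call per row, which may pay off when there are many rows and few possible values.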

score 0 · Accepted Answer

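The trick is to use `apply` and assign each row to a variable (e.g. `x`). You can then write a custom function, which in this case uses `unique` and `length` to get the answer you want.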

df <- data.frame('2012'=c(3,5,6), '2009'=c(1,3,7), '2006'=c(4,2,3), '2003'=c(4,2,5), '2000'=c(1,3,6))

df$nunique = apply(df, 1, function(x) {length(unique(x))})
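To make the mechanics concrete, here is a minimal sketch of the same approach restricted to the measurement columns. The `year_cols` vector and the `check.names = FALSE` argument are additions for illustration; without `check.names = FALSE`, `data.frame` renames the columns to `X2012`, `X2009`, and so on.

```r
# Example data from the question; check.names = FALSE keeps the year names
df <- data.frame("2012" = c(3, 5, 6), "2009" = c(1, 3, 7), "2006" = c(4, 2, 3),
                 "2003" = c(4, 2, 5), "2000" = c(1, 3, 6), check.names = FALSE)

# Hypothetical helper: the measurement columns only, so that re-running the
# code never counts a previously added nunique column as a measurement
year_cols <- c("2012", "2009", "2006", "2003", "2000")

# apply(..., 1, f) calls f once per row and passes that row in as the vector x
df$nunique <- apply(df[year_cols], 1, function(x) length(unique(x)))
df$nunique
```

Selecting the columns explicitly also helps when the real dataset has extra columns, such as an ID, that should not enter the count.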

score 0 · Accepted Answer

Try this:

sapply(data, function(x) length(unique(x)))
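Note that `sapply` over a data frame iterates over its columns, so the line above gives the number of unique values in each column. A per-row variant could look like the sketch below. It assumes `data` is a data frame shaped like the one in the question; the names `per_column` and `per_row` are illustrative.

```r
# Example data from the question; check.names = FALSE keeps the year names
data <- data.frame("2012" = c(3, 5, 6), "2009" = c(1, 3, 7), "2006" = c(4, 2, 3),
                   "2003" = c(4, 2, 5), "2000" = c(1, 3, 6), check.names = FALSE)

# Per column: one count for each of the five time periods
per_column <- sapply(data, function(x) length(unique(x)))

# Per row: iterate over row indices and flatten each row to a plain vector
per_row <- sapply(seq_len(nrow(data)),
                  function(i) length(unique(unlist(data[i, ]))))
per_row
```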

answered 2018-05-15T04:04:49.563
