
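Can anyone tell me how to do the following in R? I want to calculate the number of unique people in each group, as shown below. The first column is the group (there are 3 groups here) and the second column is the person's name (for example, in group 1 the name A appears 3 times). The third column is the one I want to generate in R: if a person's name appears x times within a group, the last column should show x. Thanks, everyone!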

    x <- read.table(header=TRUE, text="group peoplename noofuniquepeople
1 A 3
1 B 1
1 A 3
1 A 3
1 D 1
2 M 1
2 K 2
2 T 3
2 T 3
2 K 2
2 T 3
3 E 2
3 F 1
3 E 2
3 G 2
3 G 2
3 V 1")
4 Answers

Use ave together with within:

within(x, Freq <- ave(1:nrow(x), peoplename, group, FUN=length))
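To see what ave is doing here: it splits its first argument by the grouping variables, applies FUN to each piece, and writes the result back at the original row positions. A minimal self-contained sketch on made-up data (the names d, g, p and n are illustrative assumptions, not part of the question):

```r
# Illustrative data; d, g, p and n are hypothetical names for this sketch.
d <- data.frame(g = c(1, 1, 1, 2, 2),
                p = c("A", "B", "A", "K", "K"),
                stringsAsFactors = FALSE)

# ave() applies FUN within each (p, g) combination and returns a vector
# aligned with the original rows, so every row receives the size of the
# combination it belongs to.
d$n <- ave(seq_len(nrow(d)), d$p, d$g, FUN = length)
d
```

Because the result keeps the original row order, it can be assigned straight into a new column without any merging.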
answered 2013-09-07T10:55:46.613

Using good old base::aggregate has the advantage (in my opinion) of collapsing your data to one row for each combination of group and peoplename. length then gives the number of times that combination occurs:

aggregate( . ~ peoplename + group , data = x , FUN = length )
#   peoplename group noofuniquepeople
#1           A     1                3
#2           B     1                1
#3           D     1                1
#4           K     2                2
#5           M     2                1
#6           T     2                3
#7           E     3                2
#8           F     3                1
#9           G     3                2
#10          V     3                1

By the way, if your input data is missing the noofuniquepeople column (which I assume it is, since that is what you want to calculate), you don't need it. You can aggregate on a dummy variable instead:

Unique = rep( 1 , nrow(x) )
aggregate( Unique ~ peoplename + group , data = x , FUN = sum )
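The aggregated result has one row per combination, whereas the question asks for a count on every original row. One way to get there, sketched under the assumption of a small made-up data frame d (the names d, counts and merged are illustrative), is to merge the counts back:

```r
# Illustrative data without the column to be computed.
d <- data.frame(group = c(1, 1, 1, 2, 2),
                peoplename = c("A", "B", "A", "K", "K"),
                stringsAsFactors = FALSE)

# Sum a dummy column of ones per (peoplename, group) combination.
counts <- aggregate(Unique ~ peoplename + group,
                    data = transform(d, Unique = 1),
                    FUN = sum)

# Join the per-combination counts back onto the original rows.
# Note that merge() may reorder the rows.
merged <- merge(d, counts, by = c("group", "peoplename"))
merged
```

If the original row order matters, the ave approach avoids the reordering that merge can introduce.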
answered 2013-09-07T12:33:42.460

Ideally, you should first post what you have tried, so that we can help you debug it.

Anyway,

> df = data.frame(N = c("A","B","A","A","D","M","K","T","T","K","T","E","F","E","G","G","V"), G = c(3,1,3,3,1,1,2,3,3,2,3,2,1,2,2,2,1))
> df
   N G
1  A 3
2  B 1
3  A 3
4  A 3
5  D 1
6  M 1
7  K 2
8  T 3
9  T 3
10 K 2
11 T 3
12 E 2
13 F 1
14 E 2
15 G 2
16 G 2
17 V 1

> numberOfGroups = length(unique(df$G))
> numberOfGroups
[1] 3

> require(plyr)
> uniqueInGroup <- dlply(df,.fun=unique,.variables=.(G))
> uniqueInGroup
$`1`
  N G
1 B 1
2 D 1
3 M 1
4 F 1
5 V 1

$`2`
  N G
1 K 2
3 E 2
5 G 2

$`3`
  N G
1 A 3
4 T 3

attr(,"split_type")
[1] "data.frame"
attr(,"split_labels")
  G
1 1
2 2
3 3

lapply(uniqueInGroup, function(x) return(length(unique(x$N))))

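Oops, I grouped by the third column. Run this script with the first column as the grouping variable instead and you will get the desired output.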
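With the grouping switched to the first column, the same idea can be sketched in base R without plyr. This rebuilds the question's group and peoplename columns by hand; uniquePerGroup is an illustrative name, and tapply is swapped in for dlply here:

```r
# The group and peoplename columns from the question.
x <- data.frame(
  group = rep(1:3, times = c(5, 6, 6)),
  peoplename = c("A", "B", "A", "A", "D",
                 "M", "K", "T", "T", "K", "T",
                 "E", "F", "E", "G", "G", "V"),
  stringsAsFactors = FALSE
)

# Number of distinct names within each group.
uniquePerGroup <- tapply(x$peoplename, x$group,
                         function(n) length(unique(n)))
uniquePerGroup
```

Note that this counts distinct names per group (3, 3 and 4 for groups 1 to 3), which is not the same quantity as the per-name occurrence count in the question's third column.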

answered 2013-09-07T09:35:48.247

There may be a better way, but

x$gp     <- paste(x$group, x$peoplename)
x_new    <- merge (x, table(x$gp), by.x="gp", by.y="Var1")
x_new$gp <- NULL

produces

> x_new
   group peoplename noofuniquepeople Freq
1      1          A                3    3
2      1          A                3    3
3      1          A                3    3
4      1          B                1    1
5      1          D                1    1
6      2          K                2    2
7      2          K                2    2
8      2          M                1    1
9      2          T                3    3
10     2          T                3    3
11     2          T                3    3
12     3          E                2    2
13     3          E                2    2
14     3          F                1    1
15     3          G                2    2
16     3          G                2    2
17     3          V                1    1

The last two columns are identical.
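That comparison can also be made programmatically instead of by eye. A minimal sketch repeating the paste/table/merge steps on made-up data, where d, d_new and the expected column are hypothetical names introduced only for this check:

```r
# Illustrative data with a hand-computed expected count per row.
d <- data.frame(group = c(1, 1, 1, 2, 2),
                peoplename = c("A", "B", "A", "K", "K"),
                expected = c(2, 1, 2, 2, 2),
                stringsAsFactors = FALSE)

# Build a combined key, tabulate it, and merge the counts back in.
d$gp <- paste(d$group, d$peoplename)
d_new <- merge(d, table(d$gp), by.x = "gp", by.y = "Var1")
d_new$gp <- NULL

# TRUE when the computed Freq matches the expected column on every row.
all(d_new$expected == d_new$Freq)
```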

answered 2013-09-07T09:35:55.017