r - Add a column to a data frame showing the frequency of a variable

Question

In R, it's always the little things that trip me up.

Suppose I have a data frame like this:

  location   species
1  seattle   A
2  buffalo   C
3  seattle   D
4  newark    J
5  boston    Q

I want to append a column to this data frame showing how many times each location appears in the data set, like this:

  location   species    freq-loc
1  seattle   A          2           #there are 2 entries with location=seattle
2  buffalo   C          1           #there is 1 entry with location=buffalo
3  seattle   D          2
4  newark    J          1
5  boston    Q          1

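I know that table(data$location) gives me a contingency table. But I don't know how to map each value in the table back to the corresponding rows of the data frame. Can anyone help?

Update

Thanks very much for the help! Out of interest, I ran a benchmark to see how the merge, plyr and ave solutions perform. The test set is a 10,000-row subset of my original 10 x 7mil data set: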

Unit: milliseconds
expr        min         lq     median        uq       max neval
MERGE 110.877337 111.989406 112.585420 113.51679 120.23588   100
PLYR  26.305645  27.080403  27.576580  27.87157  68.40763   100
AVE   2.994528   3.117255   3.179898   3.35834  10.02955   100
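
A benchmark along these lines could be reproduced roughly as follows. This is only a sketch: the post does not say which tool produced the timings, so the microbenchmark package is an assumption, the data are randomly generated stand-ins for the real set, and the helper names by_merge and by_ave are made up for illustration. A plyr join could be added as a third expression in the same way.

```r
# Hypothetical test data: 10,000 rows drawn from a made-up set of locations.
set.seed(1)
d <- data.frame(
  location = sample(c("seattle", "buffalo", "newark", "boston"), 1e4, replace = TRUE),
  species  = sample(LETTERS, 1e4, replace = TRUE),
  stringsAsFactors = FALSE
)

# Illustrative wrappers around two of the approaches from the answers.
by_merge <- function(d) {
  merge(d, data.frame(table(location = d$location)), by = "location")
}
by_ave <- function(d) {
  transform(d, freq.loc = ave(seq_len(nrow(d)), location, FUN = length))
}

# microbenchmark is an assumption; skip the timing step if it is not installed.
if (requireNamespace("microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    MERGE = by_merge(d),
    AVE   = by_ave(d),
    times = 100
  ))
}
```

Absolute timings will differ by machine and data, so only the relative ordering is worth comparing.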

score 9 · Accepted Answer


Here is a solution with ave:

transform(d, freq.loc = ave(seq(nrow(d)), location, FUN=length))

answered 2013-06-10T18:31:38.360
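
To make the ave approach easy to try, here is a minimal reproducible sketch built on the sample data from the question. The data frame name d follows the answer; seq_len(nrow(d)) is swapped in for seq(nrow(d)) as a small precaution, since it also behaves sensibly for a zero-row data frame.

```r
# Sample data copied from the question.
d <- data.frame(
  location = c("seattle", "buffalo", "seattle", "newark", "boston"),
  species  = c("A", "C", "D", "J", "Q")
)

# ave() splits the row indices by location, applies length() to each group,
# and writes the group size back to every row of that group.
d <- transform(d, freq.loc = ave(seq_len(nrow(d)), location, FUN = length))
print(d)
```

The values passed as the first argument do not matter here; only their count per group is used, which is why a plain index vector is enough.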

score 6 · Accepted Answer

I'm sure someone will post an (ugly ;)) ave or plyr solution soon, but here is a data.table one:

library(data.table)
dt = data.table(your_df)

dt[, `freq-loc` := .N, by = location]
# note: using `-quotes around your var name, because of the "-" in the name

score 2 · Accepted Answer

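Trying to use a dash in a column name will be very painful. Better to use an underscore or a dot.

dat$freq_loc <- ave( as.numeric(dat[[1]]), dat[["location"]] ,
                     FUN=length)

I tried using ave without as.numeric on the first column, but to my surprise got cryptic error messages related to factor levels.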
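
The factor-related messages most likely arise because ave writes its results back into a copy of its first argument, and integer counts are not valid levels of a factor. The sketch below illustrates this under the assumption that the columns are factors, which was the data.frame() default before R 4.0; the names dat and bad are for illustration only.

```r
# Hypothetical data with factor columns.
dat <- data.frame(
  location = factor(c("seattle", "buffalo", "seattle", "newark", "boston")),
  species  = factor(c("A", "C", "D", "J", "Q"))
)

# Passing the factor itself as x: the counts cannot be stored as factor
# levels, so the result is not usable (warnings are suppressed for brevity).
bad <- suppressWarnings(ave(dat$location, dat$location, FUN = length))

# Passing a plain integer vector as x sidesteps the problem and does not
# depend on the type of any column.
dat$freq_loc <- ave(seq_along(dat$location), dat$location, FUN = length)
print(dat)
```

Using an index vector instead of as.numeric on a column also avoids coercion warnings when the column holds character strings rather than factors.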

score 2 · Accepted Answer

With merge:

merge(data, data.frame(table(location = data$location)), by = c("location"))
# location species Freq
# 1   boston       Q    1
# 2  buffalo       C    1
# 3   newark       J    1
# 4  seattle       A    2
# 5  seattle       D    2

Also, I heard a request for plyr:

library(plyr)
join(data, data.frame(table(location = data$location)))
# Joining by: location
# location species Freq
# 1  seattle       A    2
# 2  buffalo       C    1
# 3  seattle       D    2
# 4   newark       J    1
# 5   boston       Q    1
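
Note that merge sorts the result by the key by default, which is why the rows above come back in a different order than the input. If the original row order matters and no join is wanted, one possible base R sketch is to index the table by name. The column name freq_loc and the variable counts are assumptions chosen for illustration.

```r
# Sample data copied from the question.
data <- data.frame(
  location = c("seattle", "buffalo", "seattle", "newark", "boston"),
  species  = c("A", "C", "D", "J", "Q"),
  stringsAsFactors = FALSE
)

# table() yields one named count per location; indexing it by each row's
# location looks up the matching count while keeping the original row order.
counts <- table(data$location)
data$freq_loc <- as.vector(counts[as.character(data$location)])
print(data)
```

The as.character call makes the lookup go by name even when location is a factor, where plain indexing would otherwise use the underlying integer codes.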
