
I imagine there is a very simple answer to this, but here goes.

I have long-format data, like this:

d <- data.frame(numbers = rnorm(10),
                year = rep(c(2008, 2009), 5),
                name = c("john", "David", "Tom", "Kristin", "Lisa", "Eve", "David", "Tom", "Kristin", "Lisa"))

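How can I get a new data frame containing only the rows for names that appear in both 2008 and 2009? (That is, only David, Kristin, Lisa, and Tom.)

Thanks in advance.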


4 Answers


The simple way:

subset(
    d,
    name %in% intersect(name[year==2008], name[year==2009])
)
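
The one-liner above can be unpacked into steps to see what each piece does. This is only a sketch: it assumes the question's `d`, rebuilt here without `cbind` so that `numbers` and `year` stay numeric, and the seed and intermediate variable names are illustrative rather than part of the original answer.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
d <- data.frame(numbers = rnorm(10),
                year = rep(c(2008, 2009), 5),
                name = c("john", "David", "Tom", "Kristin", "Lisa",
                         "Eve", "David", "Tom", "Kristin", "Lisa"))

names_2008 <- d$name[d$year == 2008]  # john, Tom, Lisa, David, Kristin
names_2009 <- d$name[d$year == 2009]  # David, Kristin, Eve, Tom, Lisa

# Names present in both years
both <- intersect(names_2008, names_2009)

# Keep only the rows whose name is in that set
result <- d[d$name %in% both, ]
```

Here `john` and `Eve` each appear in only one year, so their rows are dropped and the eight remaining rows are kept.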
answered 2009-09-06T17:43:09.400

One approach is to use the reshape package to create a data.frame with the years in columns and the names in rows:

library(reshape)
cast(d, name ~ year, value = "numbers")

You can then use complete.cases to extract the rows of interest.
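
The `complete.cases` step could look roughly like the following. To keep the sketch free of package dependencies, it swaps in base R's `reshape()` in place of `cast()` to build the wide table; the variable names `wide`, `both`, and `result` are assumptions made for illustration, and `d` is rebuilt without `cbind` so the columns keep their types.

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
d <- data.frame(numbers = rnorm(10),
                year = rep(c(2008, 2009), 5),
                name = c("john", "David", "Tom", "Kristin", "Lisa",
                         "Eve", "David", "Tom", "Kristin", "Lisa"))

# Wide layout: one row per name, one numbers.<year> column per year.
# A name missing from a year gets NA in that year's column.
wide <- reshape(d, idvar = "name", timevar = "year", direction = "wide")

# complete.cases() is TRUE only for rows with no NA, i.e. names seen in every year
both <- as.character(wide$name[complete.cases(wide)])

# Go back to the long data and keep the matching rows
result <- d[d$name %in% both, ]
```

Like the wide table itself, this assumes at most one record per name and year.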

answered 2009-09-06T15:28:50.357

If there is only one record per person per year, just count how many times each person appears in the dataset:

counts <- as.data.frame(table(name = d$name))

Then look for everyone who appears twice:

subset(counts, Freq == 2)
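
The `subset` call returns the counts table rather than rows of `d`, so one more step is needed to get the data frame the question asks for. A possible sketch, with `twice` and `result` as illustrative names and `d` rebuilt without `cbind`:

```r
set.seed(42)  # arbitrary seed, only so the sketch is reproducible
d <- data.frame(numbers = rnorm(10),
                year = rep(c(2008, 2009), 5),
                name = c("john", "David", "Tom", "Kristin", "Lisa",
                         "Eve", "David", "Tom", "Kristin", "Lisa"))

counts <- as.data.frame(table(name = d$name))

# table() turns the names into a factor, so convert back to character
twice <- as.character(counts$name[counts$Freq == 2])

# Rows of the original data for people who appear exactly twice
result <- d[d$name %in% twice, ]
```

This only identifies people present in both years under the stated assumption of one record per person per year; someone with two records in 2008 and none in 2009 would also be counted twice.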
answered 2009-09-06T15:31:17.013

Here is another solution that uses only base R and makes no assumptions about how many records a person has per year:

d <- data.frame(numbers = rnorm(10),
                year = rep(c(2008, 2009), 5),
                name = c("john", "David", "Tom", "Kristin",
                         "Lisa","Eve","David","Tom","Kristin",
                         "Lisa"))
# split data into 2 data.frames (1 for each year)
by.year <- split(d, d$year, drop = TRUE)

# find the names that appear in both years
keep <- intersect(by.year[['2008']]$name, by.year[['2009']]$name)
# Or, if you had several years, use Reduce as a more general solution:
keep <- Reduce(intersect, lapply(by.year, '[[', 'name'))

# show the rows of the original dataset only if their $name field
# is in our 'keep' vector
d[d$name %in% keep,]
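
To illustrate the `Reduce` variant with more than two years and with repeated records, here is a small sketch on made-up data. The data frame `d3` and its contents are hypothetical and not from the question.

```r
# Hypothetical data: three years, and Tom has two records in 2010
d3 <- data.frame(year = c(2008, 2008, 2009, 2009, 2009, 2010, 2010, 2010),
                 name = c("Tom", "Lisa", "Tom", "Lisa", "Eve", "Tom", "Tom", "Eve"),
                 stringsAsFactors = FALSE)

# One data.frame per year
by.year <- split(d3, d3$year)

# Intersect the name vectors of all years, pairwise from left to right
keep <- Reduce(intersect, lapply(by.year, `[[`, "name"))

# Rows for people who appear in every year
result <- d3[d3$name %in% keep, ]
```

Only Tom appears in all three years, so `keep` contains just his name and `result` holds his four rows, including the repeated 2010 record.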
answered 2009-09-06T16:26:30.150