5

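I have a data frame in which each row represents a recorded event. As an example, suppose I measured the speed of passing cars, and some cars passed me more than once.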

cardata <- data.frame(
  car.ID = c(3,4,1,2,5,4,5),
  speed = c(100,121,56,73,87,111,107)
  )

I can sort the data frame and find the three fastest events...

top3<-head(cardata[order(cardata$speed,decreasing=TRUE),],n=3)
> top3
  car.ID speed
2      4   121
6      4   111
7      5   107

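...but you'll notice that car 4 recorded the two fastest times. How can I find the three fastest events without any duplicate car IDs? I realise that in this case my "top 3" list may not consist of the three fastest events.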

4

5 Answers

6

You can use aggregate to first find the top speed per car.ID:

cartop <- aggregate(speed ~ car.ID, data = cardata, FUN = max)
top3 <- head(cartop[order(cartop$speed, decreasing = TRUE), ], n = 3)

 #   car.ID speed
 # 4      4   121
 # 5      5   107
 # 3      3   100
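
If the same two steps are needed for other columns or a different cutoff, they could be wrapped in a small helper. This is only a sketch: `top_unique` and its arguments `id`, `value` and `n` are hypothetical names chosen for illustration, not part of base R or of the answer above.

```r
cardata <- data.frame(
  car.ID = c(3, 4, 1, 2, 5, 4, 5),
  speed  = c(100, 121, 56, 73, 87, 111, 107)
)

# top_unique() is a hypothetical helper name, not a base R function.
top_unique <- function(data, id, value, n = 3) {
  # Step 1: largest value per id (aggregate names the columns Group.1 and x).
  best <- aggregate(data[[value]], by = list(data[[id]]), FUN = max)
  names(best) <- c(id, value)
  # Step 2: sort the per-id maxima and keep the first n rows.
  head(best[order(best[[value]], decreasing = TRUE), ], n = n)
}

top2 <- top_unique(cardata, "car.ID", "speed", n = 2)
```

Here `top2` should hold cars 4 and 5, the two cars with the highest individual top speeds.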
answered 2013-11-05T20:26:35.763
3

Use data.table instead of data.frame:

library(data.table)
dt = data.table(cardata)

# the easier to read way
dt[order(-speed), speed[1], by = car.ID][1:3]
#   car.ID  V1
#1:      4 121
#2:      5 107
#3:      3 100

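# (probably) a faster way
setkey(dt, speed) # sorts dt by speed, so speed[.N] is each car's maximum
# Groups come back in order of first appearance, not by maximum speed,
# so sort the per-car maxima before taking the top 3.
head(dt[, speed[.N], by = car.ID][order(-V1)], 3)
#    car.ID  V1
# 1:      4 121
# 2:      5 107
# 3:      3 100

# and another way for fun (not sure how fast it is)
setkey(dt, car.ID, speed)
# mult = 'last' keeps each car's fastest row; the result is ordered by car.ID,
# so it also needs sorting by speed.
head(dt[J(unique(car.ID)), mult = 'last'][order(-speed)], 3)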
answered 2013-11-05T20:33:17.823
3

You can also do this with plyr. For example, to select the top 3:

library(plyr)
top3 <- ddply(ddply(cardata,.(car.ID),summarize, maxspeed=max(speed)),.(-maxspeed))[1:3,-1]

Update

With the dplyr package, you can do this faster and more clearly.

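library(dplyr)

# For each car.ID keep the observation with the highest speed, then sort by speed.
# Sorting after ungroup() avoids relying on how arrange() treats grouped data.
top <- cardata %>%
    group_by(car.ID) %>%
    top_n(1, speed) %>%
    ungroup() %>%
    arrange(desc(speed))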

# Take the top 3 of the resulting table.
top3 <- top[1:3,]
top3

#   car.ID speed
# 1      4   121
# 2      5   107
# 3      3   100
answered 2013-11-05T20:39:17.453
2

I prefer the solutions suggested using base R, but for completeness, here is another way using sqldf:

library(sqldf)

cardata <- data.frame(
  car.ID = c(3,4,1,2,5,4,5),
  speed = c(100,121,56,73,87,111,107)
)

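# Recent sqldf versions keep the dot in column names instead of converting it
# to an underscore, so car.ID is quoted with square brackets here.
sqldf("
select [car.ID], max(speed) as max_speed
from cardata
group by [car.ID]
order by max(speed) desc
limit 3
      ")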
answered 2013-11-05T20:58:02.983
2

Here is another base R way:

top.speeds <- unique(transform(cardata, speed=ave(speed, car.ID, FUN=max)))
top3 <- head(top.speeds[order(top.speeds$speed, decreasing=TRUE), ], n=3)
#   car.ID speed
# 2      4   121
# 5      5   107
# 1      3   100
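
A related base R sketch, not taken from the answers here: sort the events by speed first, then drop repeated car IDs with `duplicated()`. Unlike the per-group maximum, this keeps the original row (and row name) of each car's fastest event. The names `sorted` and `fastest` are illustrative only.

```r
cardata <- data.frame(
  car.ID = c(3, 4, 1, 2, 5, 4, 5),
  speed  = c(100, 121, 56, 73, 87, 111, 107)
)

# Sort all events from fastest to slowest.
sorted <- cardata[order(cardata$speed, decreasing = TRUE), ]

# duplicated() flags every occurrence of a car.ID after its first one,
# so negating it keeps only each car's fastest event.
fastest <- sorted[!duplicated(sorted$car.ID), ]

top3 <- head(fastest, n = 3)
top3
#   car.ID speed
# 2      4   121
# 7      5   107
# 1      3   100
```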
answered 2013-11-05T20:43:06.467