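I want to solve this problem in R without using SQL:

How can I SELECT rows with MAX(Column value), DISTINCT by another column in SQL?

Of course I could use sqldf for this, but surely R also has a cool apply-style way to do it?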


2 Answers



Lines <- "id  home  datetime  player   resource
1   10   04/03/2009  john    399 
2   11   04/03/2009  juliet  244
5   12   04/03/2009  borat   555
3   10   03/03/2009  john    300
4   11   03/03/2009  juliet  200
6   12   03/03/2009  borat   500
7   13   24/12/2008  borat   600
8   13   01/01/2009  borat   700"
DF <- read.table(text = Lines, header = TRUE)
DF$datetime <- as.Date(DF$datetime, format = "%d/%m/%Y")

1) base - by. There are many ways to handle this with various packages, but here we first show a base R solution:

> do.call("rbind", by(DF, DF$home, function(x) x[which.max(x$datetime), ]))
   id home   datetime player resource
10  1   10 2009-03-04   john      399
11  2   11 2009-03-04 juliet      244
12  5   12 2009-03-04  borat      555
13  8   13 2009-01-01  borat      700
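
The by() call above splits DF by home, keeps the row with the latest datetime in each piece, and rbind stacks the pieces back together. As a rough sketch of the same idea in the split/lapply style the question asks about (the names pieces and latest are only illustrative, and the sample data is repeated so the snippet runs on its own):

```r
Lines <- "id home datetime player resource
1 10 04/03/2009 john 399
2 11 04/03/2009 juliet 244
5 12 04/03/2009 borat 555
3 10 03/03/2009 john 300
4 11 03/03/2009 juliet 200
6 12 03/03/2009 borat 500
7 13 24/12/2008 borat 600
8 13 01/01/2009 borat 700"
DF <- read.table(text = Lines, header = TRUE)
DF$datetime <- as.Date(DF$datetime, format = "%d/%m/%Y")

# one data frame per value of home
pieces <- split(DF, DF$home)

# keep the row with the latest datetime in each piece, then stack the rows
latest <- do.call(rbind, lapply(pieces, function(x) x[which.max(x$datetime), ]))
latest
```

Note that which.max returns only the first maximum, so if two rows of the same home share the latest datetime, only the first of them is kept.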

1a) base - ave. And a variation (also using only base R):

FUN <- function(x) which.max(x) == seq_along(x)
is.max <- ave(xtfrm(DF$datetime), DF$home, FUN = FUN) == 1
DF[is.max, ]
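
Inside ave, FUN marks the position of the maximum within each home group, so is.max ends up as a logical vector aligned with the rows of DF. A related base R sketch, which is not part of the answer above, sorts by home and descending datetime and then keeps the first row per home with duplicated (the names ord and latest are only illustrative, and the sample data is repeated so the snippet runs on its own):

```r
Lines <- "id home datetime player resource
1 10 04/03/2009 john 399
2 11 04/03/2009 juliet 244
5 12 04/03/2009 borat 555
3 10 03/03/2009 john 300
4 11 03/03/2009 juliet 200
6 12 03/03/2009 borat 500
7 13 24/12/2008 borat 600
8 13 01/01/2009 borat 700"
DF <- read.table(text = Lines, header = TRUE)
DF$datetime <- as.Date(DF$datetime, format = "%d/%m/%Y")

# sort by home, and within each home by datetime from latest to earliest
ord <- DF[order(DF$home, -xtfrm(DF$datetime)), ]

# the first row seen for each home is now the one with the latest datetime
latest <- ord[!duplicated(ord$home), ]
latest
```

Unlike the ave version, this returns the rows ordered by home instead of in their original order.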

2) sqldf. Here it is with sqldf, just in case:

> library(sqldf)
> sqldf("select id, home, max(datetime) datetime, player, resource 
+        from DF 
+        group by home")
  id home   datetime player resource
1  1   10 2009-03-04   john      399
2  2   11 2009-03-04 juliet      244
3  5   12 2009-03-04  borat      555
4  8   13 2009-01-01  borat      700
answered 2013-05-18T20:32:32.623

I do not use SQL either, so I would do it this way.


df <- read.table("your file", "your options") # I leave this to you


rows <- which(df$group_column == "desired_group")  # row numbers belonging to the group
row_with_max_value <- rows[which.max(df$values[rows])]  # row holding the group's maximum

"row_with_max_value" contains the row number in your data frame (df) at which the column "values" (df$values) reaches its maximum within the group "desired_group" of "group_column". If "group_column" is not of type character, remove the quotes and write the group value in the matching format.
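
As a small worked example of finding the row with the maximum value within one group, here is a sketch on a toy data frame. The column names values and group_column follow the answer; the data itself and the helper name rows are made up for illustration:

```r
# toy data, invented for this example
df <- data.frame(
  group_column = c("a", "a", "b", "b", "b"),
  values = c(3, 7, 5, 9, 2)
)

# row numbers of the rows that belong to group "b"
rows <- which(df$group_column == "b")

# among those rows, pick the one whose value is largest
row_with_max_value <- rows[which.max(df$values[rows])]

row_with_max_value        # row number within df
df[row_with_max_value, ]  # the full row
```

To cover every group at once instead of a single one, the same selection can be wrapped in by() or split() as in the first answer.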

If you need the value, then index the column with that row number, for example:

df$values[row_with_max_value]

It is probably not the most elegant way, but you do not need SQL and it works (at least for me ;)

answered 2014-02-28T14:21:02.593