
If I have the following data frame:

v2 <- c(4.5, 2.5, 3.5, 5.5, 7.5, 6.5, 2.5, 1.5, 3.5) 
v1 <- c(2.2, 3.2, 1.2, 4.2, 2.2, 3.2, 2.2, 1.2, 5.2) 
lvl <- c("a","a","a","b","b","b","c","c","c") 
d <- data.frame(v1,v2,lvl) 

> d
   v1  v2 lvl
1 2.2 4.5   a
2 3.2 2.5   a
3 1.2 3.5   a
4 4.2 5.5   b
5 2.2 7.5   b
6 3.2 6.5   b
7 2.2 2.5   c
8 1.2 1.5   c
9 5.2 3.5   c

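Within each level of d$lvl, I want to extract the row whose d$v1 value is the median of that level (in the simplest case, each level of d$lvl has three rows). So I want to get: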

   v1  v2 lvl
1 2.2 4.5   a
6 3.2 6.5   b
7 2.2 2.5   c

4 Answers


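This works for groups with an odd number of rows. You need to decide how to handle groups with an even number of rows; for example, you might want to round the median in one direction or the other (see ?round).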

library(plyr)
d2 <- ddply(.data = d, .variables = .(lvl), function(x)
  x[which(x$v1 == median(x$v1)), ])

#    v1  v2 lvl
# 1 2.2 4.5   a
# 2 3.2 6.5   b
# 3 2.2 2.5   c
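Here is why even-sized groups need special handling. median() averages the two middle values, so the result may not occur in the data and the equality test matches nothing. The sketch below uses a hypothetical four-row group (the same v1 values as group "d" in a later answer). As one alternative to rounding, which is not part of this answer, it uses quantile() with type = 1. That setting always returns an observed value: the lower of the two middle values when n is even, and the ordinary median when n is odd.

```r
# Hypothetical even-sized group, for illustration only
g <- data.frame(v1 = c(1.5, 2.5, 3.5, 4.5), v2 = c(1, 1, 1, 1), lvl = "d")

# median() averages the two middle values (2.5 and 3.5), giving 3,
# which is not in g$v1, so no row matches
med <- median(g$v1)
no_match <- g[which(g$v1 == med), ]
print(nrow(no_match))

# quantile(type = 1) returns an observed value: for even n the lower middle one,
# for odd n the same value as median()
lower_med <- unname(quantile(g$v1, probs = 0.5, type = 1))
print(g[which(g$v1 == lower_med), ])
```

Inside the ddply call, replacing median(x$v1) with unname(quantile(x$v1, 0.5, type = 1)) would apply the same rule to every group.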
answered 2013-09-17T08:56:07.570

There are several ways to do this:

Take a look at the plyr package, which is very useful for working with subsets of data:

library(plyr)
ddply(d, .(lvl), summarize, v1 = median(v1), v2 = median(v2))

Alternatively, if you are comfortable with SQL queries, you can use the sqldf package:

library(sqldf)
sqldf("SELECT median(v1) as v1, median(v2) as v2, lvl FROM d GROUP BY lvl")
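Both snippets take the median of each column separately within a group, so the combined result is not necessarily one of the original rows. The base R version below makes this visible. It uses aggregate(), swapped in here only so the check runs without plyr or sqldf.

```r
d <- data.frame(
  v1  = c(2.2, 3.2, 1.2, 4.2, 2.2, 3.2, 2.2, 1.2, 5.2),
  v2  = c(4.5, 2.5, 3.5, 5.5, 7.5, 6.5, 2.5, 1.5, 3.5),
  lvl = c("a", "a", "a", "b", "b", "b", "c", "c", "c")
)

# Column-wise medians per group: v1 and v2 are summarised independently
col_medians <- aggregate(cbind(v1, v2) ~ lvl, data = d, FUN = median)
print(col_medians)
```

For group a this gives v2 = 3.5, whereas the row with the median v1 (row 1) has v2 = 4.5. Groups b and c happen to agree with the desired output.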
answered 2013-09-17T06:50:11.393

I'd like to suggest an approach that handles groups with both odd and even numbers of rows:

## example data
v2 <- c(4.5, 2.5, 3.5, 5.5, 7.5, 6.5, 2.5, 1.5, 3.5, 1, 1, 1, 1) 
v1 <- c(2.2, 3.2, 1.2, 4.2, 2.2, 3.2, 2.2, 1.2, 5.2, 1.5, 2.5, 3.5, 4.5) 
lvl <- c("a","a","a","b","b","b","c","c","c", "d", "d", "d", "d")
d <- data.frame(v1,v2,lvl)

## define own median index function
medIdx <- function(x) {
  n <- length(x)
  ## even: p == n/2
  ## odd:  p == (n+1)/2
  p <- ceiling(n/2)
  return(which(x == sort(x, partial=p)[p])[1])
}

## run blockwise (blocks defined by d$lvl) and bind results
do.call(rbind, by(d, INDICES=d$lvl, FUN=function(x){ return(x[medIdx(x$v1), ]) }))

#   v1  v2 lvl
#a 2.2 4.5   a
#b 3.2 6.5   b
#c 2.2 2.5   c
#d 2.5 1.0   d
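The same lower-middle rule can also be expressed with order() instead of a custom index function. Within each group, order(x$v1) lists row positions sorted by v1. Position ceiling(n/2) is then the median row for odd n and the lower middle row for even n. This variant is a swapped-in alternative that uses split() and lapply() in place of by(), and the helper name pick_median_row is hypothetical.

```r
v2  <- c(4.5, 2.5, 3.5, 5.5, 7.5, 6.5, 2.5, 1.5, 3.5, 1, 1, 1, 1)
v1  <- c(2.2, 3.2, 1.2, 4.2, 2.2, 3.2, 2.2, 1.2, 5.2, 1.5, 2.5, 3.5, 4.5)
lvl <- c("a","a","a","b","b","b","c","c","c", "d", "d", "d", "d")
d <- data.frame(v1, v2, lvl)

# Hypothetical helper: return the row at the (lower) middle of the v1 ordering
pick_median_row <- function(x) {
  x[order(x$v1)[ceiling(nrow(x) / 2)], ]
}

# Apply per level of lvl and stack the one-row results
res <- do.call(rbind, lapply(split(d, d$lvl), pick_median_row))
print(res)
```

On this data it selects the same four rows as the by() version. When v1 contains ties, the two versions may pick different rows among the tied values.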
answered 2013-09-17T09:55:25.887

First, use ddply to compute the median of v1 for each lvl (rounded to one decimal place):

install.packages("plyr")  # only needed once
library(plyr)
df <- ddply(d, .(lvl), summarize, v1 = round(median(v1), 1))

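Second, merge the computed data frame (df) with the original one (d). The merge matches rows where both lvl and v1 agree with the original data (d) and keeps only those rows: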

df1 <- merge(df, d, by = c("lvl","v1"))

df1  # View(df1) shows the same table in a spreadsheet-style viewer
  lvl  v1  v2
1   a 2.2 4.5
2   b 3.2 6.5
3   c 2.2 2.5
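The merge step can also be skipped. Base R's ave() returns each row's group median, so the rows to keep can be selected with a single logical index. This is a swapped-in alternative, not part of the answer above. Like the merge, it keeps every row that ties with the median, and it returns no row for a group whose median falls between two values.

```r
d <- data.frame(
  v1  = c(2.2, 3.2, 1.2, 4.2, 2.2, 3.2, 2.2, 1.2, 5.2),
  v2  = c(4.5, 2.5, 3.5, 5.5, 7.5, 6.5, 2.5, 1.5, 3.5),
  lvl = c("a", "a", "a", "b", "b", "b", "c", "c", "c")
)

# ave() repeats each group's median of v1 on every row of that group
group_median <- ave(d$v1, d$lvl, FUN = median)

# Keep the rows whose v1 equals their own group's median
res <- d[d$v1 == group_median, ]
print(res)
```

Row 5 also has v1 = 2.2, but it is not kept because its group (b) has median 3.2.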
answered 2013-09-17T09:01:40.163