I'm new to R and am having a hard time with the syntax. Suppose I have the following data frame, data:

value   label    second
1       a        q
2       a        q
3       a        ASDF
4       b        q
6       b        QWERTY
6       b        QWERTY
7       c        q
8       c        q
9       c        q
10      d        q

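Now I want to get a vector of df$second values that correspond to the maximum of df$value for a given value of df$label. For example, given df$label = 'a', I want to return 'ASDF'. For df$label = 'b', I want to return 'QWERTY', 'QWERTY'.

Here is what I'm trying: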

max_value <- max(data$value[data$label == 'a'])
result <- c()
for (x in data$value){
    if (x == max_value){
        result <- c(result, data$second)
    }
}

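This does not produce the correct result vector. I'd like to find a way to do this with sapply, tapply, mapply, etc., but I just can't wrap my head around these functions. Any help would be greatly appreciated.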


3 Answers

3

Straightforward in data.table:

library(data.table)
DT <- data.table(df, key="label")
DT[.(lab)][value==max(value), second]

# where `lab` is whatever label value you are trying to find

Note that if you want to do this for all values of label, just use the by argument:

DT[, c(.SD, mx=max(value)), by=label][value==mx, second, by=label]

   label second
1:     a   ASDF
2:     b QWERTY
3:     b QWERTY
4:     c      q
5:     d      q
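
For readers who would rather not load data.table, the single-label lookup can also be sketched in base R with logical indexing. The snippet below rebuilds the question's sample data as `df` (using `stringsAsFactors = FALSE`, an assumption made here so that `second` stays character), and `lab` is a placeholder for whichever label is of interest:

```r
# Sample data from the question; stringsAsFactors = FALSE is an assumption
# made for this sketch so that `second` is a plain character vector.
df <- data.frame(
  value  = c(1, 2, 3, 4, 6, 6, 7, 8, 9, 10),
  label  = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "d"),
  second = c("q", "q", "ASDF", "q", "QWERTY", "QWERTY", "q", "q", "q", "q"),
  stringsAsFactors = FALSE
)

lab <- "b"  # placeholder: the label to look up

# Rows that belong to the requested label
in_group <- df$label == lab

# Largest value within that label only
max_value <- max(df$value[in_group])

# `second` for every row of that label that attains the maximum
result <- df$second[in_group & df$value == max_value]
result
```

Unlike the loop in the question, this tests `label` and `value` row by row and subsets `second` with the resulting logical vector, so only the matching elements are returned rather than the whole column.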
answered 2013-10-08T21:39:15.177
2
lapply( split(dat, dat$label),
       function(df) df[df$value == max(df$value), "second"] )
$a
[1] ASDF
Levels: ASDF q QWERTY

$b
[1] QWERTY QWERTY
Levels: ASDF q QWERTY

$c
[1] q
Levels: ASDF q QWERTY

$d
[1] q
Levels: ASDF q QWERTY

If you want to get rid of the factor baggage:

 lapply( split(dat, dat$label), 
    function(df) as.character(df[df$value == max(df$value), "second"]) )
$a
[1] "ASDF"

$b
[1] "QWERTY" "QWERTY"

$c
[1] "q"

$d
[1] "q"

To extract a particular leaf, assign the result to a name and use "[[" to extract:

val <- lapply( split(dat, dat$label), 
    function(df) as.character(df[df$value == max(df$value), "second"]) )
val[["a"]]
#[1] "ASDF"
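
Since the question asks about the apply family, here is a rough sketch of the same split-apply idea using `ave()`, which returns the per-group maximum aligned with the original rows. It is only one possible variant, not a replacement for the code above; the data frame is rebuilt here with `stringsAsFactors = FALSE` (an assumption) so no factor levels appear:

```r
# Sample data from the question, rebuilt so the snippet is self-contained
dat <- data.frame(
  value  = c(1, 2, 3, 4, 6, 6, 7, 8, 9, 10),
  label  = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "d"),
  second = c("q", "q", "ASDF", "q", "QWERTY", "QWERTY", "q", "q", "q", "q"),
  stringsAsFactors = FALSE
)

# For every row, the maximum of `value` within that row's label
group_max <- ave(dat$value, dat$label, FUN = max)

# TRUE for rows whose value equals the maximum of their own group
is_max <- dat$value == group_max

# Collect the matching `second` entries into a list keyed by label
res <- split(dat$second[is_max], dat$label[is_max])
res[["b"]]
```

Because `group_max` has one entry per row, the comparison is a single vectorized step and ties (as in label b) are kept automatically.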
answered 2013-10-08T21:40:46.033
1

Another base R function:

df2 <- by(data = df, df$label, function(x) x[x$value == max(x$value), ])

# result as a list
df2
# df$label: a
# value label second
# 3     3     a   ASDF
# -------------------------------------------------------------------- 
#   df$label: b
# value label second
# 5     6     b QWERTY
# 6     6     b QWERTY
# -------------------------------------------------------------------- 
#   df$label: c
# value label second
# 9     9     c      q
# -------------------------------------------------------------------- 
#   df$label: d
# value label second
# 10    10     d      q

# ...or as a data frame
do.call(rbind, df2)
#     value label second
# a       3     a   ASDF
# b.5     6     b QWERTY
# b.6     6     b QWERTY
# c       9     c      q
# d      10     d      q
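
If only the `second` column is wanted, as in the question, the function passed to `by()` can return that column instead of whole rows. A minimal sketch, with the sample data rebuilt as `df` (using `stringsAsFactors = FALSE`, an assumption) and `simplify = FALSE` so the result stays list-like even when every group has a single match:

```r
# Sample data from the question, rebuilt so the snippet is self-contained
df <- data.frame(
  value  = c(1, 2, 3, 4, 6, 6, 7, 8, 9, 10),
  label  = c("a", "a", "a", "b", "b", "b", "c", "c", "c", "d"),
  second = c("q", "q", "ASDF", "q", "QWERTY", "QWERTY", "q", "q", "q", "q"),
  stringsAsFactors = FALSE
)

# For each label, keep only the `second` entries at the group's maximum value
second_by_label <- by(
  data = df,
  INDICES = df$label,
  FUN = function(x) x$second[x$value == max(x$value)],
  simplify = FALSE
)

# Individual groups can be pulled out by name
second_by_label[["b"]]
```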
answered 2013-10-08T21:51:31.460