r - Dynamic subsetting in R

Question

I have a dataset similar to the following, but with many more columns and rows:

a<-c("Fred","John","Mindy","Mike","Sally","Fred","Alex","Sam")
b<-c("M","M","F","M","F","M","M","F")
c<-c(40,35,25,50,25,40,35,40)
d<-c(9,7,8,10,10,9,5,8)
df<-data.frame(a,b,c,d)
colnames(df)<-c("Name", "Gender", "Age", "Score")

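I need to create a function that sums the scores for a selected subset of the data. However, the subset may be defined on a different number of variables each time. One subset might be Name == "Fred", another might be Gender == "M" & Age == 40. In my actual dataset, up to 20 columns can be used to define a subset, so I need to make this as general as possible.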

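I tried using an sapply call that includes eval(parse(text=...)), but it takes a long time even on a sample of only about 20,000 records. I am sure there is a faster way, and I would appreciate any help finding it.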

score 0 · Accepted Answer

lapply( subset( df, Gender == "M" & Age == 40, select=Score), sum)
#$Score
#[1] 18

I could have just written:

sum( subset( df, Gender == "M" & Age == 40, select=Score) )

But that does not generalize well.

score 0 · Accepted Answer

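There are several ways to represent these two variables. One is as two separate objects; another is as two elements of a list.

However, using a named list is probably the simplest: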

# df is a function for the F distribution.  Avoid using "df" as a variable name
DF <- df

example1 <- list(Name = c("Fred"))  # c() not needed, used for emphasis
example2 <- list(Gender = c("M"), Age=c(40, 50))

## notice that the key portion is `DF[[nm]] %in% ll[[nm]]`

subByNmList <- function(ll, DF, colsToSum=c("Score")) {
    ret <- vector("list", length(ll))
    names(ret) <- names(ll)
    for (nm in names(ll))
        ret[[nm]] <- colSums(DF[DF[[nm]] %in% ll[[nm]] , colsToSum, drop=FALSE])

    # optional
    if (length(ret) == 1)
        return(unlist(ret, use.names=FALSE))

    return(ret)
}

subByNmList(example1, DF)
subByNmList(example2, DF)
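
Note that subByNmList sums the rows matching each list element separately, so example2 yields one sum for Gender and another for Age. The question's second case (Gender == "M" & Age == 40) needs rows that satisfy every condition at once. A minimal sketch of that variant follows; the helper name sumByAllConds is hypothetical and not part of the original answer, and it assumes the same named-list representation and the sample data from the question:

```r
# Hypothetical helper (not from the original answer): keep only the rows that
# satisfy EVERY condition in the named list, then sum the requested columns.
sumByAllConds <- function(conds, DF, colsToSum = c("Score")) {
    # one logical vector per condition; %in% allows several accepted values
    masks <- lapply(names(conds), function(nm) DF[[nm]] %in% conds[[nm]])
    # combine the vectors with elementwise AND, starting from all TRUE
    keep <- Reduce(`&`, masks, rep(TRUE, nrow(DF)))
    colSums(DF[keep, colsToSum, drop = FALSE])
}

# Sample data from the question, rebuilt here so the block runs on its own
DF <- data.frame(
    Name   = c("Fred", "John", "Mindy", "Mike", "Sally", "Fred", "Alex", "Sam"),
    Gender = c("M", "M", "F", "M", "F", "M", "M", "F"),
    Age    = c(40, 35, 25, 50, 25, 40, 35, 40),
    Score  = c(9, 7, 8, 10, 10, 9, 5, 8)
)

sumByAllConds(list(Name = "Fred"), DF)
sumByAllConds(list(Gender = "M", Age = 40), DF)
```

Because each condition is a vectorized %in% test rather than a parsed string, this avoids eval(parse(text=...)) entirely, and the same call shape works whether the list names one column or twenty.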
