
I have a data frame that looks like this:

set.seed(42)
data <- runif(1000)    
utility <- sample(c("abc","bcd","cde","def"),1000,replace=TRUE)
stage <- sample(c("vwx","wxy","xyz"),1000,replace=TRUE)
x <- data.frame(data,utility,stage)
head(x)
   data utility stage
1 0.9148060     def   xyz
2 0.9370754     abc   wxy
3 0.2861395     def   xyz
4 0.8304476     cde   xyz
5 0.6417455     bcd   xyz
6 0.5190959     abc   xyz

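I want to generate a cumulative distribution function (CDF) for each unique combination of utility and stage. In my real application I will end up generating about 100 CDFs, but this random data has 12 (4x3) unique combinations. I will use each of these CDFs thousands of times, so I don't want to compute the CDF on the fly every time. The ecdf() function does exactly what I want, except that I need to vectorize it. The following code doesn't work, but it is the gist of what I'm trying to do: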

ecdf_multiple <- function(x)
{
    i=0
    utilities <- levels(x$utilities)
    stages <- levels(x$stages)
    for(utility in utilities)
    {
        for(stage in stages)
        {
            i <- i + 1
            y <- ecdf(x[x$utilities == utility & x$stage == stage,1])
            # calculate ecdf for the unique util/stage combo
            z[i] <- list(y,utility,stage)
            # then assign it to a data element (list, data frame, json, whatever) note-this doesn't actually work
        }
    }
    z # return value
}

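So after running ecdf_multiple and assigning the result to a variable, I would somehow reference that variable by passing a value (the point at which I want the CDF), a utility, and a stage.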

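Is there a way to vectorize the ecdf function (or use/build another function) so that I can query it many times without regenerating the distributions over and over?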

--------Added in response to @Pascal's excellent suggestion.--------

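How can this be extended to the more general case of "n" category dimensions? Here is my stab at it, based on Pascal's two-dimensional case. Note how I try to assign "y":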

set.seed(42)
data <- runif(1000)    
utility <- sample(c("abc","bcd","cde","def"),1000,replace=TRUE)
stage <- sample(c("vwx","wxy","xyz"),1000,replace=TRUE)
openclose <- sample(c("open","close"),1000,replace=TRUE)
x <- data.frame(data,utility,stage,openclose)
numlabels <- length(names(x))-1
y <- split(x, list(x[,2:(numlabels+1)]))
l <- lapply(y,function(x) ecdf(x[,"data"]))

#execute
utility <- "abc"
stage <- "xyz"
openclose <- "close"
comb <- paste(utility, stage, openclose, sep = ".")
# call the function
l[[comb]](.25)

During the "y" assignment above, I get the following error message:

"Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?"

1 Answer


The following may help:

# we create a list of criteria by excluding 
# the first column of the data.frame
y <- split(x, as.list(x[,-1]))
l <- lapply(y, function(x) ecdf(x[,"data"]))

utility <- "abc"
stage <- "xyz"
comb <- paste(utility, stage, sep = ".")    

l[[comb]](0.25)
# [1] 0.2613636
plot(l[[comb]])

[Image: plot of the ECDF for the "abc.xyz" combination]
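For the follow-up about "n" category columns, here is a minimal sketch, not a definitive implementation. It assumes every column other than `data` is a grouping variable. The names `group_cols`, `cdfs`, and `cdf_at` are hypothetical and introduced only for illustration. The likely cause of the error in the edited question is that `list(x[, 2:4])` wraps the whole data frame in a length-one list, so `split()` receives a data frame where it expects an atomic grouping vector; `as.list()` on the data frame gives one vector per column instead.

```r
set.seed(42)
n <- 1000
x <- data.frame(
  data      = runif(n),
  utility   = sample(c("abc", "bcd", "cde", "def"), n, replace = TRUE),
  stage     = sample(c("vwx", "wxy", "xyz"), n, replace = TRUE),
  openclose = sample(c("open", "close"), n, replace = TRUE)
)

# Assumption: every column except "data" is a grouping variable.
group_cols <- setdiff(names(x), "data")

# as.list() yields one atomic vector per grouping column, which split()
# combines into keys such as "abc.xyz.close".
# drop = TRUE skips combinations with no rows, because ecdf() fails on
# empty input.
groups <- split(x$data, as.list(x[group_cols]), drop = TRUE)

# Build each ECDF once and keep it in a named list for reuse.
cdfs <- lapply(groups, ecdf)

# Hypothetical helper: find the cached ECDF by its category labels (given in
# the same order as group_cols) and evaluate it at q, which may be a vector.
cdf_at <- function(q, ...) {
  key <- paste(..., sep = ".")
  cdfs[[key]](q)
}

cdf_at(0.25, "abc", "xyz", "close")
cdf_at(c(0.1, 0.5, 0.9), "abc", "xyz", "close")
```

Because `ecdf()` returns an ordinary function that accepts a vector of quantiles, each distribution is built once and can then be evaluated as often as needed. Asking for a combination that is not in the data fails, since `cdfs[[key]]` is `NULL` in that case.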

Answered 2015-10-28T05:12:06.220