r - How to simplify a lattice xyplot with millions of data points?

Question

I have collected several sets of time-history data at roughly 500 Hz, each 12 hours long.

I plotted these data using xyplot with type="l" on a logarithmic time scale, because the phenomenon is mostly a logarithmic decay.

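The resulting plots are huge PDF files that take a long time to render and inflate the size of my knitted document. I assume this is because every individual data point is being drawn, which is complete overkill. The plots could reasonably be reproduced with orders of magnitude fewer points.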

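Switching to type="smooth" fixes the rendering and file-size problems, but the loess smoothing drastically changes the shape of the lines, even after playing with the loess smoothing parameters, so I have ruled out loess smoothing as an option here.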

Is there a simple way to either post-process the plot to simplify it, or subsample the data before plotting?

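If subsampling the data, I think it would be beneficial to do so in some sort of inverse-log fashion, where data near zero keep a high time frequency (using all 500 Hz of the source data) but the frequency decreases as time goes on (even 0.01 Hz would be more than sufficient near t = 12 hours). This would give more or less equal plot resolution across the log-time scale.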
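To put rough numbers on that idea, here is a back-of-the-envelope sketch in R. The sampling rate and duration come from the question; the figure of 100 points per decade is only an assumed plotting resolution, not something the question specifies.

```r
## Rough size estimate for one time history (values taken from the question)
rate.hz  <- 500              # sampling rate in Hz
duration <- 12 * 3600        # 12 hours in seconds
n.raw    <- rate.hz * duration   # 21.6 million samples per history

## Decades spanned on a log time axis, from the first sample interval
## (1/500 s) to the end of the record
decades <- log10(duration) - log10(1 / rate.hz)

## Assumed plotting resolution: 100 points per decade (hypothetical value)
n.per.decade <- 100
n.log <- ceiling(decades * n.per.decade)

c(raw = n.raw, log.spaced = n.log)
```

Under these assumptions a log-spaced grid needs only a few hundred points per history, compared with tens of millions of raw samples.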

score 1 · Accepted Answer

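After trying type="spline" and again being dissatisfied with how much it changed the shape of the data, I decided to take the subsampling approach and reduce the data density before plotting.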

The function I wrote subsamples along a log scale, so that the "plot resolution" is more or less constant.

## log.subsample(data,time,n.per.decade)

## subsamples a time-sampled data.frame so that there are no more than
## n.per.decade samples in each decade.

## usage
## data: data.frame, the data frame object, must contain a column with
##       times
##
## time: character, the name of the data frame column with the time
##       values
## n.per.decade: the max number of rows per decade of time

## value
## returns a data.frame object with the same columns as data,
## subsampled such that there are no more than n.per.decade rows in
## each decade of time. Any rows in data with time <= 0 are dropped,
## since they have no finite log10. Rows are assumed to be sorted by
## increasing time.

log.subsample <- function(data,time,n.per.decade){
    ## match the column name exactly; grep() could match several columns
    time.col <- match(time, colnames(data))
    stopifnot(!is.na(time.col))
    ## drop non-positive times before taking logs
    if(min(data[,time.col]) <= 0){
        data <- data[data[,time.col]>0,]
        data <- droplevels(data)
    }
    stopifnot(nrow(data) > 0)
    min.time <- min(data[,time.col])
    max.time <- max(data[,time.col])
    min.decade <- floor(log10(min.time))
    max.decade <- ceiling(log10(max.time))

    ## thresholds evenly spaced in log10(time), n.per.decade per decade
    time.seq <- seq(from=min.decade, to=max.decade, by=1/n.per.decade)
    time.seq <- 10^time.seq
    ## for each threshold, keep the first row at or after it
    indices.to.keep <- integer(0)
    for(i in seq_along(time.seq)){
        tmp <- which(data[,time.col] >= time.seq[i])[1]
        if(!is.na(tmp)){
            indices.to.keep <- c(indices.to.keep,tmp)
        }
    }
    indices.to.keep <- unique(indices.to.keep)
    result <- data[indices.to.keep,]
    result <- droplevels(result)
    return(result)
}
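The loop above scans the whole time column once per threshold, which can be slow with millions of rows. A possible vectorised variant is sketched below. It is not part of the original answer: the helper name logspaced.indices is hypothetical, and it assumes the times are strictly positive and already sorted in increasing order. It uses base R's findInterval() to locate the first row at or after each threshold.

```r
## Hypothetical helper (not from the original answer): returns the row
## indices to keep, given a vector of positive times in increasing order.
logspaced.indices <- function(t, n.per.decade){
    stopifnot(!is.unsorted(t), all(t > 0))
    thresholds <- 10^seq(from=floor(log10(min(t))),
                         to=ceiling(log10(max(t))),
                         by=1/n.per.decade)
    ## left.open=TRUE counts the elements strictly below each threshold,
    ## so adding 1 gives the first index with t >= threshold
    idx <- findInterval(thresholds, t, left.open=TRUE) + 1
    ## discard thresholds that lie beyond the last sample
    unique(idx[idx <= length(t)])
}

## Usage on synthetic data: 500 Hz for 100 s with a log-like decay
d <- data.frame(time=seq_len(50000) / 500)
d$signal <- -log10(d$time)
small <- d[logspaced.indices(d$time, n.per.decade=20), ]
nrow(small)
```

The selection rule is the same as in the loop (first row at or after each log-spaced threshold), so the two should pick the same rows on sorted input.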

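The only issue here is that if there are any "groups" in the data to be plotted, this subsampling function needs to be run on each group separately, and then a data frame needs to be built up to pass to xyplot().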
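One way to do the per-group step is with split(), lapply() and rbind(), as in the sketch below. To keep the example self-contained it uses a short stand-in thinning function, thin.log, which is hypothetical and slightly different from the answer's algorithm (it keeps the first row in each log-time bin). The helper subsample.by.group and the column names time, run and signal are likewise assumptions made for illustration.

```r
## Stand-in thinning function (hypothetical, not the answer's exact
## algorithm): keeps the first row in each log10(time) bin of width
## 1/n.per.decade. log.subsample() could be passed in its place.
thin.log <- function(data, time, n.per.decade){
    data <- data[data[[time]] > 0, , drop=FALSE]
    bins <- floor(log10(data[[time]]) * n.per.decade)
    data[!duplicated(bins), , drop=FALSE]
}

## Apply a subsampling function f to each group, then rebuild one data frame
subsample.by.group <- function(data, group, f, ...){
    pieces <- lapply(split(data, data[[group]], drop=TRUE), f, ...)
    result <- do.call(rbind, pieces)
    rownames(result) <- NULL
    result
}

## Synthetic example with two runs (column names are assumptions)
d <- expand.grid(time=seq_len(5000) / 500, run=c("a", "b"))
d$signal <- ifelse(d$run == "a", 1, 2) * exp(-d$time)
small <- subsample.by.group(d, "run", thin.log,
                            time="time", n.per.decade=20)
table(small$run)
```

The combined data frame can then be passed to xyplot() with groups=run as usual.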

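It would be great if someone could tell me whether it is possible to somehow "inject" this subsampling routine into the xyplot() call, so that it is called for each individual group of data in turn, removing the need to break up the data, run the subsampling routine, and put the data back together before calling xyplot().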
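One possible way to do this, offered as a sketch rather than a tested solution for the original data, is a custom panel function. lattice's panel.superpose() calls its panel.groups function once per group, and with a log x scale the panel receives x values that are already log10-transformed, so thinning can be done inside the panel. The function name panel.thinned, the n.per.decade argument and the synthetic data are all assumptions.

```r
library(lattice)

## Hypothetical panel function (name and n.per.decade are assumptions).
## With scales=list(x=list(log=10)), lattice passes log10-transformed x
## values to the panel, so binning x directly gives equal log resolution.
panel.thinned <- function(x, y, ..., n.per.decade=50){
    ## keep the first point in each bin of width 1/n.per.decade decades
    keep <- !duplicated(floor(x * n.per.decade))
    panel.xyplot(x[keep], y[keep], ...)
}

## Synthetic data with two groups (column names are assumptions)
d <- expand.grid(time=seq_len(50000) / 500, run=c("a", "b"))
d$signal <- ifelse(d$run == "a", 1, 2) * exp(-d$time / 10)

## panel.superpose calls panel.groups once per group, so each group is
## thinned separately without splitting the data frame by hand
p <- xyplot(signal ~ time, data=d, groups=run, type="l",
            scales=list(x=list(log=10)),
            panel=panel.superpose, panel.groups=panel.thinned,
            n.per.decade=50)
```

Printing p to a PDF device should then draw only the thinned points, although the trellis object itself still carries the full data. Note that this keeps the first point in each log-time bin, which is close to but not identical to what log.subsample() selects.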
