
I have 30-second-granularity data from a number of servers. I want to roll this data up into 15-minute intervals for each server.

My data frame looks like this:

dput(p)

structure(list(DATE = c("2013-04-15   02:47:32", "2013-04-15   02:48:02", 
"2013-04-15   02:48:32", "2013-04-15   02:49:02", "2013-04-15   02:49:32", 
"2013-04-15   02:50:02", "2013-04-15   02:50:32", "2013-04-15   02:51:02", 
"2013-04-15   02:51:32", "2013-04-15   02:52:02", "2013-04-15   02:52:32", 
"2013-04-15   02:53:02", "2013-04-15   02:53:32", "2013-04-15   02:54:02", 
"2013-04-15   02:54:32", "2013-04-15   02:55:02", "2013-04-15   02:55:32", 
"2013-04-15   02:56:02", "2013-04-15   02:56:32", "2013-04-15   02:57:02", 
"2013-04-29   17:33:07", "2013-04-29   17:33:37", "2013-04-29   17:34:07", 
"2013-04-29   17:34:37", "2013-04-29   17:35:07", "2013-04-29   17:35:37", 
"2013-04-29   17:36:07", "2013-04-29   17:36:37", "2013-04-29   17:37:07", 
"2013-04-29   17:37:37", "2013-04-29   17:38:07", "2013-04-29   17:38:37", 
"2013-04-29   17:39:07", "2013-04-29   17:39:37", "2013-04-29   17:40:07", 
"2013-04-29   17:40:37", "2013-04-29   17:41:07", "2013-04-29   17:41:37", 
"2013-04-29   17:42:07", "2013-04-29   17:42:37"), Server = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 2L, 2L, 2L), .Label = c("ServerA", "ServerB"), class = "factor"), 
    CPU = c(70L, 71L, 72L, 72L, 72L, 73L, 73L, 74L, 73L, 73L, 
    73L, 73L, 71L, 74L, 72L, 72L, 70L, 72L, 71L, 70L, 78L, 79L, 
    79L, 78L, 79L, 77L, 78L, 80L, 81L, 80L, 80L, 79L, 79L, 79L, 
    81L, 79L, 78L, 79L, 79L, 79L)), .Names = c("DATE", "Server", 
"CPU"), class = "data.frame", row.names = c(NA, -40L))

Is there an easy way to roll the 30-second data up into 15-minute data for each server? This data frame can contain more than 2 servers.

For example, suppose my data looks like the following, at 30-second intervals. I need the average CPU value for every 15 minutes.

      DATE       SERVER CPU
1 2013-04-15 02:47:32 ServerA 70
2 2013-04-15 02:48:02 ServerA 71
3 2013-04-15 02:48:32 ServerA 72
4 2013-04-15 02:49:02 ServerA 72
5 2013-04-15 02:49:32 ServerA 72
6 2013-04-15 02:50:02 ServerA 73
   :
   :
   :
   :

3 Answers


First, convert your string to the POSIXct class:

as.POSIXct(strptime("2013-04-15 02:47:32", "%Y-%m-%d %H:%M:%S"))

Next, unclass it to get the epoch value (seconds since 1970-01-01):

unclass(as.POSIXct(strptime("2013-04-15 02:47:32", "%Y-%m-%d %H:%M:%S")))

Finally, truncate the seconds that fall past the most recent 15-minute boundary (15*60 seconds):

floor(unclass(as.POSIXct(strptime("2013-04-15 02:47:32", 
                                  "%Y-%m-%d %H:%M:%S"))
             ) / (15*60)
     ) * (15*60)

Putting it all together (for a data frame, pass the DATE column in place of the literal string):

as.POSIXct(floor(unclass(as.POSIXct(strptime("2013-04-15   02:47:32", "%Y-%m-%d %H:%M:%S")))/(15*60))*(15*60), origin='1970-01-01')
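
As a quick check of the three steps above, here is a small sketch on a toy data frame. The name df, the sample rows, the BUCKET column, and the choice of tz = "UTC" are assumptions made for illustration; with the asker's data the same steps would be applied to the DATE column of p.

```r
# Toy data in the same shape as the question (assumed sample rows)
df <- data.frame(
  DATE   = c("2013-04-15   02:47:32", "2013-04-15   02:59:32",
             "2013-04-15   03:00:02", "2013-04-29   17:33:07"),
  Server = c("ServerA", "ServerA", "ServerA", "ServerB"),
  CPU    = c(70, 72, 74, 78),
  stringsAsFactors = FALSE
)

# Step 1: parse the strings; the single space in the format also
# matches the repeated spaces in the input
ts <- as.POSIXct(strptime(df$DATE, "%Y-%m-%d %H:%M:%S", tz = "UTC"))

# Step 2: seconds since the epoch
epoch <- as.numeric(ts)

# Step 3: truncate down to a multiple of 15 * 60 seconds
bucket <- floor(epoch / (15 * 60)) * (15 * 60)

# Convert back to POSIXct so the start of each bucket is readable
df$BUCKET <- as.POSIXct(bucket, origin = "1970-01-01", tz = "UTC")

format(df$BUCKET, "%H:%M:%S")
# "02:45:00" "02:45:00" "03:00:00" "17:30:00"
```

Rows that share a BUCKET value belong to the same 15-minute interval, so that column can serve as a grouping key.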
answered 2013-04-30T23:09:00.423

Here is what I would do:

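As topchef suggested, use POSIXct rather than strings. So once I have your data in L, my structure is like yours, except that I work with a ts column, obtained as topchef suggested, in place of your DATE column: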

L$ts <- as.POSIXct(L$DATE)

You want to aggregate values, so it seems natural to me to add the aggregation key to the data.

# Round a POSIXct vector down to the nearest multiple of `seconds`
baseSecond <- function(x, seconds) { 
  as.POSIXct(floor(unclass(x) / seconds) * seconds,
             origin='1970-01-01')
}

L$base <- baseSecond(L$ts, 15*60)

To finish the job, I would use the standard aggregate function.

aggregate(L$CPU, by=list(Server=L$Server, base=L$base), mean)

The third argument lets you choose how the data is aggregated.
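
To make the role of that third argument concrete, here is a hedged sketch that summarizes CPU per server and per 15-minute bucket. The toy contents of L and the use of tz = "UTC" are assumptions for illustration; baseSecond is restated (using as.numeric and an explicit time zone) only so the block runs on its own.

```r
# Restated helper: round POSIXct values down to a multiple of `seconds`
baseSecond <- function(x, seconds) {
  as.POSIXct(floor(as.numeric(x) / seconds) * seconds,
             origin = "1970-01-01", tz = "UTC")
}

# Toy data (assumed values, same shape as the question)
L <- data.frame(
  ts     = as.POSIXct(c("2013-04-15 02:47:32", "2013-04-15 02:48:02",
                        "2013-04-15 03:00:02", "2013-04-29 17:33:07",
                        "2013-04-29 17:33:37"), tz = "UTC"),
  Server = c("ServerA", "ServerA", "ServerA", "ServerB", "ServerB"),
  CPU    = c(70, 72, 74, 78, 80)
)
L$base <- baseSecond(L$ts, 15 * 60)

# Mean CPU for every (Server, bucket) pair; the result column is named x
avg <- aggregate(L$CPU, by = list(Server = L$Server, base = L$base),
                 FUN = mean)

# Swapping FUN changes the summary, e.g. the peak within each bucket
peak <- aggregate(L$CPU, by = list(Server = L$Server, base = L$base),
                  FUN = max)

# Three groups result: ServerA 02:45 (mean 71, max 72),
# ServerA 03:00 (74, 74) and ServerB 17:30 (79, 80)
avg
peak
```

Grouping by both Server and base keeps servers separate even when their timestamps fall into the same 15-minute interval.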

answered 2013-05-01T10:59:27.860

I came up with the solution below. There may be better and faster ones, but it works for now:

apply.periodly <- function (x, FUN, period, k=1, ...) 
{
  if (!require("xts")) {
    stop("Need 'xts'")
  }
  ep <- endpoints(x, on=period, k=k)
  period.apply(x, ep, FUN, ...)
}

total_df <- data.frame(DATE=as.POSIXct(character()), CPU=as.numeric(character()),  SERVER=character())


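# x is the input data frame with columns DATE (POSIXct), SERVER and CPU
servers <- unique(x$SERVER)

for(i in seq_along(servers)) {

    # keep only the rows for the current server (== compares, = would not filter)
    y<-subset(x, SERVER == servers[i])
    mydata.xts <- xts(y$CPU, order.by = y$DATE)
    # average CPU over consecutive 15-minute periods
    mydata.15M <- apply.periodly(x = mydata.xts, FUN = mean, period = "minutes", k = 15)

    # back to a plain data frame, tagged with the server name
    new_df<-data.frame(date=index(mydata.15M), coredata(mydata.15M))
    colnames(new_df)<-c("DATE", "CPU")
    new_df$SERVER<-as.character(servers[i])

    total_df<-rbind(total_df, new_df)    

}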
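
One detail in a loop like this is easy to miss: the per-server filter has to use ==. With a single =, subset() receives SERVER as an extra named argument, applies no filter, and returns every row without a warning. A small sketch on a made-up data frame (the name toy and its values are assumptions for illustration):

```r
# Made-up data: two rows per server
toy <- data.frame(SERVER = c("ServerA", "ServerA", "ServerB", "ServerB"),
                  CPU    = c(70, 71, 78, 79))

# Single '=': SERVER is absorbed by '...', so nothing is filtered
all_rows <- subset(toy, SERVER = "ServerA")

# '==': a logical condition, so only the ServerA rows are kept
only_a <- subset(toy, SERVER == "ServerA")

nrow(all_rows)  # 4
nrow(only_a)    # 2
```

If the filter is silently skipped, every pass through the loop averages the CPU values of all servers together, so checking the row count of the subset is a cheap sanity test.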

answered 2013-05-01T15:57:18.340