
This question builds on an earlier one:

R: Calculating a rolling max slope by week, accounting for factors

My question:

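The code pasted below calculates the max slope over 7 days using length(HDD). I want to be more discriminating: MaxSlope should only be calculated over 7 consecutive days.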

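For example, there is a gap in the data between 2004-12-26 and 2004-12-30. Considering only the portion of the data I have copied here, MaxSlope should be calculated only for 2004-12-23 and 2004-12-24. All other dates should get NA. The dataset will grow to millions of records, so efficiency matters.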
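To make the requirement concrete, here is a minimal base-R sketch (not the asker's code) using the 11 sample dates from the table below. It splits the dates into runs of consecutive days and flags the rows that belong to a run of at least 7 days. The names new_run, run_len and in_long_run are illustrative only.

```r
# Dates from the sample rows 650-660 (2004-12-19 ... 2005-01-01).
dates <- as.Date(c("2004-12-19", "2004-12-20", "2004-12-21", "2004-12-22",
                   "2004-12-23", "2004-12-24", "2004-12-25", "2004-12-26",
                   "2004-12-30", "2004-12-31", "2005-01-01"))

# A new run starts at the first row and wherever the gap to the previous date exceeds one day.
new_run <- c(TRUE, diff(dates) > 1)

# rle() gives the length of each run; rep() expands it back to one value per row.
runs <- rle(cumsum(new_run))
run_len <- rep(runs$lengths, runs$lengths)

# Rows that sit inside a run of at least 7 consecutive days.
in_long_run <- run_len >= 7
```

Here the first eight dates form one run and the last three form another, so only the first eight rows could ever receive a 7-day value.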

Note: I subset my data.frame to show only the columns that matter here. The by statement in the MaxSlope code is important because it applies across the whole data.frame.

I don't know where to start with the consecutive-date calculation. Any ideas?

Thanks!

The code I use to calculate the max slope:

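library(data.table)
RawByDayALL <- data.table(RawByDayALL)
# 7-weight filter of HDD within each WinterID/SiteID/SubstrateConcat group; groups with fewer than 7 rows get NA
RawByDayALL[, MaxSlope := if (length(HDD) < 7) rep(NA_real_, length(HDD)) else as.numeric(stats::filter(HDD, c(1,1,1,1,1,1,0)/7)), by = list(WinterID, SiteID, SubstrateConcat)]
# Replace the remaining NAs with the sentinel value -99
RawByDayALL[is.na(MaxSlope), MaxSlope := -99]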
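For reference, a small sketch of what the stats::filter() call does on its own, using a made-up series x in place of HDD. With the default sides = 2 the 7-weight window is centred, and positions that lack a full window come back as NA. Those NAs are what the is.na(MaxSlope) line then replaces with -99.

```r
# Toy series; in the question this would be HDD within one group.
x <- c(10, 20, 30, 40, 50, 60, 70, 80)

# Same weights as the question: six ones and a trailing zero, divided by 7.
w <- c(1, 1, 1, 1, 1, 1, 0) / 7

# sides = 2 (the default) centres the window; positions without a full window are NA.
y <- as.numeric(stats::filter(x, w))
```

With 8 values only positions 4 and 5 have a full window. Because the last weight is 0, each of those values is the sum of six of the seven window values divided by 7 (here 270/7 and 330/7).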

My data structure:

> dput(RawByDayALL[650:660])
structure(list(WinterID = structure(c(6L, 6L, 6L, 6L, 6L, 6L, 
6L, 6L, 6L, 6L, 6L), .Label = c("2002", "2002_2003", "2003", 
"2003_2004", "2004", "2004_2005", "2005", "2005_2006", "2006", 
"2006_2007", "2007", "2007_2008", "2008"), class = "factor"), 
    Date = structure(c(12771, 12772, 12773, 12774, 12775, 12776, 
    12777, 12778, 12782, 12783, 12784), class = "Date"), SiteID = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = "NW_SB", class = "factor"), 
    SubstrateConcat = structure(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    2L, 2L, 2L, 2L), .Label = c("B_A", "B_B"), class = "factor"), 
    HDD = c(17.3533333333333, 35.1066666666667, 82.6266666666667, 
    51.68, 36.22, 39.6066666666667, 38.0533333333333, 47.8333333333333, 
    4.18, 9.66, 1.5), MaxSlope = c(30.4104761904762, 33.3885714285714, 
    37.5133333333333, 40.4704761904762, 42.2885714285714, 31.0819047619048, 
    25.0790476190476, 20.1190476190476, 14.6019047619048, 9.19428571428571, 
    2.6552380952381)), .Names = c("WinterID", "Date", "SiteID", 
"SubstrateConcat", "HDD", "MaxSlope"), class = c("data.table", 
"data.frame"), row.names = c(NA, -11L))

Here is what that portion of the data looks like:

    WinterID        Date    SiteID  SubstrateConcat   HDD           MaxSlope  
650 2004_2005   2004-12-19  NW_SB   B_B               17.35333333   30.41047619
651 2004_2005   2004-12-20  NW_SB   B_B               35.10666667   33.38857143
652 2004_2005   2004-12-21  NW_SB   B_B               82.62666667   37.51333333
653 2004_2005   2004-12-22  NW_SB   B_B               51.68000000   40.47047619
654 2004_2005   2004-12-23  NW_SB   B_B               36.22000000   42.28857143
655 2004_2005   2004-12-24  NW_SB   B_B               39.60666667   31.08190476
656 2004_2005   2004-12-25  NW_SB   B_B               38.05333333   25.07904762
657 2004_2005   2004-12-26  NW_SB   B_B               47.83333333   20.11904762
658 2004_2005   2004-12-30  NW_SB   B_B               4.18000000    14.60190476
659 2004_2005   2004-12-31  NW_SB   B_B               9.66000000    9.19428571
660 2004_2005   2005-01-01  NW_SB   B_B               1.50000000    2.65523810

Edited to include the answer provided by @eddi. Thanks for the simple fix!

    library(data.table)
    RawByDayALL <- data.table(RawByDayALL)
    # Extra by term: a run ID that increments whenever the gap to the previous date exceeds one day
    RawByDayALL[, MaxSlope := if (length(HDD) < 7) rep(NA_real_, length(HDD)) else as.numeric(stats::filter(HDD, c(1,1,1,1,1,1,0)/7)), by = list(WinterID, SiteID, SubstrateConcat, cumsum(diff(c(Date[1], as.IDate(Date))) > 1))]
    RawByDayALL[is.na(MaxSlope), MaxSlope := -99]

1 Answer


This will give you the grouping by consecutive dates that you want:

dt[, cumsum(diff(c(Date[1], as.IDate(Date))) > 1)]
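A worked example of that expression on the Date column of the 11 sample rows from the question, as a sketch assuming the rows are sorted by date. Prepending Date[1] makes diff() return one value per row, and the four-day gap between 2004-12-26 and 2004-12-30 bumps the counter, so the first eight rows get group 0 and the last three get group 1. The column name run is illustrative only.

```r
library(data.table)

# Just the Date column from the 11 sample rows (Date values 12771..12778 and 12782..12784).
dt <- data.table(Date = as.Date(c(12771:12778, 12782:12784), origin = "1970-01-01"))

# The answer's expression: a new group starts whenever the gap to the previous date exceeds one day.
dt[, run := cumsum(diff(c(Date[1], as.IDate(Date))) > 1)]
```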

And this is how you would add it to your by, alongside the other columns:

dt[, your_calculation,
     by = list(various_columns, cumsum(diff(c(Date[1], as.IDate(Date))) > 1))]
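Putting the pieces together on the question's sample rows, here is a minimal sketch. It assumes that plain character columns can stand in for the factor columns, and that rows are ordered by date within each group. The by label run is a hypothetical name. Only the eight-day run is long enough for the filter, and within it only two rows have a full window, so everything else ends up as -99.

```r
library(data.table)

# The 11 sample rows from the question (MaxSlope omitted; it is recomputed below).
dt <- data.table(
  WinterID = "2004_2005", SiteID = "NW_SB", SubstrateConcat = "B_B",
  Date = as.Date(c(12771:12778, 12782:12784), origin = "1970-01-01"),
  HDD = c(17.3533333333333, 35.1066666666667, 82.6266666666667, 51.68,
          36.22, 39.6066666666667, 38.0533333333333, 47.8333333333333,
          4.18, 9.66, 1.5)
)

# The run-ID trick relies on rows being in date order within each group.
setorder(dt, WinterID, SiteID, SubstrateConcat, Date)

# Group by the original columns plus the consecutive-date run ID.
dt[, MaxSlope := if (.N < 7) rep(NA_real_, .N) else
                   as.numeric(stats::filter(HDD, c(1, 1, 1, 1, 1, 1, 0) / 7)),
   by = list(WinterID, SiteID, SubstrateConcat,
             run = cumsum(diff(c(Date[1], as.IDate(Date))) > 1))]

# Same sentinel as in the question.
dt[is.na(MaxSlope), MaxSlope := -99]
```

The two non-sentinel values match the MaxSlope values shown for rows 653 and 654 in the question's data.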
Answered 2013-09-20T16:12:36.137