Updated answer
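Following the OP's updated request, I modified the code to aggregate the data by a defined end-of-week date (Saturday). This time I only use functions available in base R. NAs are ignored: because the formula interface of aggregate defaults to na.action = na.omit, rows with a missing Rain are dropped before the mean is taken, so a given End_of_Week-Climate_Division combination with only NAs is left out of the result (with na.action = na.pass you would get NaN for it instead of a number).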
# Data with another Climate division as example (same daily values and dates)
CADaily <-
structure(list(Climate_Division = c(1, 1, 1, 1, 1, 1, 1, 1, 1,
1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), Date = structure(c(1L, 2L,
3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 1L, 2L, 3L, 4L, 5L, 6L, 7L,
8L, 9L, 10L), .Label = c("01/07/1948", "02/07/1948", "03/07/1948",
"04/07/1948", "05/07/1948", "06/07/1948", "17/07/1948", "18/07/1948",
"20/07/1948", "22/07/1948"), class = "factor"), Rain = c(0.875,
2.9166667, 0.7916667, 0.4305556, 0.8262061, 0.5972222, 0.04166667,
0.08333333, 0.04166667, 0.125, 0.875, 2.9166667, 0.7916667, 0.4305556,
0.8262061, 0.5972222, 0.04166667, 0.08333333, 0.04166667, 0.125
), week = c(27, 27, 27, 27, 27, 27, 29, 29, 29, 30, 27, 27, 27,
27, 27, 27, 29, 29, 29, 30)), .Names = c("Climate_Division",
"Date", "Rain", "week"), row.names = c(NA, 20L), class = "data.frame")
# Coerce to Date class
CADaily$Date <- as.Date(x=CADaily$Date, format='%d/%m/%Y')
# Extract day of the week (%w: Sunday = 0, ..., Saturday = 6)
CADaily$Week_Day <- as.numeric(format(CADaily$Date, format='%w'))
# Adjust end-of-week date (the first Saturday on or after the original Date)
CADaily$End_of_Week <- CADaily$Date + (6 - CADaily$Week_Day)
# Aggregate over week and climate division
aggregate(Rain~End_of_Week+Climate_Division, FUN=mean, data=CADaily, na.rm=TRUE)
# Output
#   End_of_Week Climate_Division       Rain
# 1  1948-07-03                1 1.52777780
# 2  1948-07-10                1 0.61799463
# 3  1948-07-17                1 0.04166667
# 4  1948-07-24                1 0.08333333
# 5  1948-07-03                2 1.52777780
# 6  1948-07-10                2 0.61799463
# 7  1948-07-17                2 0.04166667
# 8  1948-07-24                2 0.08333333
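On the NA handling: the formula interface of aggregate drops rows with missing values before FUN is called unless told otherwise. A minimal sketch with a small made-up data frame (`toy` and its values are illustrative only, not the OP's data) shows how the default na.action = na.omit differs from na.action = na.pass:

```r
# Hypothetical toy data: group "b" contains only missing values
toy <- data.frame(g = c("a", "a", "b", "b"),
                  x = c(1, NA, NA, NA))

# Default na.action = na.omit: NA rows are removed first, so group "b" vanishes
res_omit <- aggregate(x ~ g, data = toy, FUN = mean, na.rm = TRUE)

# na.action = na.pass keeps the rows; mean(..., na.rm = TRUE) then gives NaN for "b"
res_pass <- aggregate(x ~ g, data = toy, FUN = mean, na.rm = TRUE,
                      na.action = na.pass)

res_omit
res_pass
```

Which behaviour you want depends on whether an all-NA week should disappear from the result or show up as an explicit gap.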
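Additional operations
Additionally, with this code you can obtain results from other aggregation functions, provided the result is an atomic vector of the same length for each week-division pair.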
# Aggregate over week and climate division, and show the total number of
# observations per week, the number of observations which represent missing
# values, the average, and the standard deviation.
aggregate(Rain~End_of_Week+Climate_Division, data=CADaily,
          na.action=na.pass,  # keep NA rows; the default na.omit drops them, so NAs would always be 0
          FUN=function(x) c(n=length(x),
                            NAs=sum(is.na(x)),
                            Average=mean(x, na.rm=TRUE),
                            SD=sd(x, na.rm=TRUE)))
# Output. You get NA for the standard deviation if there is only one observation.
#   End_of_Week Climate_Division     Rain.n   Rain.NAs Rain.Average    Rain.SD
# 1  1948-07-03                1 3.00000000 0.00000000   1.52777780 1.20353454
# 2  1948-07-10                1 3.00000000 0.00000000   0.61799463 0.19864151
# 3  1948-07-17                1 1.00000000 0.00000000   0.04166667         NA
# 4  1948-07-24                1 3.00000000 0.00000000   0.08333333 0.04166667
# 5  1948-07-03                2 3.00000000 0.00000000   1.52777780 1.20353454
# 6  1948-07-10                2 3.00000000 0.00000000   0.61799463 0.19864151
# 7  1948-07-17                2 1.00000000 0.00000000   0.04166667         NA
# 8  1948-07-24                2 3.00000000 0.00000000   0.08333333 0.04166667
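One thing to be aware of: when FUN returns a vector, aggregate stores the results in a single matrix column (Rain in the output above), which only prints as Rain.n, Rain.NAs, and so on. A sketch of two ways to get at the individual statistics, using a small stand-in data frame (`toy`, `res`, and `flat` are illustrative names, not from the OP's data):

```r
# Hypothetical stand-in data with two groups
toy <- data.frame(g = c("a", "a", "a", "b", "b"),
                  x = c(1, 2, 3, 10, 20))

res <- aggregate(x ~ g, data = toy,
                 FUN = function(v) c(n = length(v), Average = mean(v)))

# res has only two columns: g and a matrix column x
ncol(res)
res$x[, "Average"]   # index the matrix column directly

# Flatten the matrix column into ordinary columns x.n and x.Average
flat <- do.call(data.frame, res)
names(flat)
```

Flattening is usually the more convenient option if the result is going to be merged, plotted, or written to a file.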
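Original answer
Try the lubridate package. Load it, then aggregate (kept for the record as part of the original answer, which reflected the OP's request to aggregate by week).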
# Load lubridate package
library(package=lubridate)
# Compute the week number. Date is already of class `Date`
CADaily$Week <- week(CADaily$Date)
# Aggregate over week number and climate division
aggregate(Rain~Week+Climate_Division, FUN=mean, data=CADaily, na.rm=TRUE)
# Output
#   Week Climate_Division       Rain
# 1   27                1 1.07288622
# 2   29                1 0.05555556
# 3   30                1 0.12500000
# 4   27                2 1.07288622
# 5   29                2 0.05555556
# 6   30                2 0.12500000
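If you would rather not load lubridate for this step, the same week number can be sketched in base R. This assumes lubridate's documented definition of week() (the number of complete seven-day periods since January 1, plus one); `week_base` is a hypothetical helper name, not part of any package.

```r
# Hypothetical helper: yday in POSIXlt is zero-based (January 1 is 0)
week_base <- function(d) as.POSIXlt(d)$yday %/% 7 + 1

dates <- as.Date(c("01/07/1948", "06/07/1948", "17/07/1948", "22/07/1948"),
                 format = "%d/%m/%Y")
week_base(dates)
# 27 27 29 30
```

These values agree with the week column already present in the example data. Note that this is not the same as the `%U` or `%W` week numbers from format(), which count from the first Sunday or Monday of the year.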