我有一个很长的数据框,其中包含来自桅杆的气象数据。它包含data$value
不同参数(风速、风向、气温等)data$param
在不同高度(data$z
我试图通过 有效地分割这些数据$time
,然后将函数应用于收集的所有数据。通常函数一次应用于$param
一个(即我对风速应用不同的函数,而不是对空气温度应用不同的函数)。
目前的方法
我目前的方法是使用data.frame
and ddply
。
如果我想获得所有的风速数据,我运行这个:
# find good data ----
df <- data[((data$param == "wind speed") &
!is.na(data$value)),]
df
然后我在using上运行我的函数ddply()
:
df.tav <- ddply(df,
.(time),
function(x) {
y <-data.frame(V1 = sum(x$value) + sum(x$z),
V2 = sum(x$value) / sum(x$z))
return(y)
})
通常 V1 和 V2 是对其他函数的调用。这些只是例子。我确实需要在相同的数据上运行多个函数。
问题
我目前的方法很慢。我没有对它进行基准测试,但是它足够慢,我可以去喝杯咖啡,然后在处理一年的数据之前回来。
我有订单(一百个)要处理的塔,每个都有一年的数据和 10-12 个高度,所以我正在寻找更快的东西。
数据样本
data <- structure(list(time = structure(c(1262304600, 1262304600, 1262304600,
1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600,
1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600,
1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600,
1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600,
1262304600, 1262304600, 1262304600, 1262304600, 1262304600, 1262304600,
1262305200, 1262305200, 1262305200, 1262305200, 1262305200, 1262305200,
1262305200), class = c("POSIXct", "POSIXt"), tzone = ""), z = c(0,
0, 0, 100, 100, 100, 120, 120, 120, 140, 140, 140, 160, 160,
160, 180, 180, 180, 200, 200, 200, 40, 40, 40, 50, 50, 50, 60,
60, 60, 80, 80, 80, 0, 0, 0, 100, 100, 100, 120), param = c("temperature",
"humidity", "barometric pressure", "wind direction", "turbulence",
"wind speed", "wind direction", "turbulence", "wind speed", "wind direction",
"turbulence", "wind speed", "wind direction", "turbulence", "wind speed",
"wind direction", "turbulence", "wind speed", "wind direction",
"turbulence", "wind speed", "wind direction", "turbulence", "wind speed",
"wind direction", "turbulence", "wind speed", "wind direction",
"turbulence", "wind speed", "wind direction", "turbulence", "wind speed",
"temperature", "barometric pressure", "humidity", "wind direction",
"wind speed", "turbulence", "wind direction"), value = c(-2.5,
41, 816.9, 248.4, 0.11, 4.63, 249.8, 0.28, 4.37, 255.5, 0.32,
4.35, 252.4, 0.77, 5.08, 248.4, 0.65, 3.88, 313, 0.94, 6.35,
250.9, 0.1, 4.75, 253.3, 0.11, 4.68, 255.8, 0.1, 4.78, 254.9,
0.11, 4.7, -3.3, 816.9, 42, 253.2, 2.18, 0.27, 229.5)), .Names = c("time",
"z", "param", "value"), row.names = c(NA, 40L), class = "data.frame")