I have a problem that I'm currently solving in Deedle in a somewhat quick-and-dirty way, but I'm clearly not solving it in the most efficient way.
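The input data consists of some dumps of very high-frequency data in the format `<DateTime,double>`, with a somewhat random sampling interval (varying between roughly 20 ms and 10 minutes).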
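I want to compute a new time series holding the accumulated time (in hours) during which one or more of these data series meet a specific criterion. Example: calculating operating hours (uptime) based on power consumption above a given limit.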
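Right now I've solved this very crudely:

1. generate a series of 0s and 1s depending on whether each value meets the criterion;
2. interpolate that series onto a series with one value per minute;
3. for each 60-item chunk (i.e., each hour), compute the mean;
4. finally, use an expanding sum to get the accumulated value.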
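Why the last two steps (hourly mean, then expanding sum) yield hours: the mean of 60 per-minute 0/1 values is the fraction of that hour during which the criterion held, so a running sum of those hourly means is the accumulated number of hours. A minimal sketch of just that arithmetic, written as a plain C# 9+ top-level program without Deedle; the per-minute status array is made up for illustration:

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical per-minute 0/1 status covering two hours:
// criterion met for minutes 0-44 of hour 1 and minutes 0-29 of hour 2.
double[] perMinute = Enumerable.Range(0, 120)
    .Select(m => (m < 45 || (m >= 60 && m < 90)) ? 1d : 0d)
    .ToArray();

// Mean of each 60-minute chunk = fraction of that hour the criterion held.
double[] hourlyFraction = Enumerable.Range(0, perMinute.Length / 60)
    .Select(h => perMinute.Skip(h * 60).Take(60).Average())
    .ToArray();

// Expanding (running) sum of the hourly fractions = accumulated hours.
var accumulatedHours = new double[hourlyFraction.Length];
double running = 0;
for (int i = 0; i < hourlyFraction.Length; i++)
{
    running += hourlyFraction[i];
    accumulatedHours[i] = running;
}

Console.WriteLine(string.Join(", ",
    accumulatedHours.Select(x => x.ToString(CultureInfo.InvariantCulture))));
// prints: 0.75, 1.25
```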
Code to generate a test sample:
```csharp
using System;
using System.Linq;
using Deedle;

// Generate some example data:
DateTime startTime = DateTime.Now.AddYears(-2);
DateTime endTime = DateTime.Now;
var rnd = new Random();
var builder = new SeriesBuilder<DateTime, double>();
var loopTime = startTime;
while (loopTime < endTime)
{
    builder.Add(loopTime, rnd.NextDouble() * 4000d);
    // Random sampling interval between 20 and 120 seconds
    loopTime = loopTime.AddSeconds(20d + rnd.NextDouble() * 100d);
}

// Accumulation criteria:
double loRange = 2000;
double hiRange = 7000;

var dataSeries = builder.Series;
Console.WriteLine("dataSeries contains {0} elements:", dataSeries.KeyCount);
dataSeries.Print();
```
Output (something like):
```
dataSeries contains 900819 elements:
11.11.2013 11:01:24 -> 1621.68410123404
11.11.2013 11:02:55 -> 416.716641009188
11.11.2013 11:04:19 -> 1018.02155422886
11.11.2013 11:05:18 -> 1765.56553028783
11.11.2013 11:05:53 -> 3282.86457587167
11.11.2013 11:06:57 -> 1879.04809875369
11.11.2013 11:08:43 -> 3259.35750792704
11.11.2013 11:09:46 -> 3782.27438767547
11.11.2013 11:11:00 -> 1874.62322873744
11.11.2013 11:12:16 -> 3709.64714871237
11.11.2013 11:14:07 -> 3370.90207607062
11.11.2013 11:15:34 -> 696.881372806095
11.11.2013 11:17:28 -> 1225.02550539795
11.11.2013 11:18:51 -> 1002.21338356017
11.11.2013 11:19:13 -> 1485.16262578087
... -> ...
11.11.2015 10:45:20 -> 2453.45101247237
11.11.2015 10:45:53 -> 1941.55326762309
11.11.2015 10:47:23 -> 2050.54933673262
11.11.2015 10:48:52 -> 1520.56644368943
11.11.2015 10:50:38 -> 918.71558358833
11.11.2015 10:52:36 -> 1060.91481310358
11.11.2015 10:53:00 -> 2246.04634672685
11.11.2015 10:53:31 -> 532.949643457751
11.11.2015 10:55:21 -> 3282.98021447052
11.11.2015 10:56:10 -> 2528.27528795613
11.11.2015 10:57:32 -> 288.598969713132
11.11.2015 10:57:53 -> 3936.26360964787
11.11.2015 10:58:31 -> 2164.81776450054
11.11.2015 10:59:39 -> 2257.33688392552
11.11.2015 11:00:53 -> 2500.25997427304
```
Then the accumulation code:
```csharp
// Make new series that holds 1 if the criterion is met and 0 if not:
var rawRunningStatus = dataSeries.Select(item =>
    item.Value > loRange && item.Value <= hiRange ? 1d : 0d);

// Generate a set of timestamps for minute-interpolation:
var allMinuteTimestamps = Enumerable.Range(0, 1 + (int)endTime.Subtract(startTime).TotalMinutes)
    .Select(offset => startTime.AddMinutes(offset))
    .ToArray();
// The lambda gives the distance between two keys, in ticks.
var interpolatedPerMinute = rawRunningStatus.InterpolateLinear(
    allMinuteTimestamps, (t1, t2) => (double)(t2 - t1).Ticks);

// Mean of each 60-value chunk = fraction of that hour the criterion was met:
var runningStatusPerHour = interpolatedPerMinute.Chunk(60).Select(h => h.Value.Mean());
// Expanding sum of the hourly fractions = accumulated hours:
var hourlyTotal = Stats.expandingSum(runningStatusPerHour);

// Finally, downsample to daily values (last accumulated value of each day):
var total = hourlyTotal.Aggregate(
    Aggregation.ChunkWhile<DateTime>((d1, d2) => d1.Date == d2.Date),
    chunk => KeyValue.Create(
        chunk.Data.FirstKey().Date,
        OptionalValue.Create(chunk.Data.LastValue())));
total.Print();
```
The output is something like:
```
11.11.2013 00:00:00 -> 4.99771656325056
12.11.2013 00:00:00 -> 16.9026901059461
13.11.2013 00:00:00 -> 28.4346829644089
14.11.2013 00:00:00 -> 40.1625579749059
15.11.2013 00:00:00 -> 51.5181959176441
16.11.2013 00:00:00 -> 63.3829428344991
17.11.2013 00:00:00 -> 75.9086422691164
18.11.2013 00:00:00 -> 87.8313912439796
19.11.2013 00:00:00 -> 99.5987951841397
20.11.2013 00:00:00 -> 111.948344483972
21.11.2013 00:00:00 -> 124.061910225063
22.11.2013 00:00:00 -> 136.246391792296
23.11.2013 00:00:00 -> 148.16014209087
24.11.2013 00:00:00 -> 160.765062418469
25.11.2013 00:00:00 -> 173.016057085084
... -> ...
28.10.2015 00:00:00 -> 8598.97006880338
29.10.2015 00:00:00 -> 8610.78890255565
30.10.2015 00:00:00 -> 8623.3606308629
31.10.2015 00:00:00 -> 8635.00161576886
01.11.2015 00:00:00 -> 8647.01037016803
02.11.2015 00:00:00 -> 8659.29864453147
03.11.2015 00:00:00 -> 8670.50290643172
04.11.2015 00:00:00 -> 8682.60815854525
05.11.2015 00:00:00 -> 8694.33936107482
06.11.2015 00:00:00 -> 8705.96593473042
07.11.2015 00:00:00 -> 8717.99715996875
08.11.2015 00:00:00 -> 8730.08276786404
09.11.2015 00:00:00 -> 8741.60045331112
10.11.2015 00:00:00 -> 8753.78545457176
11.11.2015 00:00:00 -> 8761.26739180859
```
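The daily downsampling step (`Aggregate` with `ChunkWhile` on the date) keeps, for each calendar day, the last value of the hourly running total, keyed by that day's date. The same idea in plain LINQ, as a sketch on made-up hourly values; `hourlySamples` is a hypothetical array of `(Key, Value)` tuples, not the Deedle series:

```csharp
using System;
using System.Globalization;
using System.Linq;

// Hypothetical hourly cumulative totals (hour key, accumulated hours), sorted by time.
var start = new DateTime(2013, 11, 11, 22, 0, 0);
var hourlySamples = new[] { 0.5, 1.5, 2.0, 2.25 }
    .Select((v, i) => (Key: start.AddHours(i), Value: v))
    .ToArray();

// Group by calendar day, key each group by its date and keep the day's last cumulative value.
var daily = hourlySamples
    .GroupBy(p => p.Key.Date)
    .Select(g => (Day: g.Key, Total: g.Last().Value))
    .ToArray();

foreach (var d in daily)
    Console.WriteLine($"{d.Day:yyyy-MM-dd} -> {d.Total.ToString(CultureInfo.InvariantCulture)}");
// prints:
// 2013-11-11 -> 1.5
// 2013-11-12 -> 2.25
```

`GroupBy` collects all keys with the same date, whereas `ChunkWhile` groups runs of consecutive keys; for a time-sorted series the two give the same days.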
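I'm fairly sure this can be solved in a better way, both in terms of accuracy and execution speed, but I haven't figured out how, and I haven't found any aggregation method that works on the keys of the series rather than on the values. Any suggestions?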
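One possible direction for a key-based aggregation, offered as a sketch under an explicit assumption rather than an established Deedle recipe: treat each sample's status as holding until the next timestamp (a zero-order hold, which is a different model from the linear interpolation of 0/1 values used above) and add up the durations of the intervals that start with a sample meeting the criterion. This uses only the timestamps and needs no per-minute resampling. It is plain C# 9+/LINQ over hypothetical `(Time, Value)` tuples, not Deedle:

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;

// Hypothetical samples (timestamp, value), sorted by time.
var start = new DateTime(2013, 11, 11, 23, 0, 0);
var samples = new List<(DateTime Time, double Value)>
{
    (start, 2500d),                 // meets the criterion
    (start.AddMinutes(30), 500d),   // does not
    (start.AddMinutes(45), 3000d),  // meets it; interval crosses midnight
    (start.AddMinutes(90), 1000d),  // last sample: no following interval
};

double loRange = 2000;
double hiRange = 7000;

// Pair each sample with its successor: interval [t_i, t_(i+1)) counts as "on"
// if sample i meets the criterion (zero-order-hold assumption).
var intervals = samples.Zip(samples.Skip(1), (a, b) =>
    (Start: a.Time, End: b.Time, On: a.Value > loRange && a.Value <= hiRange));

// Sum "on" hours per calendar day, splitting intervals at midnight.
var hoursPerDay = new SortedDictionary<DateTime, double>();
foreach (var iv in intervals.Where(x => x.On))
{
    var t = iv.Start;
    while (t < iv.End)
    {
        var segmentEnd = new DateTime(Math.Min(iv.End.Ticks, t.Date.AddDays(1).Ticks));
        hoursPerDay.TryGetValue(t.Date, out var soFar);
        hoursPerDay[t.Date] = soFar + (segmentEnd - t).TotalHours;
        t = segmentEnd;
    }
}

// Running total per day, comparable to the cumulative daily output above.
// Days with no "on" time do not appear at all.
double total = 0;
var cumulativePerDay = new List<(DateTime Day, double Hours)>();
foreach (var kv in hoursPerDay)
{
    total += kv.Value;
    cumulativePerDay.Add((kv.Key, total));
}

foreach (var (day, cumHours) in cumulativePerDay)
    Console.WriteLine($"{day:yyyy-MM-dd} -> {cumHours.ToString(CultureInfo.InvariantCulture)}");
// prints:
// 2013-11-11 -> 0.75
// 2013-11-12 -> 1.25
```

The result is a single pass over the sorted samples with no intermediate per-minute series; whether a hold or a linear model is the more accurate choice depends on what the measured signal does between samples.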