I have a problem that I currently solve in Deedle in a quick and dirty way, and I am clearly not solving it in the most efficient way.

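The input data consists of several dumps, each holding very frequent data in the format <DateTime,double>, sampled at somewhat random intervals (varying between roughly 20 milliseconds and 10 minutes).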

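I want to compute a new time series holding the time (in hours) during which one or more of these data series meet a given criterion. Example: computing uptime from power consumption above a given limit.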

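For now I have solved this very crudely: I first generate a series of 0 or 1 depending on whether each value meets the criterion, then interpolate that series onto one value per minute, then compute the mean of each 60-item chunk (that is, each hour), and finally use an expanding sum to get the accumulated value.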
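
Stated as a formula (the notation here is my own, not Deedle's): let $s_i \in \{0, 1\}$ be the status at timestamp $t_i$, with time measured in hours. Because the 0/1 series is linearly interpolated, the quantity this pipeline approximates is the trapezoidal integral of the status:

$$
U \;=\; \sum_{i=0}^{n-2} \frac{s_i + s_{i+1}}{2}\,\bigl(t_{i+1} - t_i\bigr)
$$

The one-minute resampling and the hourly mean only approximate this sum, which is where I suspect both the inaccuracy and the wasted work come from.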

Code to generate a test sample:

// Generate some example data:
DateTime startTime = DateTime.Now.AddYears(-2);
DateTime endTime = DateTime.Now;

var rnd = new Random();
var builder = new SeriesBuilder<DateTime,double>();
var loopTime = startTime;
while (loopTime < endTime) {
    builder.Add(loopTime, rnd.NextDouble()*4000d);
    loopTime = loopTime.AddSeconds(20d + rnd.NextDouble()*100d);
}

// Accumulation criteria:
double loRange = 2000;
double hiRange = 7000;

var dataSeries = builder.Series;

Console.WriteLine("dataSeries contains {0} elements:", dataSeries.KeyCount);
dataSeries.Print();

Output (similar to):

dataSeries contains 900819 elements:
11.11.2013 11:01:24 -> 1621.68410123404 
11.11.2013 11:02:55 -> 416.716641009188 
11.11.2013 11:04:19 -> 1018.02155422886 
11.11.2013 11:05:18 -> 1765.56553028783 
11.11.2013 11:05:53 -> 3282.86457587167 
11.11.2013 11:06:57 -> 1879.04809875369 
11.11.2013 11:08:43 -> 3259.35750792704 
11.11.2013 11:09:46 -> 3782.27438767547 
11.11.2013 11:11:00 -> 1874.62322873744 
11.11.2013 11:12:16 -> 3709.64714871237 
11.11.2013 11:14:07 -> 3370.90207607062 
11.11.2013 11:15:34 -> 696.881372806095 
11.11.2013 11:17:28 -> 1225.02550539795 
11.11.2013 11:18:51 -> 1002.21338356017 
11.11.2013 11:19:13 -> 1485.16262578087 
...                 -> ...              
11.11.2015 10:45:20 -> 2453.45101247237 
11.11.2015 10:45:53 -> 1941.55326762309 
11.11.2015 10:47:23 -> 2050.54933673262 
11.11.2015 10:48:52 -> 1520.56644368943 
11.11.2015 10:50:38 -> 918.71558358833  
11.11.2015 10:52:36 -> 1060.91481310358 
11.11.2015 10:53:00 -> 2246.04634672685 
11.11.2015 10:53:31 -> 532.949643457751 
11.11.2015 10:55:21 -> 3282.98021447052 
11.11.2015 10:56:10 -> 2528.27528795613 
11.11.2015 10:57:32 -> 288.598969713132 
11.11.2015 10:57:53 -> 3936.26360964787 
11.11.2015 10:58:31 -> 2164.81776450054 
11.11.2015 10:59:39 -> 2257.33688392552 
11.11.2015 11:00:53 -> 2500.25997427304 

Then the accumulation code:

// Make a new series that holds 1 if the criterion is met and 0 if not:
var rawRunningStatus = dataSeries.Select(item => item.Value > loRange && item.Value <= hiRange ? 1d : 0d);

// Generate a set of timestamps for minute-interpolation:
var allMinuteTimestamps = Enumerable.Range(0, 1 + (int)endTime.Subtract(startTime).TotalMinutes)
    .Select(offset => startTime.AddMinutes(offset))
    .ToArray();

var interpolatedPerMinute = rawRunningStatus.InterpolateLinear(allMinuteTimestamps, (t1, t2) => (double)(t2 - t1).Ticks);

var runningStatusPerHour = interpolatedPerMinute.Chunk(60).Select(h => h.Value.Mean());

var hourlyTotal = Stats.expandingSum(runningStatusPerHour);

// Finally, downsample to daily values:
var total = hourlyTotal.Aggregate(
                Aggregation.ChunkWhile<DateTime>((d1, d2) => d1.Date == d2.Date),
                chunk => KeyValue.Create(chunk.Data.FirstKey().Date, OptionalValue.Create(chunk.Data.LastValue())));
total.Print();

The output is similar to:

11.11.2013 00:00:00 -> 4.99771656325056 
12.11.2013 00:00:00 -> 16.9026901059461 
13.11.2013 00:00:00 -> 28.4346829644089 
14.11.2013 00:00:00 -> 40.1625579749059 
15.11.2013 00:00:00 -> 51.5181959176441 
16.11.2013 00:00:00 -> 63.3829428344991 
17.11.2013 00:00:00 -> 75.9086422691164 
18.11.2013 00:00:00 -> 87.8313912439796 
19.11.2013 00:00:00 -> 99.5987951841397 
20.11.2013 00:00:00 -> 111.948344483972 
21.11.2013 00:00:00 -> 124.061910225063 
22.11.2013 00:00:00 -> 136.246391792296 
23.11.2013 00:00:00 -> 148.16014209087  
24.11.2013 00:00:00 -> 160.765062418469 
25.11.2013 00:00:00 -> 173.016057085084 
...                 -> ...              
28.10.2015 00:00:00 -> 8598.97006880338 
29.10.2015 00:00:00 -> 8610.78890255565 
30.10.2015 00:00:00 -> 8623.3606308629  
31.10.2015 00:00:00 -> 8635.00161576886 
01.11.2015 00:00:00 -> 8647.01037016803 
02.11.2015 00:00:00 -> 8659.29864453147 
03.11.2015 00:00:00 -> 8670.50290643172 
04.11.2015 00:00:00 -> 8682.60815854525 
05.11.2015 00:00:00 -> 8694.33936107482 
06.11.2015 00:00:00 -> 8705.96593473042 
07.11.2015 00:00:00 -> 8717.99715996875 
08.11.2015 00:00:00 -> 8730.08276786404 
09.11.2015 00:00:00 -> 8741.60045331112 
10.11.2015 00:00:00 -> 8753.78545457176 
11.11.2015 00:00:00 -> 8761.26739180859 

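I am pretty sure this can be solved in a better way, in terms of both accuracy and execution speed, but I have not figured out how, nor have I found any aggregation methods that operate on the keys rather than the values of a series. Any suggestions?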
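
To make concrete what I mean by aggregating on the keys, here is a minimal sketch in plain C# with no Deedle calls. The names `Uptime` and `CumulativeHoursPerDay` are made up for illustration. It assumes sample-and-hold semantics (the status seen at one sample holds until the next sample), which is an assumption of the sketch and differs from the linear interpolation used above. It also assumes the samples are already sorted by time.

```csharp
using System;
using System.Collections.Generic;

public static class Uptime
{
    // Hypothetical helper: cumulative hours in range, reported once per calendar day.
    // Assumes sample-and-hold: the status at samples[i] holds until samples[i + 1].
    public static SortedDictionary<DateTime, double> CumulativeHoursPerDay(
        IReadOnlyList<KeyValuePair<DateTime, double>> samples, double lo, double hi)
    {
        var result = new SortedDictionary<DateTime, double>();
        double totalHours = 0.0;

        for (int i = 0; i + 1 < samples.Count; i++)
        {
            bool running = samples[i].Value > lo && samples[i].Value <= hi;
            DateTime from = samples[i].Key;
            DateTime to = samples[i + 1].Key;

            // Split the interval at midnight so each day gets its own share.
            while (from < to)
            {
                DateTime nextMidnight = from.Date.AddDays(1);
                DateTime segmentEnd = to < nextMidnight ? to : nextMidnight;
                if (running)
                {
                    totalHours += (segmentEnd - from).TotalHours;
                }
                // Overwritten until the last segment of the day, so the stored
                // value is the running total at the end of that day's data.
                result[from.Date] = totalHours;
                from = segmentEnd;
            }
        }
        return result;
    }
}

public static class UptimeDemo
{
    public static void Main()
    {
        var samples = new List<KeyValuePair<DateTime, double>>
        {
            new KeyValuePair<DateTime, double>(new DateTime(2015, 1, 1, 22, 0, 0), 3000),
            new KeyValuePair<DateTime, double>(new DateTime(2015, 1, 2, 1, 0, 0), 1000),
            new KeyValuePair<DateTime, double>(new DateTime(2015, 1, 2, 3, 0, 0), 2500),
            new KeyValuePair<DateTime, double>(new DateTime(2015, 1, 2, 4, 0, 0), 0),
        };

        foreach (var day in Uptime.CumulativeHoursPerDay(samples, 2000, 7000))
        {
            Console.WriteLine("{0:yyyy-MM-dd} -> {1}", day.Key, day.Value);
        }
    }
}
```

With the four samples in `Main`, the running total is 2 hours at the end of the first day and 4 hours at the end of the second. For the real data I would presumably copy the observations out of the Deedle series first (I believe `dataSeries.Observations` exposes them as key/value pairs, but I have not verified that this is the idiomatic way), and what I am really looking for is a way to express the same thing with Deedle's own aggregation functions.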
