
The system collects many time series whose timestamps are not aligned with one another:

Example:

+----------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-----+
|   Time   | 12:00 | 12:01 | 12:02 | 12:03 | 12:04 | 12:05 | 12:06 | 12:07 | 12:08 | ... |
+----------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-----+
| Series 1 |     8 |       |     2 |       |     4 |       |     8 |       |     6 |     |
| Series 2 |       |     5 |       |     4 |       |     7 |       |     2 |       |     |
| Series 3 |     5 |       |       |       |     7 |       |       |       |     2 |     |
| ...      |       |       |       |       |       |       |       |       |       |     |
+----------+-------+-------+-------+-------+-------+-------+-------+-------+-------+-----+

The data may also arrive late or in batches (i.e. the event time may differ from the receive time, depending on the source).


The raw data is up-sampled to a 1-minute interval, and missing values are filled by linear interpolation from the previous value. Element-wise transformations are then applied, for example:

Series 2 = Series 2 + Series 3
Series 1 = Series 1 * Series 2

So Series 1 depends on itself and Series 2, and Series 2 depends on itself and Series 3.
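
For reference, the up-sampling and the two transformations could be sketched in pandas roughly as follows. This is only a minimal illustration, not the production code: the date, the `align` helper, and the `raw`/`result` names are hypothetical, and it assumes "linear interpolation" means interpolation in time between neighbouring samples.

```python
import pandas as pd

# Hypothetical sample data mirroring the table above (the date is arbitrary).
minutes = pd.date_range("2024-01-01 12:00", periods=9, freq="min")
raw = {
    "Series 1": pd.Series([8.0, 2.0, 4.0, 8.0, 6.0], index=minutes[::2]),
    "Series 2": pd.Series([5.0, 4.0, 7.0, 2.0], index=minutes[1::2]),
    "Series 3": pd.Series([5.0, 7.0, 2.0], index=minutes[::4]),
}

def align(series, grid):
    """Interpolate `series` linearly in time and sample it on `grid`."""
    combined = series.index.union(grid)  # keep the raw points as anchors
    return series.reindex(combined).interpolate(method="time").reindex(grid)

# Put every series on the same 1-minute grid.
aligned = pd.DataFrame({name: align(s, minutes) for name, s in raw.items()})

# Apply the transformations in dependency order.
result = aligned.copy()
result["Series 2"] = aligned["Series 2"] + aligned["Series 3"]
result["Series 1"] = aligned["Series 1"] * result["Series 2"]
```

With these defaults, grid points before a series' first sample stay NaN (e.g. Series 2 at 12:00), so the derived values there are NaN as well.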

The transformation dependencies between series form a directed acyclic graph (DAG). These dependencies can change at runtime when a user requests a change.
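
To make the DAG concrete, here is a small sketch using the standard-library `graphlib` module (Python 3.9+). The `deps` mapping and the `downstream` helper are hypothetical names for illustration; self-dependencies are left out because each series only reads its own raw values.

```python
from graphlib import TopologicalSorter

# Each series maps to the set of other series it depends on.
deps = {
    "Series 1": {"Series 2"},
    "Series 2": {"Series 3"},
    "Series 3": set(),
}

# A valid evaluation order: dependencies come before their dependents.
order = list(TopologicalSorter(deps).static_order())

def downstream(changed, deps):
    """Return `changed` plus every series that directly or indirectly depends on it."""
    affected = {changed}
    grew = True
    while grew:
        grew = False
        for node, parents in deps.items():
            if node not in affected and parents & affected:
                affected.add(node)
                grew = True
    return affected
```

When late data arrives for one series, something like `downstream("Series 3", deps)` would tell which series need to be recomputed, and rebuilding `order` after a user edits `deps` covers the runtime changes. `TopologicalSorter` raises `graphlib.CycleError` if an edit introduces a cycle.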


Currently, all calculations are done on the fly with Python pandas when a user retrieves the time series data. Performance degrades as the data volume grows or when the user selects a wide time range.

Is there a way or tool to handle this more efficiently, such as stream or batch processing?


1 Answer


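Try RedBlackPy! It was built for convenient work with time series, including interpolation without restrictions. You can also read the article about it on TowardsDataScience. The RedBlackPy.Series class has all the functionality you are looking for. It supports arithmetic methods over the sorted union of the keys of Series objects (with automatic interpolation).

[Image: Series iterator]

RedBlackPy outperforms Pandas when working with dynamic ordered data such as time series.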


Answered 2018-08-12T09:45:35.373