mapreduce - What is the most efficient way to store read-heavy time series in Riak

Question

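My current approach:

I have a domain class, Application.
Each application in my system is stored in the "applications" bucket under the key APPLICATION_KEY.
Besides the application metadata stored in this bucket, each application has its own bucket called "time_metrics/APPLICATION_KEY", where I store the time series as follows:

KEY - timestamp / VALUE - some attributes

My concern is the efficiency of querying a specific time window for a given application. Currently, to fetch the time series for a particular window and then run some reduction over it, I have to run a map/reduce over the entire "time_metrics/APPLICATION_KEY" bucket, which I have found is not a recommended use case for Riak Map/Reduce.

My question: what is the best database structure for this kind of system, and how can it be queried efficiently?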

score 4 · Accepted Answer

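Adding to @macintux's answer.

Basho has several customers who use Riak for time series metrics. Boundary gave a great tech talk on how they use Riak with their network monitoring software. They roll data up into different time chunks (1m, 5m, 15m) for analysis. They also have a series of blog posts on the lessons they learned while implementing that system.

Kivra also has a nice slide deck on how they use time series data with Riak.

You can roll the data up into chunks of some arbitrary length of time, read the range you need by issuing regular K/V gets, and then rebuild the bigger picture (the reduction) in your application.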
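As a rough illustration of that last point, here is a minimal sketch in Python. It assumes 5-minute chunks, a hypothetical key format (`YYYYMMDDTHHMM` of the chunk start), and a hypothetical `fetch` callable standing in for a plain K/V get against the application's bucket; none of these names come from the answer or from a Riak client library. The point is only that the keys for a window can be computed up front, so no bucket-wide map/reduce is needed.

```python
from datetime import datetime, timedelta, timezone

CHUNK = timedelta(minutes=5)  # assumed rollup size, not prescribed by Riak
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def chunk_start(ts, chunk=CHUNK):
    """Floor a timezone-aware timestamp to the start of its chunk."""
    seconds = int((ts - EPOCH).total_seconds())
    size = int(chunk.total_seconds())
    return EPOCH + timedelta(seconds=seconds - seconds % size)


def keys_for_window(start, end, chunk=CHUNK):
    """Compute the chunk keys covering the window [start, end)."""
    keys = []
    current = chunk_start(start, chunk)
    while current < end:
        keys.append(current.strftime("%Y%m%dT%H%M"))  # hypothetical key format
        current += chunk
    return keys


def read_window(fetch, start, end):
    """Fetch each chunk by key and reduce client-side (here: sum the counts)."""
    total = 0
    for key in keys_for_window(start, end):
        chunk = fetch(key)  # stand-in for one K/V get; None if the chunk is absent
        if chunk is not None:
            total += chunk["count"]
    return total
```

In a real deployment `fetch` would wrap whatever get call your Riak client provides; in the sketch any dict's `.get` method works, which also makes the windowing logic easy to test without a cluster.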

score 3 · Accepted Answer

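If you have spare computational capacity and know in advance which keys you need, you can certainly use Riak's MapReduce, but retrieving the keys and running your processing on the client will often be just as fast (and won't strain your cluster).

Some general ideas:

Roll up your data into larger chunks
- If you're concerned about losing data if your client crashes while buffering it, you can always store the data as it arrives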
- Similar idea: store the data as it arrives, then retrieve it and roll it up at certain intervals
  - You can automatically expire data once you're confident it is being reliably stored in larger blocks, using either the Bitcask or Memory backends
  - Memory backend is quite useful (RAM permitting) for any data that only needs to be stored for a limited period of time
Related: don't be afraid to store multiple copies of your data to make reading/reporting easier later
- Multiple chunks of time (5- and 15-minute blocks, for example)
- Multiple report formats
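
The roll-up and "multiple copies" ideas above can be sketched as follows. This is only an illustration under stated assumptions: raw points are `(epoch_seconds, value)` pairs, the aggregate fields (`count`, `sum`, `min`, `max`) are one possible report format, and the `<chunk_seconds>/<chunk_start>` key layout is hypothetical. Writing each entry to Riak is left out; each key/value pair would be one ordinary K/V put.

```python
def roll_up(points, chunk_seconds):
    """Group (epoch_seconds, value) points into fixed-size chunks.

    Returns {chunk_start: {"count", "sum", "min", "max"}}.
    """
    chunks = {}
    for ts, value in points:
        start = ts - ts % chunk_seconds  # floor to the chunk boundary
        agg = chunks.setdefault(
            start, {"count": 0, "sum": 0.0, "min": value, "max": value}
        )
        agg["count"] += 1
        agg["sum"] += value
        agg["min"] = min(agg["min"], value)
        agg["max"] = max(agg["max"], value)
    return chunks


def build_copies(points, sizes=(300, 900)):
    """Build one rolled-up copy per chunk size (5 and 15 minutes by default)."""
    copies = {}
    for size in sizes:
        for start, agg in roll_up(points, size).items():
            copies[f"{size}/{start}"] = agg  # hypothetical key layout
    return copies
```

Storing both granularities costs extra writes, but a report over a long window can then read the 15-minute keys while a detailed view reads the 5-minute ones, each with computable keys.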

Having said all that, if you're doing straight key/value requests (it's ideal to always be able to compute the keys you need, rather than doing indexing or searching), Riak can support very heavy traffic loads, so I wouldn't recommend spending too much time creating alternative storage mechanisms unless you know you're going to face latency problems.
