
I have a SaaS application that monitors equipment.

I developed a platform for monitoring sensors. My application currently oversees approximately 100,000 sensors, and each sensor records a value every 15 minutes.
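As a rough back-of-envelope check on the volume, here is a small sketch. It assumes the sensor count stays fixed at 100,000 and uses 365-day years, so it understates the real figure if the number of sensors keeps growing.

```python
# Back-of-envelope reading counts, assuming a constant 100,000 sensors.
SENSORS = 100_000
READINGS_PER_DAY = 24 * 60 // 15  # one value every 15 minutes -> 96 per day

per_day = SENSORS * READINGS_PER_DAY
per_year = per_day * 365
per_decade = per_year * 10

print(f"{per_day:,} readings per day")
print(f"{per_year:,} readings per year")
print(f"{per_decade:,} readings over 10 years")
```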

All measurements are currently stored in a single MySQL table (timestamp => value). To limit the number of rows in that table, all the points for a given sensor are combined into one daily log.

The number of sensors is growing exponentially. My database is already 100 GB.

From these measurements, I have to compute daily, monthly, annual and total consolidations.
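To make the consolidation requirement concrete, here is a minimal sketch in plain Python. The `(sensor_id, timestamp, value)` tuple layout and the `consolidate` function are hypothetical, and it assumes a consolidation is a sum; an average or min/max would follow the same bucketing pattern.

```python
from collections import defaultdict
from datetime import datetime

# strftime patterns that turn a timestamp into its bucket label.
BUCKET_FORMATS = {"daily": "%Y-%m-%d", "monthly": "%Y-%m", "annual": "%Y"}

def consolidate(readings, period):
    """Sum values per (sensor_id, bucket).

    readings: iterable of (sensor_id, datetime, value) tuples (assumed layout).
    period: "daily", "monthly", "annual" or "total".
    """
    totals = defaultdict(float)
    for sensor_id, ts, value in readings:
        if period == "total":
            bucket = "total"
        else:
            bucket = ts.strftime(BUCKET_FORMATS[period])
        totals[(sensor_id, bucket)] += value
    return dict(totals)

readings = [
    ("s1", datetime(2013, 4, 7, 23, 45), 1.5),
    ("s1", datetime(2013, 4, 8, 0, 0), 2.0),
    ("s1", datetime(2013, 4, 8, 0, 15), 2.5),
    ("s2", datetime(2013, 3, 31, 12, 0), 4.0),
]

print(consolidate(readings, "daily"))
print(consolidate(readings, "monthly"))
```

Whatever store is chosen, the same grouping has to run somewhere, either in the database or in a batch job.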

My application needs fast access to recent data, but I also have to keep at least 10 years of history and be able to query it.

NoSQL architectures seem to be a good fit for storing large volumes of data and running the consolidations.

Which solution is best suited to storing this type of data?

I have tested CouchDB, and I am hesitating between different NoSQL solutions (Hadoop, Cassandra, MongoDB...).

I'm looking for feedback from people who have experience with this.


1 Answer


I'll offer advice based on my experience with some of the technologies you mentioned.

Using HDFS/Flume/Hadoop

You might consider just writing plain text files and then using Flume ( http://flume.apache.org/ ) to move them into HDFS ( http://en.wikipedia.org/wiki/Apache_Hadoop#Hadoop_Distributed_File_System ).

After that, you can use Hadoop and all of its tooling to write map/reduce jobs against the flat files stored in HDFS. HDFS will let you scale your storage size quite well.
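To give a feel for what such a job does, here is a minimal local sketch of the map/sort/reduce pattern in plain Python. It is not Hadoop code: `run_job` only stands in for the framework, and the `sensor_id,ISO-timestamp,value` line format is an assumption about how the flat files might look.

```python
from itertools import groupby
from operator import itemgetter

def map_line(line):
    """Map step: one CSV line -> ((sensor_id, day), value)."""
    sensor_id, timestamp, value = line.strip().split(",")
    return (sensor_id, timestamp[:10]), float(value)  # first 10 chars = YYYY-MM-DD

def reduce_group(key, values):
    """Reduce step: all values for one key -> (key, daily sum)."""
    return key, sum(values)

def run_job(lines):
    """Local stand-in for the framework: map, sort by key (shuffle), reduce."""
    mapped = sorted((map_line(l) for l in lines if l.strip()), key=itemgetter(0))
    return [
        reduce_group(key, [value for _, value in group])
        for key, group in groupby(mapped, key=itemgetter(0))
    ]

lines = [
    "s1,2013-04-08T00:00:00,2.0",
    "s2,2013-04-08T00:00:00,1.0",
    "s1,2013-04-08T00:15:00,2.5",
    "s1,2013-04-07T23:45:00,1.5",
]

for key, total in run_job(lines):
    print(key, total)
```

On a real cluster the map and reduce functions would run in parallel over file blocks, with the framework handling the sort between them.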

Using Mongo

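You could set up a replica set in Mongo and scale horizontally to store the log data, but 100 GB and growing may be a bit too large for a replica set. Replica sets in Mongo (conceptually the same as a "cluster") will not scale indefinitely.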

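If you find that you are overloading a single replica set, you can shard the log information (perhaps by sensor and entry id?), and then you can keep scaling out by adding nodes.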
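The idea behind sharding by sensor can be sketched as follows. This is a conceptual illustration only, not MongoDB's API: Mongo does the routing itself once a shard key is chosen, and `shard_for` is a hypothetical helper that uses a simple hash-modulo scheme.

```python
import hashlib

def shard_for(sensor_id, shard_count):
    """Map a sensor id to a shard index in [0, shard_count) via a stable hash."""
    digest = hashlib.md5(sensor_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % shard_count

# Every reading from the same sensor lands on the same shard,
# so per-sensor queries and consolidations touch a single node.
for sensor_id in ("sensor-1", "sensor-2", "sensor-3"):
    print(sensor_id, "-> shard", shard_for(sensor_id, 4))
```

Keying on the sensor keeps one sensor's history together; adding the entry id to the key would spread a single sensor's data across shards instead.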

-- I would find something that you enjoy writing queries in. Many solutions will scale horizontally, but not all of them have the same ecosystem.

Answered 2013-04-08T18:24:15.733