I am new to Big Data and to HBase in particular. Now I am trying to use OpenTSDB to store data from sensors.
My configuration: a Cloudera VMware image with the latest stable OpenTSDB installed on it. After configuring it, I started the server with:
./build/tsdb tsd --port=4242 --staticroot=build/staticroot/ --cachedir=/tmp/tsd/ --auto-metric
Then I ran a simple netcat client:
#!/bin/bash
set -e
while true; do
    ./run $1 $2
    sleep 1
done | nc -w 30 localhost 4242
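Each line piped into nc is a command in OpenTSDB's telnet-style put syntax:
put <metric> <unix-timestamp> <value> <tagk=tagv [tagk=tagv ...]>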
With ./run compiled from:
#include <cstdio>
#include <cstdlib>
#include <ctime>   /* time */

int main(int argc, char **argv)
{
    if ( argc <= 2 ) {
        fprintf(stderr, "2 params expected: start index and number of sensors\n");
        return 1;
    }
    unsigned long t = time(NULL);
    srand(t);
    int b; // index of the first sensor
    int n; // number of sensors
    sscanf(argv[1], "%d", &b);
    sscanf(argv[2], "%d", &n);
    // emit one "put" line per sensor, all sharing the same timestamp
    for ( int i = b; i < b + n; ++i ) {
        printf("put democ.%d %lu %f host=localhost.localdomain\n",
               i, t, 1.0 + 0.01 * (rand() % 100));
    }
    return 0;
}
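For example, ./run 0 3 prints three lines of the form (timestamp and values vary from run to run):
put democ.0 1368809980 1.370000 host=localhost.localdomain
put democ.1 1368809980 1.520000 host=localhost.localdomain
put democ.2 1368809980 1.090000 host=localhost.localdomain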
Afterwards I watched the democ.* metrics via the web UI at localhost:4242.
I am satisfied with its performance overall, but problems appear when the generator produces a large number of metrics (large n).
The first problem is that some data points disappear, and the loss depends on n: with n = 10000 there are 29 points per 30 seconds on average, but with n = 75000 there are only 15. This problem is not critical; I suspect it is caused by limited disk bandwidth.
After some time, the server sends an error:
put: HBase error: 1000 RPCs waiting on "tsdb,\x00\x98[Q\x96E\xF0\x00\x00\x01\x00\x00\x01,1368809980414.dc6179de43f78eac6c8b745996200664." to come back online
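One workaround I am considering for the dropped points is to shard the sensor range across several parallel writer connections, so that a single TCP stream is not the bottleneck. A rough, untested sketch (the writer count of 4 is an arbitrary guess):
#!/bin/bash
# Untested sketch: split COUNT sensors across WRITERS parallel connections.
# Assumes the same ./run binary and a TSD listening on localhost:4242.
# (Ignores any remainder when COUNT is not divisible by WRITERS.)
START=$1
COUNT=$2
WRITERS=4
CHUNK=$(( COUNT / WRITERS ))
for (( w = 0; w < WRITERS; ++w )); do
    (
        while true; do
            ./run $(( START + w * CHUNK )) "$CHUNK"
            sleep 1
        done | nc -w 30 localhost 4242
    ) &
done
wait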
The second problem is an HBase failure after the server has been running for some time. OpenTSDB dies, flooding all clients and its own console with messages like the following (-ROOT- being, as far as I understand, HBase's root catalog region, so at that point the client cannot even locate regions anymore):
put: HBase error: 10000 RPCs waiting on "-ROOT-,,0" to come back online
What can I do to solve these problems?
I have also considered using Cassandra for my project.
What is the best open-source solution for storing time-series data? Approximately, I need to store data from 100,000 sensors for 30 days, where each sensor generates up to 40 bytes of data every second.
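Back of the envelope: 100,000 sensors × 40 B/s ≈ 4 MB/s of incoming data, about 345 GB per day, or roughly 10 TB of raw data over 30 days, before compression and storage overhead.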