We're looking to migrate some data into InfluxDB. I'm working with InfluxDB 2.0 on a test server to determine the best way to store our data.
As of today, I have about 2.7 billion series to migrate to InfluxDB, and that number will only go up.
Here is the structure of the data I need to store:
- ClientId (332 values as of today, string of 7 characters)
- Driver (int, 45k values as of today, will increase)
- Vehicle (int, 28k values as of today, will increase)
- Channel (100 values, should not increase, string of 40 characters)
- Value of the channel (float, 1 value per channel/vehicle/driver/client at a given timestamp)
At first, I thought of storing my data this way:
- One bucket (as all data have the same data retention)
- Measurements = channels (so 100 distinct measurements are stored)
- Tag Keys = ClientId
- Fields = Driver, Vehicle, Value of channel
This gave me a cardinality of 1 * 100 * 332 * 3 = 99,600, according to this article.
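To make that concrete, here is a minimal sketch of what one point would look like under this first schema, using the official Python client (influxdb-client). The channel name `engine_rpm`, the bucket `telemetry`, the connection settings, and all IDs are made-up placeholders:

```python
from datetime import datetime, timezone
from influxdb_client import InfluxDBClient, Point, WritePrecision
from influxdb_client.client.write_api import SYNCHRONOUS

# Placeholder connection settings
client = InfluxDBClient(url="http://localhost:8086", token="my-token", org="my-org")
write_api = client.write_api(write_options=SYNCHRONOUS)

point = (
    Point("engine_rpm")                 # measurement = channel (hypothetical channel name)
    .tag("ClientId", "CLT0001")         # the only tag, so cardinality stays low
    .field("Driver", 12345)             # field: not indexed, does not create new series
    .field("Vehicle", 67890)            # field: not indexed, does not create new series
    .field("value", 842.5)              # the channel value itself
    .time(datetime(2024, 1, 1, tzinfo=timezone.utc), WritePrecision.NS)
)
write_api.write(bucket="telemetry", record=point)
client.close()
```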
But then I realized that InfluxDB handles duplicate points based on "measurement name, tag set, and timestamp".
So this will not work for my data, as I need duplicates to be distinguished by ClientId, Channel, and Vehicle at a minimum.
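For example (same hypothetical names as above), two vehicles of the same client reporting the same channel at the same instant produce identical series keys, so the second point silently overwrites the fields of the first instead of being stored as a separate point:

```python
from datetime import datetime, timezone
from influxdb_client import Point, WritePrecision

ts = datetime(2024, 1, 1, tzinfo=timezone.utc)
p1 = (Point("engine_rpm").tag("ClientId", "CLT0001")
      .field("Vehicle", 11111).field("Driver", 12345).field("value", 842.5)
      .time(ts, WritePrecision.NS))
p2 = (Point("engine_rpm").tag("ClientId", "CLT0001")
      .field("Vehicle", 22222).field("Driver", 54321).field("value", 910.0)
      .time(ts, WritePrecision.NS))

# Both lines share the same measurement, tag set and timestamp,
# so writing p2 after p1 replaces p1's field values.
print(p1.to_line_protocol())
print(p2.to_line_protocol())
```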
But if I change my structure to store the data this way:
- One bucket (as all data have the same data retention)
- Measurements = channels (so 100 distinct measurements are stored)
- Tag Keys = ClientId, Vehicle
- Fields = Driver, Value of channel
then I'll get a cardinality of 2,788,800,000.
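Under this second schema the same point would look like the sketch below (again with made-up names). Because Vehicle is now a tag, every distinct (channel, ClientId, Vehicle) combination becomes its own series, which is where the multi-billion figure comes from:

```python
from datetime import datetime, timezone
from influxdb_client import Point, WritePrecision

point = (
    Point("engine_rpm")                 # measurement = channel
    .tag("ClientId", "CLT0001")         # tag
    .tag("Vehicle", "67890")            # tag: every vehicle value adds a new series per channel/client
    .field("Driver", 12345)             # field: still not indexed
    .field("value", 842.5)              # the channel value
    .time(datetime(2024, 1, 1, tzinfo=timezone.utc), WritePrecision.NS)
)
```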
I understand that I need to keep cardinality as low as possible. (And ideally I would also like to be able to search by driver as well as by vehicle.)
My questions are:
- If I split the data into different buckets (e.g. one bucket per ClientId), will that decrease my cardinality?
- What would be the best way to store data for such a large number of series?