We're looking to migrate some data into InfluxDB. I'm working with InfluxDB 2.0 on a test server to determine the best way to store our data.

As of today, I have about 2.7 billion series to migrate to InfluxDB, but that number will only go up.

Here is the structure of the data I need to store:

  • ClientId (332 values as of today; a 7-character string)
  • Driver (int; 45k values as of today, will increase)
  • Vehicle (int; 28k values as of today, will increase)
  • Channel (100 values, should not increase; a 40-character string)
  • Value of channel (float; 1 value per channel/vehicle/driver/client at a given timestamp)

At first, I thought of storing my data this way:

  • One bucket (as all data have the same data retention)
  • Measurements = channels (so 100 measurements are stored)
  • Tag Keys = ClientId
  • Fields = Driver, Vehicle, Value of channel

This gave me a cardinality of 1 × 100 × 332 × 3 = 99,600, according to this article.
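
To make this concrete, here is roughly what a single point would look like under this first schema, sketched with the official influxdb-client Python package. The channel name, client id, driver/vehicle ids, and values are all invented for illustration:

```python
from datetime import datetime, timezone

from influxdb_client import Point

# First schema: the channel is the measurement, ClientId is the only
# tag, and Driver, Vehicle, and the channel value are all fields.
p = (
    Point("engine_rpm")                       # one of the ~100 channels
    .tag("ClientId", "CLT0001")               # 7-character client id
    .field("Driver", 4512)                    # int field
    .field("Vehicle", 2801)                   # int field
    .field("value", 13.7)                     # the channel value itself
    .time(datetime(2021, 6, 1, 12, 0, 0, tzinfo=timezone.utc))
)

print(p.to_line_protocol())
# engine_rpm,ClientId=CLT0001 Driver=4512i,Vehicle=2801i,value=13.7 1622548800000000000
```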

But then I realized that InfluxDB handles duplicates based on "measurement name, tag set, and timestamp".

So this will not work for my data, as I need uniqueness to be based on at least ClientId, Channel, and Vehicle.
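
The collision is easy to see on the line protocol itself. In this sketch (same invented names as above), two different vehicles of the same client report the same channel at the same instant, and the two points end up with an identical series key:

```python
from datetime import datetime, timezone

from influxdb_client import Point

ts = datetime(2021, 6, 1, 12, 0, 0, tzinfo=timezone.utc)

# Under the first schema, Vehicle is only a field, so it is not part
# of the series key.
p1 = Point("engine_rpm").tag("ClientId", "CLT0001") \
    .field("Vehicle", 2801).field("value", 13.7).time(ts)
p2 = Point("engine_rpm").tag("ClientId", "CLT0001") \
    .field("Vehicle", 2802).field("value", 9.2).time(ts)

print(p1.to_line_protocol())
print(p2.to_line_protocol())
# engine_rpm,ClientId=CLT0001 Vehicle=2801i,value=13.7 1622548800000000000
# engine_rpm,ClientId=CLT0001 Vehicle=2802i,value=9.2 1622548800000000000
#
# Both points share the same measurement name, tag set, and timestamp,
# so on write the second one overwrites the first: vehicle 2801's
# reading is silently lost.
```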

But if I change my data structure to be stored this way:

  • One bucket (as all data have the same data retention)
  • Measurements = channels (so 100 measurements are stored)
  • Tag Keys = ClientId, Vehicle
  • Fields = Driver, Value of channel

then I'll get a cardinality of 2,788,800,000.
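
Under this second schema, the same two readings no longer collide, because Vehicle is now part of the series key (note that tag values are always stored as strings in InfluxDB). Again a sketch with the same invented names:

```python
from datetime import datetime, timezone

from influxdb_client import Point

ts = datetime(2021, 6, 1, 12, 0, 0, tzinfo=timezone.utc)

# Second schema: ClientId and Vehicle are both tags (tag values are
# always strings); Driver and the channel value remain fields.
p1 = Point("engine_rpm").tag("ClientId", "CLT0001").tag("Vehicle", "2801") \
    .field("Driver", 4512).field("value", 13.7).time(ts)
p2 = Point("engine_rpm").tag("ClientId", "CLT0001").tag("Vehicle", "2802") \
    .field("Driver", 4513).field("value", 9.2).time(ts)

print(p1.to_line_protocol())
print(p2.to_line_protocol())
# engine_rpm,ClientId=CLT0001,Vehicle=2801 Driver=4512i,value=13.7 1622548800000000000
# engine_rpm,ClientId=CLT0001,Vehicle=2802 Driver=4513i,value=9.2 1622548800000000000
#
# Distinct tag sets mean two distinct series, so nothing is overwritten.
# But every client/vehicle combination now contributes its own series,
# which is where the huge cardinality comes from.
```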

I understand that I need to keep cardinality as low as possible. (And ideally, I would also need to be able to search by driver as well as by vehicle.)

My questions are:

  • If I split the data into different buckets (e.g., one bucket per ClientId), will that decrease my cardinality?
  • What would be the best way to store data for such a large number of series?