I'm investigating the feasibility of using BigQuery to store sensor data in time series. The intent is to store the data in BQ and process it in Pandas... so far so good... Pandas can interpret a TIMESTAMP field index and create a Series.
An additional requirement is that the data support arbitrary tags as key/value pairs (e.g. job_id=1234, task_id=5678). BigQuery can support this nicely with REPEATED fields of type RECORD:
{'fields':
[
{
"mode": "NULLABLE",
"name": "timestamp",
"type": "TIMESTAMP"
},
{
"mode": "REPEATED",
"name": "tag",
"type": "RECORD",
"fields":
[
{
"name":"name",
"type":"STRING"
},
{
"name":"value",
"type":"STRING"
},
{
"mode": "NULLABLE",
"name": "measurement_1",
"type": "FLOAT"
},
{
"mode": "NULLABLE",
"name": "measurement_2",
"type": "FLOAT"
},
{
"mode": "NULLABLE",
"name": "measurement_3",
"type": "FLOAT"
},
]
},
]
}
This works great for storing the data and it even works great for querying if I only need to filter on a single key/value combination
SELECT measurement_1 FROM measurements
WHERE tag.name = 'job_id' AND tag.value = '1234'
However, I also need to be able to combine sets of tags in query expressions and I can't seem to make this work. For example this query returns no result
SELECT measurement_1 FROM measurements
WHERE tag.name = 'job_id' AND tag.value = '1234'
AND tag.name = 'task_id' AND tag.value = '5678'
Questions: Is it possible to formulate a query to do what I want using this schema? What is the recommended way to attach this type of variable data to an otherwise fixed schema in Big Query?
Thanks for any help or suggestions!
Note: If you're thinking this looks like a great fix for InfluxDB it's because that's what I've been using thus far. The seemingly insurmountable issue is the amount of series cardinality in my data set, so I'm looking for alternatives.