json - How to filter by tags in array when using JSON in Snowflake

Question

I want to store millions of time series, where each point in time of every series is labeled with an arbitrary set of tags. It appears I should use a JSON array of tags in Snowflake:

CREATE TABLE timeseries (obj_id INT, ts DATE, tags VARIANT, val INT);
INSERT INTO timeseries (obj_id, ts, tags, val) VALUES (442243, '2017-01-01', parse_json('["red", "small", "cheap"]'), 1);
INSERT INTO timeseries (obj_id, ts, tags, val) VALUES (673124, '2017-01-01', parse_json('["red", "small", "expensive"]'), 2);
INSERT INTO timeseries (obj_id, ts, tags, val) VALUES (773235, '2017-01-01', parse_json('["black", "small", "cheap"]'), 3);

Now I want to see an average of all time-series labeled with "small" AND "cheap", e.g.

SELECT ts, AVG(val)
FROM timeseries
WHERE "small" IN tags AND "cheap" IN tags
GROUP BY ts

which would return:

ts, avg(val)
2017-01-01, 2

What is the right Snowflake syntax/schema/approach to achieve this? Note that I do NOT want to use FLATTEN, which explodes the rows; I just want to filter out all the rows that are not both 'cheap' and 'small'.

score 1 · Accepted Answer

You can use the ARRAY type directly instead of JSON, for example:

CREATE TABLE ts2 (obj_id INT, ts DATE, tags ARRAY, val INT);
INSERT INTO ts2 (obj_id, ts, tags, val) select 442243, '2017-01-01', ARRAY_CONSTRUCT('red', 'small', 'cheap'), 1;
INSERT INTO ts2 (obj_id, ts, tags, val) select 673124, '2017-02-01', ARRAY_CONSTRUCT('red', 'small', 'expensive'), 2;
INSERT INTO ts2 (obj_id, ts, tags, val) select 773235, '2017-01-01', ARRAY_CONSTRUCT('black', 'small', 'cheap'), 3;

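The VALUES clause cannot use functions such as ARRAY_CONSTRUCT, but INSERT ... SELECT works. (You can also do this with JSON and the VARIANT type, but you would need to label the values with key names and use PARSE_JSON in the insert.)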
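For comparison, here is a minimal sketch of the VARIANT route. It assumes the tags stay as the bare JSON arrays from the question rather than being wrapped under a key name, and it casts the column to ARRAY explicitly before testing membership; treat it as an illustration, not a tested recipe for your data.

```sql
-- Sketch only: keeps the question's VARIANT column and its bare JSON arrays.
CREATE TABLE timeseries (obj_id INT, ts DATE, tags VARIANT, val INT);

-- PARSE_JSON is called from INSERT ... SELECT rather than from a VALUES clause.
INSERT INTO timeseries (obj_id, ts, tags, val)
  SELECT 442243, '2017-01-01', PARSE_JSON('["red", "small", "cheap"]'), 1;

-- Cast the VARIANT to ARRAY explicitly before checking for each tag.
SELECT obj_id, tags
FROM timeseries
WHERE ARRAY_CONTAINS('small'::VARIANT, tags::ARRAY)
  AND ARRAY_CONTAINS('cheap'::VARIANT, tags::ARRAY);
```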

Then, to select only the rows that contain both of the tags you chose, use a query like this:

select 
  obj_id,
  tags
from ts2
where ARRAY_CONTAINS('small'::variant, tags)
  and ARRAY_CONTAINS('cheap'::variant, tags)
;
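
The same filter can feed the aggregate asked for in the question. The following is a sketch against the ts2 table defined above; the alias avg_val is only an illustrative name.

```sql
-- Average val per date over rows tagged with both 'small' and 'cheap'.
SELECT
  ts,
  AVG(val) AS avg_val
FROM ts2
WHERE ARRAY_CONTAINS('small'::VARIANT, tags)
  AND ARRAY_CONTAINS('cheap'::VARIANT, tags)
GROUP BY ts;
```

With the three sample rows above, only obj_id 442243 and 773235 carry both tags, so the result should be a single row for 2017-01-01 whose average is 2 (the mean of 1 and 3). No FLATTEN is involved, so the rows are filtered without being exploded.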
