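Here is my problem: I have a table in Hive with two columns, id and an array of double values. I want to add the double values element-wise across rows for a given user. This is what the data looks like. Each array actually has more than 100 elements, but for simplicity I show only 3 here.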
+----+--------------------+
| id | value              |
+----+--------------------+
| 1  | [0.03,0.15,-0.03]  |
| 1  | [-0.2,0.11,-0.16]  |
| 1  | [0.03,0.15,-0.03]  |
| 2  | [0.02,0.01,0.05]   |
| 2  | [0.1,0.03,0.3]     |
+----+--------------------+
The answer I expect is:
+----+--------------------+
| id | value              |
+----+--------------------+
| 1  | [-0.14,0.41,-0.22] |
| 2  | [0.12,0.04,0.35]   |
+----+--------------------+
How can I do this with a Hive query? Thanks in advance.
Update: This is the approach I used to get a solution, but I am looking for a better one.
SELECT id,
       concat_ws(',', collect_list(CAST(val_new AS STRING))) AS val_fin
FROM (
    -- aggregate the values that share the same id and array position
    SELECT id, avg(valueid) AS val_new
    FROM (
        -- split the string into elements and emit one row per (index, element)
        SELECT id, valueid, index
        FROM user_interest_profiles.clicked_articles
        LATERAL VIEW POSEXPLODE(split(vector, '\\,')) value AS index, valueid
    ) x
    GROUP BY id, index
) x
GROUP BY id;
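The implementation I used is:
- Explode the array together with the index of each element
- Average the values grouped by id and index
- Use collect_list to concatenate the values across rows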
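To make the logic of these three steps easier to check outside Hive, here is a minimal plain-Python sketch of the same idea. It is only a model of the query, not Hive code: the function name aggregate_vectors and its agg parameter are hypothetical, and the rows are the sample data from the question. One deliberate difference is that the per-id values are collected in explicit index order, because as far as I know collect_list on its own does not promise any particular ordering. Passing agg="sum" gives the element-wise sum the question asks for, while agg="avg" mirrors the avg() used in the query.

```python
from collections import defaultdict


def aggregate_vectors(rows, agg="avg"):
    """Aggregate arrays element-wise per id.

    rows: iterable of (id, list_of_floats) pairs.
    agg:  "avg" (as in the query above) or "sum" (as in the question).
    Returns {id: [aggregated value for index 0, index 1, ...]}.
    """
    # Steps 1 and 2: explode each array with its index,
    # then group the values by (id, index).
    groups = defaultdict(list)
    for row_id, values in rows:
        for index, value in enumerate(values):
            groups[(row_id, index)].append(value)

    # Step 3: aggregate each group and collect the results per id.
    # Iterating over sorted keys keeps the elements in index order.
    result = defaultdict(list)
    for row_id, index in sorted(groups):
        vals = groups[(row_id, index)]
        total = sum(vals)
        result[row_id].append(total / len(vals) if agg == "avg" else total)
    return dict(result)


# Sample rows from the question (3 elements instead of 100+).
rows = [
    (1, [0.03, 0.15, -0.03]),
    (1, [-0.2, 0.11, -0.16]),
    (1, [0.03, 0.15, -0.03]),
    (2, [0.02, 0.01, 0.05]),
    (2, [0.1, 0.03, 0.3]),
]

sums = aggregate_vectors(rows, agg="sum")
# Round only for display; the raw sums carry floating-point noise.
print({k: [round(v, 2) for v in vs] for k, vs in sums.items()})
```

If the ordering of the collected elements matters in the real query, it is probably worth carrying the index along with each value until the final step instead of relying on the order in which rows reach collect_list.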