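Here is my problem: I have a Hive table with two columns, id and value, where value is an array of doubles. I want to sum these arrays element-wise across all rows that share the same id. The data looks like this. Each array actually has more than 100 elements, but for simplicity I show only 3 here.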

+----+---------------------+
| id | value               |
+----+---------------------+
| 1  | [0.03,0.15,-0.03]   |
| 1  | [-0.2,0.11,-0.16]   |
| 1  | [0.03,0.15,-0.03]   |
| 2  | [0.02,0.01,0.05]    |
| 2  | [0.1,0.03,0.3]      |
+----+---------------------+

The result I expect is:

+----+---------------------+
| id | value               |
+----+---------------------+
| 1  | [-0.14,0.41,-0.22]  |
| 2  | [0.12,0.04,0.35]    |
+----+---------------------+

How can I do this with a Hive query? Thanks in advance.

Update: this is the query I used to get a working solution, but I am looking for a better one.

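-- Note: this query aggregates with avg(); use sum() instead to get the element-wise total asked for above.
SELECT id,
       concat_ws(',', collect_list(CAST(val_new AS STRING))) AS val_fin
FROM (
  SELECT id, avg(valueid) AS val_new
  FROM (
    -- vector is stored as a comma-separated string, so split it before exploding
    SELECT id, valueid, index
    FROM user_interest_profiles.clicked_articles
    LATERAL VIEW POSEXPLODE(split(vector, '\\,')) value AS index, valueid
  ) exploded
  GROUP BY id, index
) averaged
GROUP BY id;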

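The approach I used is:

  • Explode the array together with each element's index (POSEXPLODE)
  • Average the values, grouped by id and index
  • Use collect_list to concatenate the values across rows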
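
The same three steps, adapted to sum instead of average, might look roughly like the sketch below. It rests on several assumptions: a hypothetical table my_table(id INT, value ARRAY<DOUBLE>) matching the description at the top (not the string column vector used in the query above), that sort_array accepts an array of structs on the Hive version in use, and that selecting a field from an array of structs returns an array of that field. The sort by position is there because collect_list on its own does not promise that elements come back in their original order.

```sql
-- Sketch only. my_table, value, pos, val, total, pairs are assumed/illustrative names:
--   my_table(id INT, value ARRAY<DOUBLE>)
SELECT id,
       pairs.total AS value   -- take the summed values out of the position-sorted structs
FROM (
  SELECT id,
         -- structs sort by their first field (pos), which restores the element order
         sort_array(collect_list(named_struct('pos', pos, 'total', total))) AS pairs
  FROM (
    -- one row per (id, position) holding the sum of that position across rows
    SELECT id, pos, SUM(val) AS total
    FROM my_table
    LATERAL VIEW posexplode(value) v AS pos, val
    GROUP BY id, pos
  ) summed
  GROUP BY id
) agg;
```

Because the source column is assumed to be a real array here, no split() or string casting is needed, and the result stays an array of doubles rather than a comma-joined string.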