
I have the following transaction table:

[image: transaction table]

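I want to compute the total quantity purchased for each:

  • product
  • category (i.e. the total quantity of all products in the same category)
  • department (i.e. the total quantity of all products in the same department)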

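In addition, the totals above should be computed:

  1. per shopper
  2. per family/household (the total quantity across all shoppers in the same family).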

The output table should look like this:

[image: expected output table]

For the family, the total is computed once and then "copied" to every shopper in the same family.

To compute the multiple totals for product/category/department in one table, I am using GROUPING SETS, which was pointed out to me in a previous question here. So I get total_quantity_individual right.

For total_quantity_family, the OVER(PARTITION BY) approach pointed out here makes sense on a simpler table.
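As a rough illustration of the simpler case, here is a minimal sketch that runs a windowed SUM through Python's sqlite3 module, used only as a convenient stand-in for Spark SQL. The table name purchases, the sample rows, and the reduced column set are all made up for the example, and it assumes the bundled SQLite is 3.25 or newer (the first version with window functions).

```python
import sqlite3

# Hypothetical sample rows: (family_id, shopper_id, product, quantity).
rows = [
    ("f1", "s1", "apple", 2),
    ("f1", "s2", "apple", 3),
    ("f1", "s2", "pear", 1),
    ("f2", "s3", "apple", 4),
]

conn = sqlite3.connect(":memory:")
# "purchases" is an assumed name; TRANSACTION is a keyword in SQLite.
conn.execute(
    "CREATE TABLE purchases "
    "(family_id TEXT, shopper_id TEXT, product TEXT, quantity INTEGER)"
)
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", rows)

# The window keeps every input row and attaches the per-family,
# per-product total to each of them.
family_totals = conn.execute(
    """
    SELECT family_id, shopper_id, product, quantity,
           SUM(quantity) OVER (PARTITION BY family_id, product)
               AS total_quantity_family
    FROM purchases
    ORDER BY family_id, shopper_id, product
    """
).fetchall()

for row in family_totals:
    print(row)
```

Both shoppers in family f1 get the same family total for apple, which is the "computed once, then copied" behaviour described above.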

However, I'm not sure how to combine the two. There isn't much information on combining OVER(PARTITION BY) with GROUPING SETS.

My query looks like:

SELECT
    family_id,
    shopper_id,
    CASE
        WHEN GROUPING__ID = 6 THEN 'department'
        WHEN GROUPING__ID = 5 THEN 'category'
        WHEN GROUPING__ID = 3 THEN 'product'
    END AS total_level_type,
    CASE
        WHEN GROUPING__ID = 6 THEN department
        WHEN GROUPING__ID = 5 THEN category
        WHEN GROUPING__ID = 3 THEN product
    END AS id,
    SUM(quantity) AS total_quantity_shopper
    -- sum(sum(quantity)) OVER (PARTITION BY family_id, product) AS total_quantity_family
FROM
    transaction
GROUP BY
    family_id, 
    shopper_id,
    product,
    category,
    department
    GROUPING SETS (
        (family_id, shopper_id, product),
        (family_id, shopper_id, category),
        (family_id, shopper_id, department)
    )
ORDER BY
  total_level_type;

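If OVER(PARTITION BY) doesn't work for my case, my other options might be:

  1. Group transaction by family_id, run GROUPING SETS on the result, and then join back with transaction
  2. Maybe use the trick with explode() and a lateral view?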

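For maintainability reasons, I would really prefer not to split the query into separate individual and family versions.

Note: in case it helps, I am using Spark SQL with a Hive context.

Any help is appreciated. Thanks!

Edit: this seems to work: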

...
SUM(quantity) AS total_quantity_shopper,
CASE        
    WHEN GROUPING__ID = 6 THEN sum(sum(quantity)) OVER (PARTITION BY family_id, department)
    WHEN GROUPING__ID = 5 THEN sum(sum(quantity)) OVER (PARTITION BY family_id, category)
    WHEN GROUPING__ID = 3 THEN sum(sum(quantity)) OVER (PARTITION BY family_id, product)
END AS total_quantity_family
...
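One possible variation on the snippet above, sketched under assumptions rather than tested on Spark: once the per-shopper rows carry total_level_type and id, a single window partitioned by (family_id, total_level_type, id) yields the family total without a CASE over three windows. The sketch uses Python's sqlite3 as a stand-in; SQLite has no GROUPING SETS, so UNION ALL plays that role here, and the table name purchases and the sample rows are hypothetical. It assumes SQLite 3.25 or newer.

```python
import sqlite3

# Hypothetical sample rows:
# (family_id, shopper_id, product, category, department, quantity)
rows = [
    ("f1", "s1", "apple", "fruit", "food", 2),
    ("f1", "s2", "apple", "fruit", "food", 3),
    ("f1", "s2", "pear", "fruit", "food", 1),
    ("f2", "s3", "apple", "fruit", "food", 4),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (family_id TEXT, shopper_id TEXT, product TEXT,"
    " category TEXT, department TEXT, quantity INTEGER)"
)
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?, ?, ?)", rows)

query = """
WITH per_shopper AS (
    -- UNION ALL stands in for GROUPING SETS, which SQLite lacks.
    SELECT family_id, shopper_id, 'product' AS total_level_type,
           product AS id, SUM(quantity) AS total_quantity_shopper
    FROM purchases GROUP BY family_id, shopper_id, product
    UNION ALL
    SELECT family_id, shopper_id, 'category', category, SUM(quantity)
    FROM purchases GROUP BY family_id, shopper_id, category
    UNION ALL
    SELECT family_id, shopper_id, 'department', department, SUM(quantity)
    FROM purchases GROUP BY family_id, shopper_id, department
)
SELECT family_id, shopper_id, total_level_type, id, total_quantity_shopper,
       -- One window covers all three levels of the hierarchy.
       SUM(total_quantity_shopper)
           OVER (PARTITION BY family_id, total_level_type, id)
           AS total_quantity_family
FROM per_shopper
ORDER BY total_level_type, id, family_id, shopper_id
"""
result = conn.execute(query).fetchall()

for row in result:
    print(row)
```

In Spark SQL the same idea would presumably be a subquery holding the GROUPING SETS aggregation, with the window applied in the outer query. That keeps the individual and family totals in one statement.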

1 Answer


Use multiple sum() over() expressions with different partition by clauses:

select
  family_id,
  shopper_id,
  total_level_type,
  id,
  total_quantity_individual,
  total_quantity_family
from
(
  select 
      family_id, 
      shopper_id,
      array(
        NAMED_STRUCT('id', product, 
                     'total_level_type', 'product',
                     'total_quantity_individual', sum(quantity) OVER (PARTITION BY family_id, shopper_id, product),
                     'total_quantity_family', sum(quantity) OVER (PARTITION BY family_id, product)
                     ),
        NAMED_STRUCT('id', category, 
                     'total_level_type', 'category',
                     'total_quantity_individual', sum(quantity) OVER (PARTITION BY family_id, shopper_id, category),
                     'total_quantity_family', sum(quantity) OVER (PARTITION BY family_id, category)
                     ),
        NAMED_STRUCT('id', department, 
                     'total_level_type', 'department', 
                     'total_quantity_individual', sum(quantity) OVER (PARTITION BY family_id, shopper_id, department),
                     'total_quantity_family', sum(quantity) OVER (PARTITION BY family_id, department)
                     )
      ) AS array_structs
  from
    transaction
)
lateral view inline(array_structs) exploded
order by
  total_level_type
answered 2020-12-02T11:29:16.427