apache-spark - SPARK - How to use functions in a GROUP BY query

Question

I am migrating SHARK queries to SPARK.

Below is a sample SHARK query in which I use functions in the GROUP BY clause.

select month(dt_cr) as Month,
       day(dt_cr)   as date_of_created,
       count(distinct phone_number) as total_customers
from customer
group by month(dt_cr),day(dt_cr);

The same query does not work in Spark SQL; it fails with the following error:

Error: org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Expression not in GROUP BY.

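As a workaround, I am using the Spark query below. It works, but it requires code changes, which would have a large impact on my existing project. Does anyone have a better solution with minimal impact?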

SELECT Month,date_of_created,count(distinct phone_number) as total_customers
FROM
(select month(dt_cr) as Month,
    day(dt_cr)   as date_of_created,
    phone_number
from customer)A
group by Month,date_of_created
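To check that the subquery rewrite returns the same rows as grouping directly on the function calls, here is a minimal sketch that runs both forms through Python's built-in sqlite3 module. It assumes SQLite as a stand-in for Spark SQL and uses strftime() in place of month()/day(), which SQLite does not provide (so months and days come back as zero-padded strings rather than integers). The sample rows are hypothetical. SQLite accepts both forms, so this only illustrates that the rewrite is semantically equivalent; it does not reproduce Spark's error.

```python
import sqlite3

# Hypothetical sample rows standing in for the `customer` table in the question.
ROWS = [
    ("2014-11-03", "555-0100"),
    ("2014-11-03", "555-0100"),  # same phone twice on the same day
    ("2014-11-03", "555-0101"),
    ("2014-11-04", "555-0100"),
    ("2014-12-01", "555-0102"),
]

def run(sql):
    """Load ROWS into an in-memory table and return the query's rows, sorted."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (dt_cr TEXT, phone_number TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", ROWS)
    result = sorted(conn.execute(sql).fetchall())
    conn.close()
    return result

# Original form: the same function calls appear in SELECT and GROUP BY.
# strftime() stands in for Spark's month()/day() (an assumption of this sketch).
DIRECT = """
SELECT strftime('%m', dt_cr) AS Month,
       strftime('%d', dt_cr) AS date_of_created,
       COUNT(DISTINCT phone_number) AS total_customers
FROM customer
GROUP BY strftime('%m', dt_cr), strftime('%d', dt_cr)
"""

# Workaround form: compute the derived columns in a subquery,
# then group by their aliases in the outer query.
WORKAROUND = """
SELECT Month, date_of_created, COUNT(DISTINCT phone_number) AS total_customers
FROM (SELECT strftime('%m', dt_cr) AS Month,
             strftime('%d', dt_cr) AS date_of_created,
             phone_number
      FROM customer) A
GROUP BY Month, date_of_created
"""

if __name__ == "__main__":
    print(run(DIRECT) == run(WORKAROUND))  # True
    print(run(WORKAROUND))  # [('11', '03', 2), ('11', '04', 1), ('12', '01', 1)]
```

Note that the inner query must expose every column the outer query aggregates (here phone_number); otherwise the outer COUNT(DISTINCT ...) has nothing to refer to.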

Accepted Answer

This is an issue in Spark SQL: https://issues.apache.org/jira/browse/SPARK-4296

However, I expect it to be fixed in the next release. For now, you will have to change your code to work around it.
