hadoop - Distinct columns in Hive

Question

I am trying to get a query result in HiveQL in which one column is distinct, but the results do not match. The table has nearly 20 columns.

create table uniq_us row format delimited fields terminated by ',' lines terminated by '\n' as select distinct(a),b,c,d,e,f,g,h,i,j from ctry_us_join;

Result row count: 513238

select count(distinct a) from ctry_us_join;

Result count: 151616

How is this possible? Is there something wrong with my first or my second query?

score 0 · Accepted Answer

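DISTINCT is a keyword, not a function. It applies to all of the columns you list in the select clause; the parentheses in distinct(a) only wrap the expression a and do not restrict DISTINCT to that column. It is perfectly reasonable for your table to have only 151,616 distinct values in column a while multiple rows that share the same value of a have different values in the other columns. That can give you 513,238 distinct rows.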
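To make the difference concrete, here is a minimal sketch that runs the same two kinds of query against a tiny in-memory SQLite table. SQLite is only a stand-in for Hive here, and the two-column table and its rows are made up for illustration; the point is that DISTINCT deduplicates whole rows, while COUNT(DISTINCT a) looks at column a alone.

```python
import sqlite3

# Hypothetical miniature of ctry_us_join: only columns a and b, made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ctry_us_join (a TEXT, b TEXT)")
conn.executemany(
    "INSERT INTO ctry_us_join VALUES (?, ?)",
    [("x", "1"), ("x", "2"), ("x", "2"), ("y", "1")],
)

# DISTINCT applies to the whole (a, b) row; the parentheses around a change nothing.
distinct_rows = conn.execute(
    "SELECT DISTINCT (a), b FROM ctry_us_join ORDER BY a, b"
).fetchall()

# COUNT(DISTINCT a) only considers the values in column a.
distinct_a = conn.execute(
    "SELECT COUNT(DISTINCT a) FROM ctry_us_join"
).fetchone()[0]

print(len(distinct_rows), distinct_a)  # prints "3 2"
```

Only the exact duplicate row ("x", "2") is removed by DISTINCT, so three rows remain even though a has just two distinct values.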

score 0 · Accepted Answer

You need to use a subselect together with a group by statement.

select count(a) from (
select a, count(*) from ctry_us_join group by a) b

This is just one possible solution.
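As a rough check of what that subselect computes, the sketch below runs it next to COUNT(DISTINCT a) on a small in-memory SQLite table. SQLite stands in for Hive, the sample rows are invented, and the alias n on COUNT(*) is an addition for readability. Under those assumptions both queries return the number of distinct values of a.

```python
import sqlite3

# Hypothetical sample data; SQLite is used only as a stand-in for Hive.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ctry_us_join (a TEXT, b TEXT)")
conn.executemany(
    "INSERT INTO ctry_us_join VALUES (?, ?)",
    [("x", "1"), ("x", "2"), ("y", "1"), ("z", "3"), ("z", "3")],
)

# The subselect yields one row per value of a; the outer query counts those rows.
via_group_by = conn.execute(
    "SELECT COUNT(a) FROM ("
    "  SELECT a, COUNT(*) AS n FROM ctry_us_join GROUP BY a"
    ") b"
).fetchone()[0]

# The direct form used in the question.
via_distinct = conn.execute(
    "SELECT COUNT(DISTINCT a) FROM ctry_us_join"
).fetchone()[0]

print(via_group_by, via_distinct)  # prints "3 3"
```

So the group by form counts distinct values of a; it does not by itself return the other columns for each value of a.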
