hadoop - Hive always gives "Number of reduce tasks determined at compile time: 1", no matter what I do

Question

create external table if not exists my_table
(customer_id STRING,ip_id STRING)
location 'ip_b_class';

Then:

hive> set mapred.reduce.tasks=50;
hive> select count(distinct customer_id) from my_table;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1

There is 160 GB of data in there, and with a single reducer it takes a very long time...

[ihadanny@lvshdc2en0011 ~]$ hdu 
Found 8 items
162808042208   hdfs://horton/ip_b_class

...

score 2 · Accepted Answer

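Logically, you cannot have more than one reducer here. Unless all the distinct customer IDs coming out of the individual map tasks are brought together in one place, distinctness cannot be established and a single count cannot be produced. In other words, unless you pile all the customer IDs up in one location, you cannot say that each ID is distinct and then count them.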
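One commonly cited rewrite, offered here as a sketch and not as this answerer's own method, splits the work into two stages. An inner query de-duplicates customer_id, which can be spread over many reducers because equal IDs are always sent to the same reducer. An outer query then only counts the rows that are already distinct. The alias t is an arbitrary name chosen for illustration, and the filter on NULL is there because count(distinct ...) ignores NULL values while count(*) would count a NULL group as one row.

```sql
-- Stage 1 (inner query): de-duplicate customer_id; this stage can use many reducers.
-- Stage 2 (outer query): count the de-duplicated rows; a single reducer is
-- acceptable here because its input is far smaller than the raw table.
set mapred.reduce.tasks=50;
select count(*)
from (
  select distinct customer_id
  from my_table
  where customer_id is not null
) t;
```

With this form Hive typically plans two MapReduce jobs instead of one, and the "determined at compile time: 1" message should only apply to the final counting job.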

score 1 · Accepted Answer

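The original answer and explanation provided by @Rags are correct. The attached link gives you a good workaround by rewriting the query. If you don't want to rewrite the query, I suggest using this option to give the reducer more memory: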

set mapreduce.reduce.java.opts=-Xmx8000m

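This option sets the maximum JVM heap size of the reducer to 8000 MB (roughly 8 GB). If you have more memory available, you can specify a higher value here. Hope this helps.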
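On YARN-based clusters the JVM heap has to fit inside the reducer's container, so raising -Xmx alone may not be enough. A minimal sketch, assuming a Hadoop 2 (YARN) cluster whose scheduler allows containers of this size; the value 10240 is illustrative and does not come from the original answer:

```sql
-- Container size for each reduce task, in MB (assumed value; it must not exceed
-- the cluster's yarn.scheduler.maximum-allocation-mb).
set mapreduce.reduce.memory.mb=10240;
-- JVM heap for the reducer, kept below the container size to leave room
-- for non-heap memory.
set mapreduce.reduce.java.opts=-Xmx8000m;
```

If the heap is set larger than the container, the reduce task is likely to be killed by YARN for exceeding its memory limit.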
