create external table if not exists my_table
(customer_id STRING,ip_id STRING)
location 'ip_b_class';
Then:
hive> set mapred.reduce.tasks=50;
hive> select count(distinct customer_id) from my_table;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
There is about 160GB in there, and with only 1 reducer it takes a very long time...
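The line "Number of reduce tasks determined at compile time: 1" reflects how Hive plans a global count(distinct ...) with no GROUP BY: every row's key goes to a single reducer, so the mapred.reduce.tasks=50 setting is ignored. A commonly used workaround is a two-stage query, for example select count(*) from (select distinct customer_id from my_table) t; where the inner DISTINCT can be spread over many reducers. The Python sketch below only illustrates why that split is correct. It is not Hive code, and the function names are hypothetical. Because hash partitioning sends every copy of a given key to the same reducer, the per-reducer distinct sets are disjoint and their sizes can be summed.

```python
from collections import defaultdict
from typing import Iterable


def single_reducer_distinct(ids: Iterable[str]) -> int:
    # Hypothetical model of a global count(distinct): all keys end up on
    # one reducer, which builds a single set in memory.
    return len(set(ids))


def partitioned_distinct(ids: Iterable[str], num_reducers: int) -> int:
    # Hypothetical model of the two-stage rewrite.
    # Stage 1 ("select distinct customer_id"): hash-partition the keys so
    # that every copy of a given customer_id lands on the same reducer.
    partitions = defaultdict(set)
    for cid in ids:
        partitions[hash(cid) % num_reducers].add(cid)
    # Stage 2 (outer "count(*)"): the partitions hold disjoint key sets,
    # so summing their sizes gives the global distinct count.
    return sum(len(keys) for keys in partitions.values())


if __name__ == "__main__":
    rows = ["c1", "c2", "c1", "c3", "c2", "c4", "c1"]
    print(single_reducer_distinct(rows))
    print(partitioned_distinct(rows, 50))
```

Both calls report the same count. The only difference is that in the partitioned version the deduplication work is divided among num_reducers reducers.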
[ihadanny@lvshdc2en0011 ~]$ hdu
Found 8 items
162808042208 hdfs://horton/ip_b_class
...
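As a quick check of the "160GB" figure, here is a small worked conversion of the byte count in the listing above. The even split across 50 reducers is only a rough illustration based on the raw input size. The data actually shuffled to reducers is likely smaller, because only customer_id needs to be sent.

```python
# Size reported for hdfs://horton/ip_b_class in the listing above.
SIZE_BYTES = 162_808_042_208


def to_gb(n: float) -> float:
    # Decimal gigabytes (10^9 bytes).
    return n / 10**9


def to_gib(n: float) -> float:
    # Binary gibibytes (2^30 bytes).
    return n / 2**30


print(f"{to_gb(SIZE_BYTES):.1f} GB")    # 162.8 GB
print(f"{to_gib(SIZE_BYTES):.1f} GiB")  # 151.6 GiB
# Raw input per reducer if it were split evenly across 50 reducers.
print(f"{to_gb(SIZE_BYTES / 50):.2f} GB per reducer")  # 3.26 GB per reducer
```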