
I am running a fairly straightforward query in Hive, but it keeps failing with GC timeout exceeded and OOM errors.

The query is of the form:

select t1.field1 -- selecting about 30 cols!
from table1 t1
join table2 t2 on t1.field2 = t2.field2 and t1.date = '20120801'
join table2 t3 on t1.field7 = t3.field2 and t1.date = '20120801'

I am selecting about 30 fields from this query. table1 is partitioned by date and contains around 300,000 records. table2 contains about 100 records.

Is there some way I can optimise this query?


1 Answer


After playing with map joins for a few hours, I finally got it working.

I added the hint SELECT /*+ MAPJOIN(t2, t3) */

The query now runs in a few seconds.
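For reference, here is roughly what the hinted query looks like in full. This is only a sketch: the column list is abbreviated, and it assumes that the field1 column comes from t1 and that the second join was meant to compare against t3.field2, which is a guess based on the aliases in the question.

```sql
-- Sketch only: column list abbreviated.
-- Assumption: field1 belongs to t1, and the second join compares
-- t1.field7 with t3.field2.
SELECT /*+ MAPJOIN(t2, t3) */
       t1.field1            -- plus the other ~30 columns
FROM table1 t1
JOIN table2 t2
  ON t1.field2 = t2.field2 AND t1.date = '20120801'
JOIN table2 t3
  ON t1.field7 = t3.field2 AND t1.date = '20120801';
```

The hint asks Hive to load the tables named in it (here both aliases of the 100-row table2) into memory on each mapper and perform the join on the map side, so the 300,000 rows of table1 do not have to be shuffled to reducers. This only makes sense when the hinted tables are small enough to fit in memory.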

answered 2012-08-31T02:14:41.250