
I created two tables on HBase using Phoenix.

One is ORIGIN_LOG and the other is ORIGIN_LOG_INDEX.

In ORIGIN_LOG the key is info_key. In ORIGIN_LOG_INDEX the key is (log_t, zone).

We store log_t, zone, and info_key in ORIGIN_LOG_INDEX so that we can quickly look up info_key by log_t and zone. With that info_key we can then fetch the detailed log record from ORIGIN_LOG, because info_key is the key of ORIGIN_LOG.
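
For reference, the two tables could look roughly like the sketch below. The column names come from the queries in this question. The column types, the constraint name, and the trailing "info_key" in the key of ORIGIN_LOG_INDEX are assumptions made for illustration, not the actual schema. Table options are left out.

```sql
-- Hypothetical DDL sketch: all types are guesses, table options omitted.
CREATE TABLE IF NOT EXISTS "ORIGIN_LOG" (
    "info_key"   VARCHAR NOT NULL PRIMARY KEY,  -- row key of the log table
    "log_t"      VARCHAR,
    "zone"       VARCHAR,
    "app_ver"    VARCHAR,
    "device_id"  VARCHAR,
    "mobage_uid" VARCHAR,
    "param1"     VARCHAR,
    "param2"     VARCHAR,
    "param3"     VARCHAR,
    "param4"     VARCHAR,
    "param5"     VARCHAR,
    "user_id"    VARCHAR,
    "a_typ"      VARCHAR,
    "a_tar"      VARCHAR,
    "a_rst"      VARCHAR
);

-- Lookup table keyed by (log_t, zone); "info_key" is appended to the key here
-- only as an assumption, so that rows sharing log_t and zone stay distinct.
CREATE TABLE IF NOT EXISTS "ORIGIN_LOG_INDEX" (
    "log_t"    VARCHAR NOT NULL,
    "zone"     VARCHAR NOT NULL,
    "info_key" VARCHAR NOT NULL,
    CONSTRAINT pk PRIMARY KEY ("log_t", "zone", "info_key")
);
```

With a layout like this, conditions on "log_t" and "zone" line up with the leading key columns of ORIGIN_LOG_INDEX, while in ORIGIN_LOG they are ordinary columns and only "info_key" is part of the row key.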

But when we run EXPLAIN on the SQL below, we find that it requires a full scan of ORIGIN_LOG.

    explain select "log_t", "app_ver", "device_id", "mobage_uid",
        "param1", "param2", "param3", "param4", "param5",
        "user_id", "a_typ", "a_tar", "a_rst"
    from "ORIGIN_LOG"
    where "info_key" in (
        select distinct "info_key" from "ORIGIN_LOG_INDEX"
        where "log_t" >= '1423956600' and "log_t" < '1423956601' and "zone" = '18')



    CLIENT 4-CHUNK PARALLEL 4-WAY FULL SCAN OVER ORIGIN_LOG
    CLIENT MERGE SORT
        SKIP-SCAN-JOIN TABLE 0
            CLIENT 2-CHUNK PARALLEL 2-WAY SKIP SCAN ON 2 RANGES OVER ORIGIN_LOG_INDEX [0,'1423956600','18'] - [1,'1423956601','18']
                SERVER FILTER BY FIRST KEY ONLY
                SERVER AGGREGATE INTO DISTINCT ROWS BY [info_key]
            CLIENT MERGE SORT
        DYNAMIC SERVER FILTER BY info_key IN ($5.$7)

If we instead query only ORIGIN_LOG with the log_t and zone conditions, like this:

    select "log_t", "app_ver", "device_id", "mobage_uid",
        "param1", "param2", "param3", "param4", "param5",
        "user_id", "a_typ", "a_tar", "a_rst"
    from "ORIGIN_LOG"
    where "log_t" >= '1423956600' and "log_t" < '1423956601' and "zone" = '18';

we also get a full scan.

    CLIENT 4-CHUNK PARALLEL 4-WAY FULL SCAN OVER ORIGIN_LOG
        SERVER FILTER BY (log_t >= '1423956600' AND log_t < '1423956601' AND zone = '18')
    CLIENT MERGE SORT

So what is the difference between the two queries, and which one performs better?

Thank you.

BR


1 Answer


Your first query is a range-based scan in HBase over ORIGIN_LOG_INDEX, followed by Gets against ORIGIN_LOG. Your second query is a range-based scan in HBase, where you provide a "startkey" and an "endkey" for the scan. The second query is much better because you avoid the lookup into another table and you also skip the DISTINCT operation.
However, the range between startkey and endkey might span the entire table, so the worst case of your scan is a full table scan. I think that is why the explain plan shows it as a full table scan. You may want to ask on the mailing list for further clarification.

answered 2015-02-16T02:19:51.163