我在蜂巢中有一张看起来像这样的桌子。我想要做的是运行一个查询,每 3 小时,我查看唯一的 workerUUID 并对它们进行一些操作。所以我想做的是现在到3小时前
- 捕获所有唯一的 workerUUID
Select * from these workerUUIDs
我正在使用 hive 运行此查询,并且该表每三到六个小时就有几百万个条目。编写此查询的最佳方法是什么?
--------------------------------------------
| workerUUID | City | Debt | TestN| LName|
|------------------------------------------|
| 1234 | SF | 100k | 23 | Nil |
|-------------------------------------------
| 6789 | NY | 150k | 34 | Fa |
|------------------------------------------|
| 1234 | SF | 10k | 45 | Na |
--------------------------------------------
| 6789 | NY | 1k | 13 | Nil |
|-------------------------------------------
| 6789 | SF | 150k | 34 | Nil |
|------------------------------------------|
| 8999 | IN | 10k | 45 | Na |
--------------------------------------------
基本上我想做类似的事情
select City, Debt, TestN where workerUUID = '1234'
select City, Debt, TestN where workerUUID = '6789'
select City, Debt, TestN where workerUUID = '8999'
为了进一步澄清,我想生成临时表,如
| workerUUID | City | Debt | TestN|
|------------------------------------
| 1234 | SF | 100k | 23 |
|------------------------------------
| 1234 | SF | 10k | 45 |
|-----------------------------------|
| workerUUID | City | Debt | TestN|
|------------------------------------
| 6789 | NY | 150k | 23 |
|------------------------------------
| 6789 | NY | 1k | 13 |
|------------------------------------
| 6789 | NY | 150k | 34 |
|-----------------------------------
| workerUUID | City | Debt | TestN|
|------------------------------------
| 8999 | IN | 10k | 45 |
ETC
对于 3 小时间隔内生成的 workerUUID 的所有唯一值