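I have a table in Hive that looks like the one below. I want to run a query every 3 hours that looks at the unique workerUUIDs and performs some operations on them. So, for the window from now back to 3 hours ago, I want to: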

  1. Capture all unique workerUUIDs
  2. Select * for those workerUUIDs

I am running this query with Hive, and the table receives a few million entries every three to six hours. What is the best way to write this query?
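
Steps 1 and 2 above could be sketched as a single HiveQL query. This is only a rough sketch under assumptions: the table name `worker_events` and the TIMESTAMP column `event_ts` are hypothetical (the table shown below has no time column), and the interval arithmetic assumes Hive 1.2 or later.

```sql
-- Hypothetical names: worker_events (table), event_ts (TIMESTAMP column).
-- Inner query: step 1, the distinct workerUUIDs seen in the last 3 hours.
-- Outer query: step 2, every row belonging to one of those workerUUIDs.
SELECT t.*
FROM worker_events t
LEFT SEMI JOIN (
  SELECT DISTINCT workerUUID
  FROM worker_events
  WHERE event_ts >= current_timestamp - INTERVAL '3' HOUR
) recent
ON t.workerUUID = recent.workerUUID;
```

If only the rows inside the 3-hour window are needed, the semi join is unnecessary and the `WHERE` filter alone on `worker_events` would do.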

| workerUUID | City | Debt | TestN | LName |
|------------|------|------|-------|-------|
| 1234       | SF   | 100k | 23    | Nil   |
| 6789       | NY   | 150k | 34    | Fa    |
| 1234       | SF   | 10k  | 45    | Na    |
| 6789       | NY   | 1k   | 13    | Nil   |
| 6789       | SF   | 150k | 34    | Nil   |
| 8999       | IN   | 10k  | 45    | Na    |

Basically, I want to do something like this:

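 -- worker_events is a placeholder; the actual table name is not given
 select City, Debt, TestN from worker_events where workerUUID = '1234';
 select City, Debt, TestN from worker_events where workerUUID = '6789';
 select City, Debt, TestN from worker_events where workerUUID = '8999';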

To clarify further, I want to generate temporary tables like these:


| workerUUID | City | Debt | TestN |
|------------|------|------|-------|
| 1234       | SF   | 100k | 23    |
| 1234       | SF   | 10k  | 45    |

| workerUUID | City | Debt | TestN |
|------------|------|------|-------|
| 6789       | NY   | 150k | 23    |
| 6789       | NY   | 1k   | 13    |
| 6789       | NY   | 150k | 34    |

| workerUUID | City | Debt | TestN |
|------------|------|------|-------|
| 8999       | IN   | 10k  | 45    |

and so on, for every unique workerUUID value generated within the 3-hour interval.
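
One possible alternative to a separate temporary table per workerUUID is a single result set in which each worker's rows are kept together. The sketch below is not a definitive answer: `worker_events` and `event_ts` are hypothetical names (the source table shows no time column), and the interval syntax assumes Hive 1.2 or later.

```sql
-- Hypothetical names: worker_events (table), event_ts (TIMESTAMP column).
-- DISTRIBUTE BY sends all rows of one workerUUID to the same reducer;
-- SORT BY then orders rows within each reducer, so each worker's rows
-- come out adjacent to one another in that reducer's output.
SELECT workerUUID, City, Debt, TestN
FROM worker_events
WHERE event_ts >= current_timestamp - INTERVAL '3' HOUR
DISTRIBUTE BY workerUUID
SORT BY workerUUID;
```

With a few million rows per window this avoids issuing one query per workerUUID; downstream processing can read each contiguous group as the equivalent of one of the per-worker tables shown above.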
