0

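I am trying to do a simple Hive transformation.

[Image: simple Hive transformation]

Can someone suggest a way to do this? I have already tried collect_set and am currently looking at Klout's open-source UDFs.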

4 Answers

1

I think this will give you what you want. I was not able to run or debug it. Good luck!

select start_points.unit
  , start_time as start
  , start_time + min(stop_time - start_time) as stop
from
  -- rows where the unit differs from the previous row (start of a run)
  (select * from 
      (select date_time as start_time
        , unit
        , last_value(unit) over (order by date_time desc rows between current row and 1 following) as previous_unit
      from table
      ) previous
      where unit <> previous_unit
  ) start_points
left outer join
  -- rows where the unit differs from the next row (end of a run)
  (select * from 
      (select date_time as stop_time
        , unit
        , last_value(unit) over (order by date_time rows between current row and 1 following) as next_unit
      from table
      ) next
      where unit <> next_unit
  ) stop_points
on start_points.unit = stop_points.unit
where stop_time > start_time
group by start_points.unit, start_time
;
answered 2015-06-26T13:32:41.240
0

How about using the min and max functions? I think the following will give you what you need:

SELECT
    Unit,
    MIN(datetime) as start,
    MAX(datetime) as stop
from table_name
group by Unit
;
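
One caveat with a plain GROUP BY Unit: if the same unit can show up again after a different unit, all of its rows collapse into a single start/stop pair. Below is a minimal sketch of one way around that, not a tested solution. It assumes a hypothetical table `readings` with columns `date_time` and `unit` (the question does not confirm either name) and that `date_time` values are distinct. The idea is to flag each row where the unit changes, turn the running sum of those flags into a run id, and then take min/max per run.

```sql
-- Assumption: readings(date_time, unit) is a stand-in for the real table.
SELECT unit,
       MIN(date_time) AS start_time,
       MAX(date_time) AS stop_time
FROM (
  -- run_id increases by 1 every time the unit changes
  SELECT date_time, unit,
         SUM(is_change) OVER (ORDER BY date_time
                              ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS run_id
  FROM (
    -- is_change = 1 on the first row and whenever unit differs from the previous row
    SELECT date_time, unit,
           CASE WHEN LAG(unit) OVER (ORDER BY date_time) IS NULL
                  OR LAG(unit) OVER (ORDER BY date_time) <> unit
                THEN 1 ELSE 0 END AS is_change
    FROM readings
  ) flagged
) runs
GROUP BY unit, run_id
ORDER BY start_time;
```

If each unit only ever appears in one contiguous block, this should return the same rows as the simpler GROUP BY above.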
answered 2015-06-21T17:09:59.457
0

I figured it out. Thanks for the pointer to window functions.

select *
from
(select *, 
case when lag(unit,1) over (partition by id order by effective_time_ut desc) is NULL THEN 1 
when unit<>lag(unit,1) over (partition by id order by effective_time_ut desc) then 1 
when lead(unit,1) over (partition by id order by effective_time_ut desc) is NULL then 1
else 0 end as different_loc
from units_we_care) a
where different_loc=1
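
If start/stop pairs are wanted rather than the flagged rows themselves, one possible follow-up is to keep only the first row of each run and pair it with the first timestamp of the next run. This is a rough sketch reusing `id`, `unit`, `effective_time_ut` and `units_we_care` from the query above, but ordering ascending (an assumption; the query above orders descending). `prev_unit` and `next_start_time` are made-up aliases.

```sql
SELECT id,
       unit,
       effective_time_ut AS start_time,
       -- WHERE is applied before the window function, so LEAD only sees run starts
       LEAD(effective_time_ut) OVER (PARTITION BY id ORDER BY effective_time_ut) AS next_start_time
FROM (
  SELECT id, unit, effective_time_ut,
         LAG(unit) OVER (PARTITION BY id ORDER BY effective_time_ut) AS prev_unit
  FROM units_we_care
) t
WHERE prev_unit IS NULL OR unit <> prev_unit;
```

`next_start_time` comes back NULL for the last run of each `id`, so that case would need its own handling.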
answered 2015-07-02T10:23:33.287
0
create table temptable as
select unit, start_date, end_time,
       -- order the numbering by start_date so that row_num + 1 is the next unit in time
       row_number() over (order by start_date) as row_num
from (select unit, min(date_time) as start_date, max(date_time) as end_time
      from table
      group by unit) a;

select a.unit, a.start_date as start_date, nvl(b.start_date, a.end_time) as end_time
from temptable a
left outer join temptable b on (a.row_num + 1) = b.row_num;
answered 2017-03-21T12:38:21.437