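I'm new to Cascading and am trying to find a way to get the top N tuples based on a sort order. For example, I'd like to know the 100 most common first names.
Here is how I would do something similar in Teradata SQL: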
select top 100 first_name, num_records
from
(select first_name, count(1) as num_records
from table_1
group by first_name) a
order by num_records DESC
Hadoop Pig has a similarly direct way to do it:
a = load 'table_1' as (first_name:chararray, last_name:chararray);
b = foreach (group a by first_name) generate group as first_name, COUNT(a) as num_records;
c = order b by num_records DESC;
d = limit c 100;
This seems easy to do in SQL or Pig, but I'm having a hard time finding a way to do it in Cascading. Please advise!