sql - Finding the top 10 trending tweets in Hive

Question

I am trying to find the top 10 trending tweets in Hive based on retweet_count, i.e. the tweet with the highest retweet_count should come first, and so on.

Here are the details of the election table:

id                      bigint                  from deserializer   
created_at              string                  from deserializer   
source                  string                  from deserializer   
favorited               boolean                 from deserializer   
retweeted_status        struct<text:string,user:struct<screen_name:string,name:string>,retweet_count:int>   from deserializer   
entities                struct<urls:array<struct<expanded_url:string>>,user_mentions:array<struct<screen_name:string,name:string>>,hashtags:array<struct<text:string>>> from deserializer   
text                    string                  from deserializer   
user                    struct<screen_name:string,name:string,friends_count:int,followers_count:int,statuses_count:int,verified:boolean,utc_offset:int,time_zone:string,location:string>    from deserializer   
in_reply_to_screen_name string                  from deserializer

My query:

select text 
from election 
where retweeted_status.retweet_count IN  
     (select  retweeted_status.retweet_count as zz 
      from election  
      order by zz desc  
      limit 10);

It returns the same tweet 10 times (TWEET-ABC, TWEET-ABC, TWEET-ABC, ... TWEET-ABC).

So I broke the nested query apart. When I run the inner query

select  retweeted_status.retweet_count as zz 
from election  
order by zz desc  
limit 10

it returns 10 different values (1210, 1209, 1208, 1207, 1206, .... 1201).

After that, when I run my outer query

select text 
from election  
where retweeted_status.retweet_count 
      IN  (1210,1209,1208,1207,1206,....1201 );

the result is the same tweet 10 times (TWEET-ABC, TWEET-ABC, TWEET-ABC, ...... TWEET-ABC).

What is wrong with my query logic?

score 0 · Accepted Answer

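You should filter on id rather than on the count. If 100 tweets share the same retweet count, the outer query returns all 100 of them, regardless of the LIMIT 10 in the subquery.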

select text 
from election 
where id  IN  
     (select  id as zz 
      from election  
      order by retweeted_status.retweet_count desc  
      limit 10);

I am still not sure why you are getting the wrong results, though.
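One way to narrow this down is to look at which rows actually carry the ten highest counts. The sketch below only uses columns from the table description in the question. It assumes, without confirmation from the data, that each row in election is one captured tweet and that retweeted_status describes the original tweet being retweeted.

```sql
-- Diagnostic sketch: show the row id and both text fields next to the count.
-- If original_text is identical on all ten rows while id differs, the rows
-- are separate retweets of one original tweet, captured as its count grew.
SELECT retweeted_status.retweet_count AS zz,
       id,
       retweeted_status.text          AS original_text,
       text
FROM election
ORDER BY zz DESC
LIMIT 10;
```

If that turns out to be the case, the ten values 1210 down to 1201 would all belong to the same original tweet, which would explain why the outer query prints the same text ten times.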

Edit (after my comment):

If my comment is correct, you will get the same id ten times. In that case, change the subquery to

     (select distinct id as zz 
      from election  
      order by retweeted_status.retweet_count desc  
      limit 10);
