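I am trying to find the top 10 trending tweets in Hive based on retweet_count, i.e. the tweet with the highest retweet_count should come first, and so on.
Here is the schema of the election table: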
id bigint from deserializer
created_at string from deserializer
source string from deserializer
favorited boolean from deserializer
retweeted_status struct<text:string,user:struct<screen_name:string,name:string>,retweet_count:int> from deserializer
entities struct<urls:array<struct<expanded_url:string>>,user_mentions:array<struct<screen_name:string,name:string>>,hashtags:array<struct<text:string>>> from deserializer
text string from deserializer
user struct<screen_name:string,name:string,friends_count:int,followers_count:int,statuses_count:int,verified:boolean,utc_offset:int,time_zone:string,location:string> from deserializer
in_reply_to_screen_name string from deserializer
My query:
select text
from election
where retweeted_status.retweet_count IN
(select retweeted_status.retweet_count as zz
from election
order by zz desc
limit 10);
It returns the same tweet 10 times (TWEET-ABC, TWEET-ABC, TWEET-ABC, ... TWEET-ABC).
So I split the nested query apart. When I run the inner query on its own:
select retweeted_status.retweet_count as zz
from election
order by zz desc
limit 10
it returns 10 distinct values (1210,1209,1208,1207,1206,....1201).
Then, when I run the outer query with those values written out:
select text
from election
where retweeted_status.retweet_count
IN (1210,1209,1208,1207,1206,....1201 );
the result is again the same tweet 10 times (TWEET-ABC, TWEET-ABC, TWEET-ABC, ...... TWEET-ABC).
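One thing that may be worth checking is whether those ten counts all come from rows that are retweets of a single original tweet. The following is only a minimal sketch, not a verified fix: it assumes that retweeted_status.text holds the original tweet's text and that each row in election is a separate retweet event, neither of which I have confirmed. The aliases original_text, max_retweets and row_count are made up for illustration.

```sql
-- Sketch (assumption: each row is one retweet event, and
-- retweeted_status.text identifies the original tweet).
-- Collapse rows per original tweet, keep the highest observed
-- retweet_count for each, and rank the originals by that value.
select retweeted_status.text               as original_text,
       max(retweeted_status.retweet_count) as max_retweets,
       count(*)                            as row_count
from election
group by retweeted_status.text
order by max_retweets desc
limit 10;
```

If row_count for the top entry is large and its counts span 1201 to 1210, that would suggest the ten values come from one original tweet captured at different moments rather than from ten different tweets.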
What is wrong with my query logic?