2

I have a table like this:

id| page |   text
------------------------
1 | page1 | Hello World
2 | page1 | Foo Bar
3 | page2 | Baz Baz
3 | page2 | Some Text
4 | page3 | Some Other Text

I want to select 2 random entries, but each page may appear only once in the result.

I have

SELECT * FROM mydata ORDER BY RANDOM() LIMIT 2;

but can I combine that with DISTINCT or grouping?

4 Answers

2

Something like:

select id, page, text
from (
  select id, page, text,
         row_number() over (partition by page order by random()) as rn
  from mydata
) t  -- subquery alias, required before PostgreSQL 16
where rn <= 2
answered 2012-08-24T21:04:55.443
1

Same as Erwin's answer, just a bit more structured: http://www.sqlfiddle.com/#!1/d3e83/6

with first_random as
(
  select * from tbl order by random() limit 1
)
, second_random as
(
  select * 
  from tbl 
  where page <> (select page from first_random)
  order by random() limit 1
)
select * from first_random
union
select * from second_random;

Same as a_horse_with_no_name's answer, but this one is correct: http://www.sqlfiddle.com/#!1/d3e83/12

select id, page, text, rn
from (
  select id, page, text,
         row_number() over (partition by page order by random()) as rn
  from tbl
) x
where rn = 1
order by random() 
limit 2;

Prefer the latter; its execution plan is simpler.

answered 2012-08-25T02:16:22.810
1

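If you want:
... two rows in total from the base table
... and every page to have an equal chance of appearing in the sample, no matter how many entries it has in the table: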

SELECT *
FROM  (
    SELECT DISTINCT ON (page) *
    FROM   mydata
    ORDER  BY page, random() -- pick one random entry per page
    ) x
ORDER BY random() -- pick two random pages
LIMIT 2;

Or, with a window function:

WITH x AS (
   SELECT *, row_number() OVER (PARTITION BY page ORDER BY random()) AS rn
   FROM   mydata
   )
SELECT id, page, text
FROM   x
WHERE  rn = 1
ORDER  BY random()
LIMIT  2;
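
For anyone who wants to try the window-function variant without a PostgreSQL server, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. SQLite is only a stand-in and an assumption on my part (it needs version 3.25 or later for window functions and has no DISTINCT ON), and the helper name sample_two_pages is made up for illustration.

```python
import sqlite3

# Sample rows copied from the question (note that id 3 appears twice there).
rows = [
    (1, "page1", "Hello World"),
    (2, "page1", "Foo Bar"),
    (3, "page2", "Baz Baz"),
    (3, "page2", "Some Text"),
    (4, "page3", "Some Other Text"),
]


def sample_two_pages(conn):
    """Return two random rows with at most one row per page (hypothetical helper)."""
    query = """
        WITH x AS (
            SELECT id, page, text,
                   row_number() OVER (PARTITION BY page ORDER BY random()) AS rn
            FROM mydata
        )
        SELECT id, page, text
        FROM x
        WHERE rn = 1          -- one random row per page
        ORDER BY random()     -- shuffle the surviving pages
        LIMIT 2               -- keep two of them
    """
    return conn.execute(query).fetchall()


# SQLite in memory is an assumption; the answer itself targets PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mydata (id INTEGER, page TEXT, text TEXT)")
conn.executemany("INSERT INTO mydata VALUES (?, ?, ?)", rows)

print(sample_two_pages(conn))
```

Each call should return two rows from two different pages; which rows you get changes from run to run.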

You will have to test which one is faster.
If you are dealing with a big table and need top performance, you can do better. Here is one way.


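On the other hand, if you want:
... two rows in total from the table mydata
... and to give every entry an (almost) equal chance of appearing in the sample, which effectively gives pages with more entries in the table a better chance.
The chances are still not truly equal: by definition, your restriction raises the chances of entries on rare pages.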

WITH x AS (
   SELECT *
   FROM   mydata
   ORDER  BY random()
   LIMIT 1
   )
SELECT * FROM x
UNION ALL
(
SELECT m.*
FROM mydata m
   , x
WHERE m.page <> x.page -- assuming page IS NOT NULL
ORDER BY random()
LIMIT 1
);

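The parentheses around the second SELECT of the UNION are needed so that it can be ordered and limited on its own.
Tested with PostgreSQL 9.1. Window functions require 8.4 or later.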
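
To put numbers on the "not truly equal" remark for the two-step query, here is a rough sketch that works out the exact chance of each page landing in the sample, using the row counts from the question's table. The function name inclusion_probabilities is hypothetical, and the model assumes random() picks uniformly at each step.

```python
from fractions import Fraction

# Rows per page in the question's sample table.
page_counts = {"page1": 2, "page2": 2, "page3": 1}


def inclusion_probabilities(counts):
    """Chance that each page shows up in the two-row sample (hypothetical helper)."""
    n = sum(counts.values())
    prob = {page: Fraction(0) for page in counts}
    for first, c_first in counts.items():
        p_first = Fraction(c_first, n)   # step 1: any row, picked uniformly
        prob[first] += p_first
        remaining = n - c_first          # step 2: only rows from other pages
        for second, c_second in counts.items():
            if second != first:
                prob[second] += p_first * Fraction(c_second, remaining)
    return prob


probs = inclusion_probabilities(page_counts)
for page, p in probs.items():
    # Per-row chance = page chance divided by the number of rows on that page.
    print(page, p, "per row:", p / page_counts[page])
```

With this table, page1 and page2 each appear with probability 23/30 and page3 with 7/15. Per row that is 23/60 against 28/60, so the single entry on the rare page is somewhat more likely to be picked than any entry on the bigger pages.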

answered 2012-08-25T02:03:59.363
0

This might work:

-- Caveat: PostgreSQL rejects SELECT * with GROUP BY page unless page is the primary key
SELECT * FROM
  (SELECT * FROM mydata GROUP BY page) t
ORDER BY RANDOM() LIMIT 2
answered 2012-08-27T09:43:00.950