I have this table:

    person_id   int(10)  pk
    points      int(6)   index
    (other columns, not very important)

I have this random-row query, which is very fast on a table with 10M rows:

    SELECT person_id
      FROM persons AS r1 JOIN
           (SELECT (RAND() *
                         (SELECT MAX(person_id)
                            FROM persons)) AS id)
            AS r2
     WHERE r1.person_id >= r2.id
     ORDER BY r1.person_id ASC
     LIMIT 1

This is all great but now I wish to show only people with points > 0. Example table:

    PERSON_ID      POINTS
    1              4
    2              6
    3              0
    4              3

When I append AND points > 0 to the WHERE clause, person_id 3 can't be selected, so a gap is created: whenever the random value lands on person_id 3, person_id 4 is selected instead. This gives person 4 a bigger chance of being chosen. Does anyone have suggestions for how I can adjust the query so that it works with the WHERE clause and gives all rows the same chance of being selected?
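To make the bias concrete, here is a minimal Python sketch (not part of the query itself; the function names are made up for illustration) that mimics the query with the extra `points > 0` filter on the example table. With a random threshold in [0, 4), ids 1 and 2 should each come up about 25% of the time and id 4 about 50% of the time.

```python
import random
from collections import Counter

# Example table from the question: person_id -> points.
POINTS = {1: 4, 2: 6, 3: 0, 4: 3}


def pick_with_gap(rng):
    """Mimic the query plus `AND points > 0`: first eligible id >= RAND() * MAX(id)."""
    threshold = rng.random() * max(POINTS)  # RAND() * MAX(person_id), in [0, 4)
    eligible = sorted(pid for pid, pts in POINTS.items() if pts > 0)
    for pid in eligible:
        if pid >= threshold:
            return pid
    return None  # not reached here, because the largest id is eligible


def simulate(trials=100_000, seed=0):
    """Return the observed selection frequency per person_id."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    counts = Counter(pick_with_gap(rng) for _ in range(trials))
    return {pid: counts[pid] / trials for pid in sorted(counts)}


if __name__ == "__main__":
    print(simulate())
```

Person 4 absorbs the share that would have gone to person 3, which is exactly the gap effect described above.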

Table info: the person_id values are contiguous, with no gaps. About 90% of the rows have 0 points. I need the query for both cases: points = 0 and points > 0.

Before someone suggests ORDER BY RAND(): that is not a solution for tables with more than a few hundred thousand rows.

Bonus question: is it possible to select x random rows in one query, so that I do not have to run this query several times when I want more random rows?

Important note: performance is key. With 10M+ rows the query must not take much longer than the current one, which takes 0.0005 seconds; I would prefer to stay under 0.05 seconds.

Last note: if you think the query can never be this fast with the above requirements, but another solution is possible (like fetching 100 rows and showing x random ones that have more than 0 points), please tell me :)
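The "fetch a batch and keep the ones with points" idea could look roughly like the Python sketch below. It is only an illustration under stated assumptions: `sample_eligible`, `points_by_id`, `batch` and `max_rounds` are hypothetical names, and the dictionary lookup stands in for what would be a primary-key lookup against the table. Because the ids are contiguous, a uniformly random id that passes the filter is a uniformly random eligible row, and the same loop also returns several rows at once.

```python
import random


def sample_eligible(points_by_id, max_id, want, batch=100, rng=random, max_rounds=50):
    """Return up to `want` distinct ids with points > 0, each equally likely.

    Assumes ids 1..max_id exist without gaps, as stated in the question.
    """
    chosen = []
    seen = set()
    for _ in range(max_rounds):
        # Draw a batch of random ids; in a database this would be one
        # lookup by primary key combined with the points > 0 filter.
        candidates = [rng.randint(1, max_id) for _ in range(batch)]
        # Walk the candidates in the order they were drawn. Taking them in
        # primary-key order instead would favour low ids.
        for pid in candidates:
            if pid not in seen and points_by_id[pid] > 0:
                seen.add(pid)
                chosen.append(pid)
                if len(chosen) == want:
                    return chosen
    return chosen  # fewer than `want` if too few eligible rows were found


if __name__ == "__main__":
    # Toy table: 1000 rows, every tenth row has points (about 10% eligible).
    table = {pid: (5 if pid % 10 == 0 else 0) for pid in range(1, 1001)}
    print(sample_eligible(table, max_id=1000, want=5, rng=random.Random(1)))
```

With about 10% of rows eligible, a batch of 100 ids yields around 10 usable rows on average, so one round is usually enough for small x.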

Really appreciate your help and all help is welcome :)

1 Answer

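You could generate inline, gap-free ids for only the records you actually want to use, and then generate the random selector from the total number of eligible records.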

Try this (props to the selected answer here for the row_number generator):

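    SELECT r1.*
    FROM
        -- Number only the eligible rows 1..N, so the numbering has no gaps.
        -- (Alias is row_num because ROW_NUMBER is a reserved word in MySQL 8.0+.)
        (SELECT  p.person_id,
                 @curRow := @curRow + 1 AS row_num
        FROM persons AS p,
             (SELECT @curRow := 0) r0
        WHERE p.points > 0) r1
        -- Random selector in [0, N), where N is the number of eligible rows.
    , (SELECT COUNT(1) * RAND() AS id
       FROM persons
       WHERE points > 0) r2
    -- Compare against the gap-free row number, not person_id.
    WHERE r1.row_num >= r2.id
    ORDER BY r1.row_num ASC
    LIMIT 1;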

You can play around with it in this sqlfiddle.
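As a rough sanity check of the idea described above (number only the eligible rows, then draw against their count), here is a small Python simulation on the example table from the question. It is a sketch rather than a translation of the SQL; `pick_gap_free` and `frequencies` are names made up for illustration. Each of the three eligible people should come up about a third of the time.

```python
import random
from collections import Counter

POINTS = {1: 4, 2: 6, 3: 0, 4: 3}  # example table from the question


def pick_gap_free(rng):
    # Number the eligible rows 1..N, like the counter in the inner query.
    eligible = [pid for pid, pts in sorted(POINTS.items()) if pts > 0]
    numbered = list(enumerate(eligible, start=1))  # (row number, person_id)
    selector = len(eligible) * rng.random()        # COUNT(1) * RAND(), in [0, N)
    # First row whose row number is >= the selector (ORDER BY ... LIMIT 1).
    for row_num, pid in numbered:
        if row_num >= selector:
            return pid


def frequencies(trials=90_000, seed=0):
    """Return the observed selection frequency per person_id."""
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    counts = Counter(pick_gap_free(rng) for _ in range(trials))
    return {pid: counts[pid] / trials for pid in sorted(counts)}


if __name__ == "__main__":
    print(frequencies())
```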

answered 2013-05-22T13:53:25.110