mysql - De-duplicate smart random row query?

Question

Following some research, I found a query that suits my needs: it returns random IDs from the table. The ID field is auto-increment, so there are no holes.

SELECT `mydb`.`myTable`.id
FROM   (SELECT Floor (Rand() * (SELECT Count(*) 
                                FROM   `mydb`.`myTable`)) num, 
               @num := @num + 1 
        FROM   (SELECT @num := 0) a, 
               `mydb`.`myTable` 
        LIMIT  2000000) b, 
       `mydb`.`myTable` 
WHERE  b.num = `mydb`.`myTable`.id

The issue is that the target table (myTable) contains 30-400M records, depending on the situation. With the LIMIT, I want to retrieve 2M randomly selected IDs; however, I get a lot of duplicates (which is expected).

Is it possible to de-duplicate the query and still receive 2M records? I thought of creating a table and letting it enforce UNIQUE values, but then I would again get fewer rows than expected.

Any thoughts? Many thanks!

score 1 · Accepted Answer

You can simply order your rows randomly. That way there are no duplicates, and it does not matter whether the IDs have holes.

SELECT 
   id
FROM
  mydb.myTable
ORDER BY
  RAND()
LIMIT 2000000
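
On a table with 30-400M rows, ORDER BY RAND() has to assign a random value to every row and sort them all, which may be slow. As an alternative, here is a rough sketch of the asker's own idea of a table that enforces unique values, topped up in a loop until it holds the full 2M IDs. The names random_ids and fill_random_ids are hypothetical, and the sketch assumes the IDs run from 1 to MAX(id) with no holes, as stated in the question, and that the target is smaller than the number of rows.

```sql
-- Sketch only: random_ids and fill_random_ids are hypothetical names.
-- Assumes ids run 1..MAX(id) without holes and that target < row count
-- (otherwise the loop below would never finish).
CREATE TABLE random_ids (
  -- The primary key is what rejects duplicate ids.
  id BIGINT UNSIGNED NOT NULL PRIMARY KEY
);

-- DELIMITER is a mysql client command; it lets the body contain semicolons.
DELIMITER //

CREATE PROCEDURE fill_random_ids(IN p_target INT)
BEGIN
  DECLARE v_max_id BIGINT;
  DECLARE v_have INT DEFAULT 0;
  DECLARE v_need INT;

  SELECT MAX(id) INTO v_max_id FROM mydb.myTable;

  WHILE v_have < p_target DO
    -- Draw only as many candidates as are still missing.
    SET v_need = p_target - v_have;

    -- myTable serves purely as a row generator here; RAND() is evaluated
    -- once per row, and INSERT IGNORE skips ids that are already stored.
    INSERT IGNORE INTO random_ids (id)
    SELECT FLOOR(1 + RAND() * v_max_id)
    FROM mydb.myTable
    LIMIT v_need;

    SELECT COUNT(*) INTO v_have FROM random_ids;
  END WHILE;
END //

DELIMITER ;

CALL fill_random_ids(2000000);
SELECT id FROM random_ids;
```

Each pass only has to replace the duplicates dropped in the previous pass, so the loop should converge after a few iterations when the target is a small fraction of the table. If the IDs can have holes, the final SELECT would need a join back to mydb.myTable, and the result could again fall short of 2M.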
