4

My objective

I am trying to retrieve multiple random rows that contain only unique userid but for the type column to be random - type can only be 0 or 1. The table in question will contain less than 1,000 rows at any given time.

My table

CREATE TABLE tbl_message_queue (
  userid bigint(20) NOT NULL,
  messageid varchar(20) NOT NULL,
  `type` int(1) NOT NULL,
  PRIMARY KEY (userid,messageid,`type`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Sample data

userid  |   messageid   |   type
---------------------------------------------------
4353453 |   518423942   |   0
4353453 |   518423942   |   1
2342934 |   748475435   |   0
2342934 |   748475435   |   1
7657529 |   821516543   |   0
7657529 |   821516543   |   1
0823546 |   932843285   |   0
0823546 |   932843285   |   1

What to rule out

Using ORDER BY RAND() isn't feasible as over at least 18,000 of these types of queries are executed by applications at any given moment and are causing high load. Using SELECT DISTINCT or GROUP BY is (obviously) more efficient and will always pick unique userid but type will always equal to 0 with an acceptable load.

The common method is to create an id column but I'm looking for an alternative way only. The group primary key cannot change as it is required and deeply integrated into our application, however the structure of each column can be altered.

Thanks.

4

1 回答 1

2

My understanding of your question is that for each userid you have two entries, but want to extract only one, at random.

To achieve this, you ought to generate a random value between 0 and 1 for each unique userid, and then JOIN this list with the starting list:

SELECT a.* FROM tbl_message_queue AS a
    JOIN ( SELECT userid, FLOOR(2*RAND()) AS type 
           FROM tbl_message_queue GROUP BY userid ) AS b
    ON ( a.userid = b.userid AND a.type = b.type );

But if an ORDER BY RAND() does not work for you, maybe we should compromise.

In the above sequence, any two userids will be uncorrelated -- i.e., the fact that user A gets type 0 tells you nothing about what user B will turn up with.

Depending on the use case, a less random (but "apparently random") sequence could be obtained with two queries:

SELECT @X := FLOOR(2*RAND()), @Y := POW(2,FLOOR(2+14*RAND()))-1;

SELECT * FROM tbl_message_queue WHERE (((userid % @Y) & 1) XOR type XOR @X);

This way, you can get what seems a random extraction. What really happens is that the userids are correlated, and you only have some couple dozens different extractions possible. But using only simple operators, and no JOINs, this query is very fast.

于 2012-10-01T00:23:40.823 回答