
To fetch records from my table, I use this MySQL query:

SELECT 
    a.id as aid, a.data1 as adata1, a.data2 as adata2,
    b.id as bid, b.data1 as bdata1, b.data2 as bdata2
FROM table AS a
JOIN table AS b ON ( a.id <> b.id ) 
WHERE (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100
ORDER BY RAND() 
LIMIT 1

This query returns exactly the records I need, but unfortunately it is very slow because of RAND().

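I have found some ways to avoid the RAND() function, for example here. My problem is that I still can't figure out how to replace RAND() in this particular query. Replacing RAND() in simple queries is no problem, but I don't know how to do it in the example above... because there are more conditions in the WHERE clause.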


3 Answers


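Since you're using MySQL, you can try the following SQL. It first gets the count of matching rows, then picks a random offset based on that count. Finally, it prepares a statement so that the computed offset can be used in the LIMIT clause, and executes it.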

SELECT @count := COUNT(*) FROM table AS a JOIN table AS b ON ( a.id <> b.id ) WHERE (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100;
SET @offset = CONVERT(FLOOR(RAND() * @count), SIGNED);
PREPARE mystatement FROM "SELECT 
                          a.id as aid, a.data1 as adata1, a.data2 as adata2,
                          b.id as bid, b.data1 as bdata1, b.data2 as bdata2
                          FROM table AS a
                          JOIN table AS b ON ( a.id <> b.id ) 
                          WHERE (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100 LIMIT ?, 1";
EXECUTE mystatement USING @offset;
DEALLOCATE PREPARE mystatement;

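It skips the ORDER BY RAND() step over the join result, so on large data sets it should run faster than ORDER BY RAND(). Try it and let me know... ;-)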

EDIT

The queries won't work in phpMyAdmin, so run them from the MySQL console or write a PHP script. There are two options. The first is to let MySQL do all the work:

mysql_query('SELECT @count := COUNT(*) FROM table AS a JOIN table AS b ON ( a.id <> b.id ) WHERE (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100');
mysql_query('SET @offset = CONVERT(FLOOR(RAND() * @count), SIGNED)');
mysql_query('PREPARE mystatement FROM "SELECT 
                          a.id as aid, a.data1 as adata1, a.data2 as adata2,
                          b.id as bid, b.data1 as bdata1, b.data2 as bdata2
                          FROM table AS a
                          JOIN table AS b ON ( a.id <> b.id ) 
                          WHERE (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100 LIMIT ?, 1"');
$res = mysql_query('EXECUTE mystatement USING @offset');
$row = mysql_fetch_assoc($res);
print_r($row);

The second, possibly faster, option is to let MySQL do part of the work and do the rest in the programming language (PHP in our case):

$res = mysql_query("SELECT COUNT(*) FROM table AS a JOIN table AS b ON ( a.id <> b.id ) WHERE (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100");
$row = mysql_fetch_array($res);
$offset = rand(0, $row[0]-1);

$res = mysql_query("SELECT 
                              a.id as aid, a.data1 as adata1, a.data2 as adata2,
                              b.id as bid, b.data1 as bdata1, b.data2 as bdata2
                              FROM table AS a
                              JOIN table AS b ON ( a.id <> b.id ) 
                              WHERE (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100 LIMIT $offset, 1");
$row = mysql_fetch_assoc($res);
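Here is a minimal sketch of the same count-then-offset idea. It is written in Python with the standard-library sqlite3 module so that it runs without a MySQL server. Everything specific is an assumption:

- The table is called `items`. This name is hypothetical, chosen because `table` itself is a reserved word.
- The columns follow the question, but the sample rows are invented.
- SQLite's `LIMIT offset, count` form stands in for MySQL's.

Unlike the PHP above, the sketch returns None when the count is zero instead of running the second query.

```python
import random
import sqlite3

# Hypothetical table "items" with invented rows; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, data1 INTEGER, data2 TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO items (data1, data2, rating) VALUES (?, ?, ?)",
    [(1, "x", 100), (1, "y", 150), (1, "z", 400), (0, "w", 120)],
)

# The join and filters shared by the COUNT query and the row query.
PAIRS = """
    FROM items AS a
    JOIN items AS b ON a.id <> b.id
    WHERE a.data1 = 1 AND b.data1 = 1 AND ABS(a.rating - b.rating) < 100
"""

def random_pair(conn):
    """Return one random (aid, bid) pair, or None if no pair matches."""
    (count,) = conn.execute("SELECT COUNT(*) " + PAIRS).fetchone()
    if count == 0:
        return None  # guard: random.randrange(0) would raise ValueError
    offset = random.randrange(count)  # uniform in [0, count - 1]
    # The offset is bound as a parameter rather than formatted into the SQL.
    # No ORDER BY is needed: any row order works because the offset is random.
    return conn.execute("SELECT a.id, b.id " + PAIRS + " LIMIT ?, 1", (offset,)).fetchone()

print(random_pair(conn))
```

The two statements are not atomic. If rows change between the COUNT and the second query, the offset can point past the end and return no row. In that case the caller may simply retry.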

Another alternative I found for speeding up ORDER BY RAND() is a query like this:

SELECT 
    a.id as aid, a.data1 as adata1, a.data2 as adata2,
    b.id as bid, b.data1 as bdata1, b.data2 as bdata2
FROM table AS a
JOIN table AS b ON ( a.id <> b.id ) 
WHERE (RAND() < (SELECT ((1/COUNT(*))*10) FROM table AS a JOIN table AS b ON ( a.id <> b.id ) WHERE (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100 ) )
 AND (a.data1=1 AND b.data1=1) AND ABS( a.rating - b.rating ) <100
ORDER BY RAND() 
LIMIT 1
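The idea behind the `RAND() < 10/COUNT(*)` filter is to keep each candidate row with a small probability, so that ORDER BY RAND() only has to sort roughly ten survivors. A minimal Python sketch of that technique follows. It uses sqlite3, a hypothetical `items` table with invented rows, and `random.random` registered as RAND().

Two choices in the sketch are assumptions rather than part of the answer:

- N is counted over the filtered pairs, so k/N refers to rows that can actually be returned.
- A single sampled pass can come back empty, so the sketch retries and then falls back to the exact query.

Every pair is kept with the same probability, so the pick among survivors is still uniform.

```python
import random
import sqlite3

# Hypothetical table and invented rows; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
# SQLite has no RAND(), so register Python's random.random under that name.
conn.create_function("RAND", 0, random.random)
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, data1 INTEGER, data2 TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO items (data1, data2, rating) VALUES (?, ?, ?)",
    [(1, "r%d" % i, i * 10) for i in range(50)],
)

# Join and filters from the question; the WHERE clause is left open for extra terms.
PAIRS = """
    FROM items AS a
    JOIN items AS b ON a.id <> b.id
    WHERE a.data1 = 1 AND b.data1 = 1 AND ABS(a.rating - b.rating) < 100
"""

def random_pair_prefiltered(conn, k=10, max_tries=20):
    """Keep each matching pair with probability about k/N, then ORDER BY RAND() on the survivors."""
    (n,) = conn.execute("SELECT COUNT(*) " + PAIRS).fetchone()
    if n == 0:
        return None
    keep = min(1.0, k / n)  # expected number of surviving pairs is about k
    sampled = "SELECT a.id, b.id " + PAIRS + " AND RAND() < ? ORDER BY RAND() LIMIT 1"
    for _ in range(max_tries):
        row = conn.execute(sampled, (keep,)).fetchone()
        if row is not None:
            return row
    # Every round came up empty (unlikely): fall back to the exact, slower query.
    return conn.execute("SELECT a.id, b.id " + PAIRS + " ORDER BY RAND() LIMIT 1").fetchone()

print(random_pair_prefiltered(conn))
```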

Don't forget to tell me what results you get ;-).

answered 2012-09-22T15:17:19.150

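Your question isn't very specific... How big is the table? What exactly does "quite slow" mean? You are trying to find all pairs of records in the table where data1 = 1 and the ratings differ by less than 100. In the following version, I moved all the conditions into the ON clause so they are grouped together more clearly: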

SELECT a.id as aid, a.data1 as adata1, a.data2 as adata2,
       b.id as bid, b.data1 as bdata1, b.data2 as bdata2
FROM table AS a join
     table AS b
     ON a.id <> b.id and
        a.data1 = b.data1 and
        a.data1 = 1 and b.data1 = 1 and
        ABS( a.rating - b.rating ) < 100
ORDER BY RAND() 
LIMIT 1

I also added the extra condition a.data1 = b.data1, because it helps the SQL engine recognize this as an equi-join, which should help join performance.
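For an inner join, a condition means the same thing in ON as in WHERE. The extra a.data1 = b.data1 is also implied by a.data1 = 1 AND b.data1 = 1, so the rewrite should return exactly the same pairs. A quick way to check that on sample data is the Python sketch below, which uses sqlite3 with a hypothetical `items` table and invented rows. It shows only that the results match; it says nothing about how MySQL's optimizer treats the extra equality.

```python
import sqlite3

# Hypothetical table and invented rows; SQLite is used only to compare result sets.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, data1 INTEGER, data2 TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO items (data1, data2, rating) VALUES (?, ?, ?)",
    [(1, "a", 100), (1, "b", 180), (0, "c", 120), (1, "d", 500), (2, "e", 110)],
)

# Question's form: join on a.id <> b.id, filters in WHERE.
WHERE_FORM = """
    SELECT a.id, b.id FROM items AS a JOIN items AS b ON (a.id <> b.id)
    WHERE (a.data1 = 1 AND b.data1 = 1) AND ABS(a.rating - b.rating) < 100
"""
# Answer's form: everything in ON, plus the redundant equality a.data1 = b.data1.
ON_FORM = """
    SELECT a.id, b.id FROM items AS a JOIN items AS b
      ON a.id <> b.id AND a.data1 = b.data1
     AND a.data1 = 1 AND b.data1 = 1
     AND ABS(a.rating - b.rating) < 100
"""

where_pairs = set(conn.execute(WHERE_FORM).fetchall())
on_pairs = set(conn.execute(ON_FORM).fetchall())
print(sorted(where_pairs))  # [(1, 2), (2, 1)]
```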

Assuming data1 is selective (meaning relatively few records have data1 = 1), you should be able to speed this up with an index on (data1, id) or (data1, rating).
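One detail matters for the (data1, rating) index. An ordinary index can serve a range on rating only when the column is not wrapped in a function. ABS(a.rating - b.rating) < 100 is equivalent to the open range a.rating - 100 < b.rating < a.rating + 100, which an index can use.

The Python sketch below compares the two forms using SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN. The table `items` and the index name `idx_items_data1_rating` are hypothetical. MySQL's plans will look different, so treat this as an illustration, not a benchmark.

```python
import sqlite3

# Hypothetical table and index; no data is needed just to inspect the plan.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, data1 INTEGER, data2 TEXT, rating INTEGER)")
conn.execute("CREATE INDEX idx_items_data1_rating ON items (data1, rating)")

# ABS() wraps the column, so only the data1 part of the index helps for b.
ABS_FORM = """
    SELECT a.id, b.id FROM items AS a JOIN items AS b
      ON a.id <> b.id AND a.data1 = 1 AND b.data1 = 1
     AND ABS(a.rating - b.rating) < 100
"""
# Same condition as an open range on b.rating: a.rating - 100 < b.rating < a.rating + 100.
RANGE_FORM = """
    SELECT a.id, b.id FROM items AS a JOIN items AS b
      ON a.id <> b.id AND a.data1 = 1 AND b.data1 = 1
     AND b.rating > a.rating - 100 AND b.rating < a.rating + 100
"""

def plan(sql):
    # Each EXPLAIN QUERY PLAN row ends with a human-readable detail string.
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]

for line in plan(ABS_FORM):
    print("ABS:  ", line)
for line in plan(RANGE_FORM):
    print("RANGE:", line)
```

In the RANGE_FORM plan, b's lookup should list a rating range next to data1=?. The ABS_FORM lookup can only use data1=?.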

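If you know that every record has at least one match (that is, every record has another record with a similar rating), then the following variant should perform better: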

SELECT a.id as aid, a.data1 as adata1, a.data2 as adata2,
       b.id as bid, b.data1 as bdata1, b.data2 as bdata2
FROM (select *
      from table AS a
      where a.data1 = 1
      order by rand()
      limit 1
     ) a join
     table AS b
     ON a.id <> b.id and
        a.data1 = b.data1 and
        a.data1 = 1 and b.data1 = 1 and
        ABS( a.rating - b.rating ) < 100
ORDER BY RAND() 
LIMIT 1

This first selects one random record and then does the self-join against it.
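The two-step idea can be tried out with the Python sketch below. It uses sqlite3, a hypothetical `items` table with invented rows, and Python's `random.random` registered as RAND() because SQLite has none. Record 3 deliberately has no partner within 100. That shows why the answer assumes every record has a match: when the random a has no partner, the query returns no row at all.

```python
import random
import sqlite3

# Hypothetical table and invented rows; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.create_function("RAND", 0, random.random)  # SQLite has no built-in RAND()
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, data1 INTEGER, data2 TEXT, rating INTEGER)")
conn.executemany(
    "INSERT INTO items (data1, data2, rating) VALUES (?, ?, ?)",
    [(1, "p", 100), (1, "q", 150), (1, "r", 900)],  # id 3 (rating 900) has no partner
)

# Step 1: the derived table picks one random "a" row.
# Step 2: only that row is joined against the table to find a random "b".
TWO_STEP = """
    SELECT a.id, b.id
    FROM (SELECT * FROM items WHERE data1 = 1 ORDER BY RAND() LIMIT 1) AS a
    JOIN items AS b
      ON a.id <> b.id AND b.data1 = 1
     AND ABS(a.rating - b.rating) < 100
    ORDER BY RAND()
    LIMIT 1
"""

def random_pair_two_step(conn):
    """Return a random (aid, bid) pair, or None when the chosen a has no partner."""
    return conn.execute(TWO_STEP).fetchone()

print([random_pair_two_step(conn) for _ in range(5)])
```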

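This gives me an idea for a different approach to the problem. First, work out the ratings present in the data you are looking at. Then pick a random pair of ratings that differ by less than 100, and find random records that match those ratings. With indexes on data1 and rating, this approach is probably the fastest.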
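The ratings-first idea is only described in prose, so here is one possible reading of it as a Python sketch. It uses sqlite3, a hypothetical `items` table and index, invented rows, and RAND() registered from `random.random`.

How the pairs are chosen is an assumption. The sketch picks uniformly among rating pairs, which, like the two-step query, is not uniform over record pairs. A pair (r, r) is allowed only when at least two rows share rating r, so the last lookup always finds a second row.

```python
import random
import sqlite3
from itertools import product

# Hypothetical table, index and invented rows; SQLite stands in for MySQL here.
conn = sqlite3.connect(":memory:")
conn.create_function("RAND", 0, random.random)  # SQLite has no built-in RAND()
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, data1 INTEGER, data2 TEXT, rating INTEGER)")
conn.execute("CREATE INDEX idx_items_data1_rating ON items (data1, rating)")
conn.executemany(
    "INSERT INTO items (data1, data2, rating) VALUES (?, ?, ?)",
    [(1, "p", 100), (1, "q", 100), (1, "r", 150), (1, "s", 900), (0, "t", 120)],
)

def random_pair_by_rating(conn):
    """Pick a random rating pair within 100, then a random record for each rating."""
    # Step 1: distinct ratings (with row counts) among the rows of interest.
    counts = dict(conn.execute(
        "SELECT rating, COUNT(*) FROM items WHERE data1 = 1 GROUP BY rating").fetchall())
    # Step 2: rating pairs less than 100 apart; (r, r) needs two rows with rating r.
    pairs = [(r1, r2) for r1, r2 in product(counts, repeat=2)
             if abs(r1 - r2) < 100 and (r1 != r2 or counts[r1] >= 2)]
    if not pairs:
        return None
    r1, r2 = random.choice(pairs)
    # Step 3: a random row with rating r1, then a different random row with rating r2.
    (aid,) = conn.execute(
        "SELECT id FROM items WHERE data1 = 1 AND rating = ? ORDER BY RAND() LIMIT 1",
        (r1,)).fetchone()
    (bid,) = conn.execute(
        "SELECT id FROM items WHERE data1 = 1 AND rating = ? AND id <> ? ORDER BY RAND() LIMIT 1",
        (r2, aid)).fetchone()
    return aid, bid

print(random_pair_by_rating(conn))
```

The last two lookups touch only rows with a single rating value, which is where an index on (data1, rating) would help.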

answered 2012-09-22T15:22:42.663

If you can live with a less uniform distribution over the problem space, you could try:

SELECT a.id as aid, a.data1 as adata1, a.data2 as adata2,
       b.id as bid, b.data1 as bdata1, b.data2 as bdata2
  FROM ( SELECT *
           FROM table
          WHERE data1 = 1
          ORDER
             BY RAND()
          LIMIT 1
       ) a
  JOIN table b
    ON b.data1 = 1
   AND b.id <> a.id
   AND b.rating > a.rating - 100
   AND b.rating < a.rating + 100
 ORDER
    BY RAND()
 LIMIT 1
;

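The query above picks a random record a and then a random record b, so there are far fewer records to sort and join. The distribution is less uniform: every choice of a is equally likely, instead of being proportional to the number of possible matching choices of b. But maybe that's good enough for your purposes?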
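To make the uniformity trade-off concrete, this short Python sketch computes the exact probability of each pair under two different queries:

- The two-step query, where a is uniform over all rows and b is then uniform among a's partners.
- ORDER BY RAND() over the whole join, which gives every pair the uniform 1/(number of pairs).

The ids and ratings are invented for illustration.

```python
from fractions import Fraction

# Hypothetical ratings of the rows with data1 = 1, keyed by id.
ratings = {1: 0, 2: 90, 3: 180, 4: 185}

def partners(a):
    """Ids b != a whose rating differs from a's by less than 100."""
    return [b for b in ratings if b != a and abs(ratings[a] - ratings[b]) < 100]

# Two-step sampling: P(a) = 1/len(ratings), then P(b | a) = 1/len(partners(a)).
# A row with no partners contributes no pairs (its inner loop never runs).
two_step = {
    (a, b): Fraction(1, len(ratings)) * Fraction(1, len(partners(a)))
    for a in ratings
    for b in partners(a)
}
# ORDER BY RAND() over the full join: every matching pair is equally likely.
uniform = Fraction(1, len(two_step))

for pair, p in sorted(two_step.items()):
    print(pair, p, "vs uniform", uniform)
```

With these numbers, (1, 2) comes out with probability 1/4 and (2, 1) with 1/12. The full-join query gives each of the 8 pairs 1/8. Record 1 has only one partner, so whenever it is picked as a, its pair is guaranteed.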

answered 2012-09-22T15:21:06.387