
I have a MySQL table with 7 columns, and each row contains integer values.

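I have a simple site that receives values from users, and I need to check whether the values a user sends match, or are similar to, any row in the table.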

So the user writes, for example, 1 2 3 4 5 6 7 as input.

I need to find out whether any row in my table matches it regardless of order, so 1 2 3 4 5 6 7 = 7 6 5 4 3 2 1. The table contains more than 40,000 rows.

I also need to check whether they share at least 5, 6 or 7 numbers in common.

That seems to mean using permutations to find all possible combinations. But what is the best way to solve this kind of problem?

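  1. Take the user's input, generate all its permutations, and match them against the first row, the second row and so on, reporting a match if one is found? Or, conversely, take a row from the table, generate all its permutations, and match them against the user's input?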

  2. What will memory and CPU usage look like when going through that many permutations on such a large table?
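For reference, permutations are not actually needed for either check: two rows contain the same numbers in any order exactly when their sorted forms are equal, and the number of shared values is the size of their set intersection. A minimal Python sketch, assuming the rows have already been loaded into memory; the sample rows and the function names are illustrative only:

```python
def same_numbers(row, user):
    # Order-insensitive equality: compare sorted copies instead of trying all 7! = 5040 orderings.
    return sorted(row) == sorted(user)

def common_count(row, user):
    # Number of distinct values the two rows have in common.
    return len(set(row) & set(user))

# Hypothetical sample rows keyed by primary key.
rows = {
    "a": (1, 2, 3, 4, 5, 6, 7),
    "b": (2, 3, 4, 5, 6, 7, 8),
    "z": (10, 11, 12, 13, 14, 15, 16),
}
user = (7, 6, 5, 4, 3, 2, 1)

exact = [pk for pk, row in rows.items() if same_numbers(row, user)]
at_least_5 = [pk for pk, row in rows.items() if common_count(row, user) >= 5]
print(exact)       # ['a']
print(at_least_5)  # ['a', 'b']
```

Each comparison sorts or intersects 7 values instead of checking 5040 orderings, so even a full scan of 40,000 rows stays cheap in both memory and CPU.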


3 Answers


In a fully normalized schema, this is a single query.

Assume your table, with primary key pk, is:

create table T1 
( pk char (1), a1 int, a2 int, a3 int, a4 int, a5 int, a6 int, a7 int);

insert into T1 values 
('a',1,2,3,4,5,6,7),
('b',2,3,4,5,6,7,8),
('z',10,11,12,13,14,15,16);

We can then normalize the data like this:

select
   pk, 
   case a
    when 1 then a1
    when 2 then a2
    when 3 then a3
    when 4 then a4
    when 5 then a5
    when 6 then a6
    when 7 then a7
   end
   as v
from T1   
cross join 
   (select 1 as a from dual union all
    select 2 as a from dual union all
    select 3 as a from dual union all
    select 4 as a from dual union all
    select 5 as a from dual union all
    select 6 as a from dual union all
    select 7 as a from dual ) T2
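To see what this unpivot produces, here is a sketch that runs the same query against an in-memory SQLite database from Python's built-in `sqlite3` module. SQLite has no `dual` table, so the `from dual` clauses are dropped and an `order by` is added for a stable result; this demonstrates the technique on the three sample rows and is not MySQL itself:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table T1 (pk char(1), a1 int, a2 int, a3 int, a4 int, a5 int, a6 int, a7 int)")
conn.executemany(
    "insert into T1 values (?, ?, ?, ?, ?, ?, ?, ?)",
    [
        ("a", 1, 2, 3, 4, 5, 6, 7),
        ("b", 2, 3, 4, 5, 6, 7, 8),
        ("z", 10, 11, 12, 13, 14, 15, 16),
    ],
)

# Cross join every row with the numbers 1..7, then pick column a<N> for each N.
UNPIVOT = """
select pk,
       case a when 1 then a1 when 2 then a2 when 3 then a3 when 4 then a4
              when 5 then a5 when 6 then a6 when 7 then a7 end as v
from T1
cross join (select 1 as a union all select 2 union all select 3 union all
            select 4 union all select 5 union all select 6 union all select 7) T2
order by pk, v
"""
normalized = conn.execute(UNPIVOT).fetchall()
print(len(normalized))  # 21 (3 rows x 7 columns)
print(normalized[:3])   # [('a', 1), ('a', 2), ('a', 3)]
```

Each original row becomes seven (pk, v) rows, which is the shape a fully normalized table would have from the start.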

Building on the previous query, your requirement is easy to express with a single HAVING clause:

select pk
from
(
select
   pk, 
   case a
    when 1 then a1
    when 2 then a2
    when 3 then a3
    when 4 then a4
    when 5 then a5
    when 6 then a6
    when 7 then a7
   end
   as v
from T1   
cross join 
   (select 1 as a from dual union all
    select 2 as a from dual union all
    select 3 as a from dual union all
    select 4 as a from dual union all
    select 5 as a from dual union all
    select 6 as a from dual union all
    select 7 as a from dual ) T2
) T
where
   T.v in ( 4,5,6,7,8,9,10)
group by pk
having   -- the HAVING clause: count( pk ) > 4 means at least 5 numbers in common
   count( pk ) > 4

Result:

| PK |
------
|  b |
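The complete query can be checked against the sample rows with Python's built-in `sqlite3` module. This sketch drops `from dual` (SQLite has no such table) and passes the user's numbers and the threshold as query parameters instead of pasting them into the SQL; the helper name `rows_sharing` is hypothetical:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table T1 (pk char(1), a1 int, a2 int, a3 int, a4 int, a5 int, a6 int, a7 int)")
conn.executemany(
    "insert into T1 values (?, ?, ?, ?, ?, ?, ?, ?)",
    [("a", 1, 2, 3, 4, 5, 6, 7), ("b", 2, 3, 4, 5, 6, 7, 8), ("z", 10, 11, 12, 13, 14, 15, 16)],
)

def rows_sharing(conn, user_numbers, min_common):
    # One "?" placeholder per user number; count(pk) >= min_common is the HAVING test.
    placeholders = ", ".join("?" for _ in user_numbers)
    sql = f"""
    select pk from (
        select pk,
               case a when 1 then a1 when 2 then a2 when 3 then a3 when 4 then a4
                      when 5 then a5 when 6 then a6 when 7 then a7 end as v
        from T1
        cross join (select 1 as a union all select 2 union all select 3 union all
                    select 4 union all select 5 union all select 6 union all select 7) T2
    ) T
    where T.v in ({placeholders})
    group by pk
    having count(pk) >= ?
    order by pk
    """
    return [pk for (pk,) in conn.execute(sql, [*user_numbers, min_common])]

print(rows_sharing(conn, [4, 5, 6, 7, 8, 9, 10], 5))  # ['b']
```

`count(pk) >= 5` is the same test as the answer's `count( pk ) > 4`. If a row can contain the same number twice, it is counted twice; `count(distinct T.v)` avoids that.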
answered 2012-11-26T11:14:11.770

A lightweight method might be to add an extra field to your table that holds a numerically ordered version of all 7 fields combined.

E.g. if the data in the database was 2 4 7 6 5 1 3, the combination field would be 1234567.

Then, when comparing, sort the user's response numerically and compare it against the combination field in the database.
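A sketch of building such a combination key in Python. One caveat with plain concatenation: once numbers have more than one digit, different sets can produce the same string (1, 2, 345 and 1, 23, 45 both become `12345`), so this sketch joins the sorted numbers with a separator. The function name and the dict standing in for an indexed `combination` column are hypothetical:

```python
def combination_key(numbers):
    # Sort numerically, then join with a separator so multi-digit values stay unambiguous.
    return "-".join(str(n) for n in sorted(numbers))

# Stand-in for an indexed `combination` column: key -> primary key.
index = {
    combination_key((2, 4, 7, 6, 5, 1, 3)): "a",
    combination_key((8, 2, 3, 4, 5, 6, 7)): "b",
}

user_input = "7 6 5 4 3 2 1"
key = combination_key(int(tok) for tok in user_input.split())
print(key)             # 1-2-3-4-5-6-7
print(index.get(key))  # a
```

This handles the exact (all 7 in any order) case with a single indexed equality lookup; partial matches still need one of the other approaches.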

Depending on what you are doing, you could write your query like this:

select * from `table` where combination like '12%' or combination like '123%'

If you know the minimum number of values that must match, that would make the query lighter.

To find out how similar what they wrote is to what is in the database, you could use the PHP levenshtein function: http://php.net/manual/en/function.levenshtein.php

$result = levenshtein($input,$combination);
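For reference, `levenshtein()` returns the minimum number of single-character insertions, deletions and substitutions needed to turn one string into the other. A minimal Python sketch of the classic dynamic-programming algorithm (not PHP's actual source), applied to two combination strings. Note that it counts character edits, which only approximates "numbers in common":

```python
def levenshtein(s, t):
    # prev[j] holds the edit distance between the prefix of s processed so far and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        cur = [i]
        for j, ct in enumerate(t, start=1):
            cur.append(min(
                prev[j] + 1,               # delete cs
                cur[j - 1] + 1,            # insert ct
                prev[j - 1] + (cs != ct),  # substitute (free when the characters match)
            ))
        prev = cur
    return prev[-1]

print(levenshtein("1234567", "1234567"))  # 0
print(levenshtein("1234567", "2345678"))  # 2: drop the leading 1, append an 8
```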
answered 2012-11-26T10:43:56.133

I'm afraid you can't really query this kind of problem efficiently.

You could build a WHERE clause like this:

(`1` IN (1,2,3,4,5,6,7)
    AND `2` IN (1,2,3,4,5,6,7)
    AND `3` IN (1,2,3,4,5,6,7)
    AND `4` IN (1,2,3,4,5,6,7)
    AND `5` IN (1,2,3,4,5,6,7))
OR
(`1` IN (1,2,3,4,5,6,7)
    AND `2` IN (1,2,3,4,5,6,7)
    AND `3` IN (1,2,3,4,5,6,7)
    AND `4` IN (1,2,3,4,5,6,7)
    AND `6` IN (1,2,3,4,5,6,7))
-- OR ... one group for every combination of 5 columns
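To make the size of that condition concrete: "at least 5 of 7 columns match" needs one AND group per 5-column subset, i.e. C(7, 5) = 21 groups (6 or 7 matches are already covered by them). A sketch that generates the clause with `itertools.combinations`, assuming the backtick-quoted column names `1` to `7` used above; `build_where` is a hypothetical helper:

```python
from itertools import combinations
from math import comb

def build_where(user_numbers, n_columns=7, min_matches=5):
    # int() ensures only numbers end up in the generated SQL.
    values = ", ".join(str(int(v)) for v in user_numbers)
    groups = []
    # One AND group per subset of min_matches columns; any satisfied group satisfies the OR.
    for cols in combinations(range(1, n_columns + 1), min_matches):
        group = " AND ".join(f"`{c}` IN ({values})" for c in cols)
        groups.append(f"({group})")
    return "\nOR\n".join(groups), len(groups)

where, n_groups = build_where([1, 2, 3, 4, 5, 6, 7])
print(n_groups, comb(7, 5))  # 21 21
```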

But that would be a hellish condition. Alternatively, you could try combining the checks like this:

First check whether column `1` contains one of the values:

IF( `1` IN (1,2,3,4,5,6,7), 1, 0)

Then sum all of these:

SELECT (
    IF( `1` IN (1,2,3,4,5,6,7), 1, 0) +
    IF( `2` IN (1,2,3,4,5,6,7), 1, 0) +
    IF( `3` IN (1,2,3,4,5,6,7), 1, 0) +
    IF( `4` IN (1,2,3,4,5,6,7), 1, 0) +
    IF( `5` IN (1,2,3,4,5,6,7), 1, 0) +
    IF( `6` IN (1,2,3,4,5,6,7), 1, 0) +
    IF( `7` IN (1,2,3,4,5,6,7), 1, 0)
) AS `matches_cnt`
FROM t1
HAVING `matches_cnt` >= 5

This will iterate over all rows, and the condition is quite complex (hence bad performance).

You could also try replacing the values with binary strings, for example:

1,2,7 = 01000011 (number n sets bit n-1, counting from the right)

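Then compute the Hamming distance between the user's value and each record in the database. This only reduces the complexity of the condition; you still have to iterate over all records.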
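A sketch of that encoding in Python, assuming number n sets bit n-1 (which is what turns 1, 2, 7 into `01000011`) and that every row holds 7 distinct numbers. Under that assumption the Hamming distance equals 14 minus twice the number of shared values, and a bitwise AND counts the shared values directly. The function names are hypothetical:

```python
def to_mask(numbers):
    # Set bit n-1 for each number n.
    mask = 0
    for n in numbers:
        mask |= 1 << (n - 1)
    return mask

def hamming(x, y):
    # Number of differing bits = popcount of the XOR.
    return bin(x ^ y).count("1")

print(format(to_mask([1, 2, 7]), "08b"))  # 01000011

row = to_mask([2, 3, 4, 5, 6, 7, 8])
user = to_mask([4, 5, 6, 7, 8, 9, 10])
dist = hamming(row, user)
shared = bin(row & user).count("1")
print(dist, shared)  # 4 5
```

In MySQL the corresponding expressions are `BIT_COUNT(a ^ b)` for the distance and `BIT_COUNT(a & b)` for the shared count; on a BIGINT column this only works while the numbers stay within 1 to 64.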

To implement this in MySQL, the first part of the previous query would be replaced with:

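SELECT (
    -- ^ is MySQL's bitwise XOR; with 7 distinct numbers on each side,
    -- Hamming distance = 14 - 2 * matches
    ( 14 - BIT_COUNT( `binary_representation` ^ $DATA_FROM_USER$ ) ) DIV 2
) AS `matches_cnt`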
answered 2012-11-26T10:46:07.920