0

Given an array of a series of long numbers (equal length) - 0101101111, 0111101111, 0101101101, etc.

Find closest match to 0101101001

OK, that's the short version. The long version is a form with 11 Yes(1) and No(0) questions (product finder). Since we know that the database may not contain an exact match to the n^r solutions = 2048, we need to find the closest match or, possibly, the closest two or three matches.

I'm guessing we need to compare each number in each position and rank the result, but I'm stuck on the solution - if I am even going about it in the right direction.

Thanks!!

I did a quick check against popnoodle answer below... using $lookslike=11100110011; running foreach, generated sql and ran in phpMyAdmin against a table of 6 "answers".

Answers were ranked as follows:

Answer -- rank

11100110011 -- 11
11100110010 -- 10
11100110000 -- 9
00000111010 -- 6
00000111111 -- 6
01101100110 -- 6

*edit for ranking explanation - 11 of 11 matches, 10 of 11 matches, 9 of 11 matches, 6 of 11 matches...

Very nice.

Added modified PHP manual example of multiple choice answers:

<?php
// Client Answers
$input = 'abcdbcdabcd';

// array of answers to check against
$answers  = array('abcdbddabcd', 'abcbbcdabcd', 'abcdbcccccd');

// no shortest distance found, yet
$shortest = -1;

// loop through words to find the closest
foreach ($answers as $answer) {

    // calculate the distance between the input word,
    // and the current word
    $lev = levenshtein($input, $answer);

    // check for an exact match
    if ($lev == 0) {

        // closest word is this one (exact match)
        $closest = $answer;
        $shortest = 0;

        // break out of the loop; we've found an exact match
        break;
    }

    // if this distance is less than the next found shortest
    // distance, OR if a next shortest word has not yet been found
    if ($lev <= $shortest || $shortest < 0) {
        // set the closest match, and shortest distance
        $closest  = $answer;
        $shortest = $lev;
    }
}

echo "Input answer: $input\n" . "<br />";
if ($shortest == 0) {
    echo "Exact match found: $closest\n" . "<br />";
} else {
    echo "Possible Alternative: $closest?\n" . "<br />";
    echo "Levenshtein Distance: $lev\n";
}

?>

Returns:

Input answer: abcdbcdabcd
Possible Alternative: abcbbcdabcd?
Levenshtein Distance: 3

4

3 回答 3

0

我对 SQL 了解不多,所以这可能不是最巧妙的方法,但到目前为止还没有其他人做出贡献。

比较每个位置的每个字符并对结果进行排名。

rank = 给定比较字符串“0101101001”中与每行答案字符串中相同的字符数,所以(注意,您在问题中说的是 11,但给出了 10 个数字作为示例)

0101101001 against 0101101001 gives a rank of 10 
0101101001 against 1101101001 gives a rank of 9
0101101001 against 1111111001 gives a rank of 7
0101101001 against 0001100111 gives a rank of 6

作为 SQL

SELECT answers,
    ( #looking for 0101101001
    IF (substring(answers,1,1)=0, 1, 0) 
    + IF (substring(answers,2,1)=1, 1, 0)
    + IF (substring(answers,3,1)=0, 1, 0)
    + IF (substring(answers,4,1)=1, 1, 0)
    + IF (substring(answers,5,1)=1, 1, 0)
    + IF (substring(answers,6,1)=0, 1, 0)
    + IF (substring(answers,7,1)=1, 1, 0)
    + IF (substring(answers,8,1)=0, 1, 0)
    + IF (substring(answers,9,1)=0, 1, 0)
    + IF (substring(answers,10,1)=1, 1, 0)
    ) 
AS rank
FROM yesno 
ORDER BY 
    ( #looking for 0101101001
    IF (substring(answers,1,1)=0, 1, 0) 
    + IF (substring(answers,2,1)=1, 1, 0)
    + IF (substring(answers,3,1)=0, 1, 0)
    + IF (substring(answers,4,1)=1, 1, 0)
    + IF (substring(answers,5,1)=1, 1, 0)
    + IF (substring(answers,6,1)=0, 1, 0)
    + IF (substring(answers,7,1)=1, 1, 0)
    + IF (substring(answers,8,1)=0, 1, 0)
    + IF (substring(answers,9,1)=0, 1, 0)
    + IF (substring(answers,10,1)=1, 1, 0)
    ) 
DESC

作为 PHP

function getAnswers($lookslike)
{
    /* expects "binary" string
    returns the answers and rank (0 to string length) ordered by closest match first 
    */  

    foreach (str_split($lookslike) as $i=>$bit){
        $ifs[]='IF (substring(answers,' . ($i+1) . ',1)=' . $bit . ', 1, 0) ';
    }

    // use your db class
    return $db->select_many('
        SELECT answers,
            ( '. implode(' + ', $ifs) .' ) 
        AS rank
        FROM yesno 
        ORDER BY 
            ( '. implode(' + ', $ifs) .' ) 
        DESC
    ');
}

没有检查我的代码

于 2013-01-04T19:57:54.743 回答
0

Sounds like what you are looking for is called the Levenshtein distance.

It's fairly straightforward and not difficult to implement in whichever language you'd like.

Here is how to implement the function in MySQL

于 2013-01-07T02:18:26.463 回答
0

如果您将信息作为位掩码存储在整数字段中,我认为您可以通过一些位操作获得相同的结果:

SELECT value 
FROM TABLE
ORDER BY BIT_COUNT(~(value ^ test)) DESC

test 是您要测试的值。

一个例子是 test 是 11001 并且 value 是 10101:

  • 11001 异或 10101 = 01101
  • 二进制反转 (~) = 10010
  • 计算 1 的位数 ( BIT_COUNT) = 2
  • 并反转排序

实际上,我认为您可以将其缩短为:

SELECT value 
FROM TABLE
ORDER BY BIT_COUNT(value ^ test)

这本质上是在计算不同的位,因此升序排序应该返回您需要的内容。

于 2013-01-07T02:46:28.867 回答