1

I need help finding similar values in a SQL database. The table structure looks like this:

    id         |        item_id_nm |      height |    width |     length |     weight
    ----------------------------------------------------------------------------------
    1          |       00000000001 |      1.0    |     1.0  |        1.0 |         1.0
    2          |       00000000001 |      1.1    |     1.0  |        0.9 |         1.1
    3          |       00000000001 |      2.0    |     1.0  |        1.0 |         1.0
    4          |       00000000002 |      1.0    |     1.0  |        1.0 |         1.0
    5          |       00000000002 |      1.0    |     1.1  |        1.1 |         1.0
    6          |       00000000002 |      1.0    |     1.0  |        1.0 |         2.0

id obviously cannot have duplicates; item_id_nm can (in fact, the same value can occur many times, i.e. more than twice).

How would you write SQL that finds duplicated item_id_nm values, but only when the height, width, length, or weight values differ by more than 30%?

I know it has to go through the whole table, but how do I do the check? Thanks for your help.

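EDIT: adding an example of a 30% difference. The height of id = 3 (2.0) is roughly 200% of the 1.0 (or 1.1) of ids 1 and 2. Sorry for being unclear: any of height, width, length, or weight may differ by 30%, and if at least one of them does, the row should be treated as a duplicate of the others.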
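
The question does not say which value the 30% is measured against, and the answers in this thread each pick a different baseline. As a minimal sketch of the arithmetic only, assuming the smaller value of each pair is the baseline (the helper name `relative_difference` is made up for illustration), the heights of item 00000000001 compare like this:

```python
from itertools import combinations


def relative_difference(a: float, b: float) -> float:
    """Gap between a and b as a fraction of the smaller value (assumed baseline)."""
    low, high = sorted((a, b))
    return (high - low) / low


# Heights of ids 1, 2 and 3 for item 00000000001, taken from the sample table.
heights = {1: 1.0, 2: 1.1, 3: 2.0}

# Keep every pair of ids whose heights differ by more than 30%.
flagged = [
    (id_a, id_b)
    for (id_a, h_a), (id_b, h_b) in combinations(heights.items(), 2)
    if relative_difference(h_a, h_b) > 0.30
]
print(flagged)
```

Under this assumption ids 1 and 2 (1.0 vs 1.1, a 10% gap) are not flagged, while id 3 is flagged against both of them.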

4 Answers

3

This should give you the rows that differ from the per-item average by more than 30%:

SELECT t1.*
FROM tbl t1
INNER JOIN (
    SELECT
         item_id_nm,
        AVG(width) awidth, AVG(height) aheight, 
        AVG(length) alength, AVG(weight) aweight
    FROM tbl
    GROUP BY item_id_nm ) t2
USING (item_id_nm)
WHERE 
    width > awidth * 1.3 OR width < awidth * 0.7
    OR height > aheight * 1.3 OR height < aheight * 0.7
    OR length > alength * 1.3 OR length < alength * 0.7
    OR weight > aweight * 1.3 OR weight < aweight * 0.7
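
To see what the average-based query returns, here is a sketch that runs it against the sample rows from the question. It assumes an in-memory SQLite database and Python's `sqlite3` module, neither of which appears in the thread, and it selects only `t1.id` to keep the output short.

```python
import sqlite3

# Sample rows from the question: (id, item_id_nm, height, width, length, weight).
ROWS = [
    (1, "00000000001", 1.0, 1.0, 1.0, 1.0),
    (2, "00000000001", 1.1, 1.0, 0.9, 1.1),
    (3, "00000000001", 2.0, 1.0, 1.0, 1.0),
    (4, "00000000002", 1.0, 1.0, 1.0, 1.0),
    (5, "00000000002", 1.0, 1.1, 1.1, 1.0),
    (6, "00000000002", 1.0, 1.0, 1.0, 2.0),
]

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the real database
conn.execute(
    "CREATE TABLE tbl (id INTEGER PRIMARY KEY, item_id_nm TEXT, "
    "height REAL, width REAL, length REAL, weight REAL)"
)
conn.executemany("INSERT INTO tbl VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# Same logic as the query above: compare each row with its item's averages.
QUERY = """
SELECT t1.id
FROM tbl t1
INNER JOIN (
    SELECT item_id_nm,
           AVG(width) AS awidth, AVG(height) AS aheight,
           AVG(length) AS alength, AVG(weight) AS aweight
    FROM tbl
    GROUP BY item_id_nm
) t2 USING (item_id_nm)
WHERE width > awidth * 1.3 OR width < awidth * 0.7
   OR height > aheight * 1.3 OR height < aheight * 0.7
   OR length > alength * 1.3 OR length < alength * 0.7
   OR weight > aweight * 1.3 OR weight < aweight * 0.7
ORDER BY t1.id
"""
outlier_ids = [row[0] for row in conn.execute(QUERY)]
print(outlier_ids)
```

On this data only ids 3 and 6 come back. One thing to keep in mind is that an outlier pulls the average toward itself: with just two rows of 1.0 and 1.5, the average is 1.25, the accepted band is 0.875 to 1.625, and neither row is flagged even though they differ by 50%.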

This should give you the pairs of rows that differ by more than 30%:

SELECT t1.*,t2.*
FROM tbl t1
INNER JOIN tbl t2
USING (item_id_nm)
WHERE 
     (t1.width > t2.width * 1.3 OR t1.width < t2.width * 0.7)
    OR (t1.height > t2.height * 1.3 OR t1.height < t2.height * 0.7)
    OR (t1.length > t2.length * 1.3 OR t1.length < t2.length * 0.7)
    OR (t1.weight > t2.weight * 1.3 OR t1.weight < t2.weight * 0.7)
answered 2013-04-04T21:37:29.367
2

I think you could use something like this:

SELECT item_id_nm
FROM yourtable
GROUP BY item_id_nm
HAVING
  MIN(height)*1.3 < MAX(height) OR
  MIN(width)*1.3 < MAX(width) OR
  MIN(length)*1.3 < MAX(length) OR
  MIN(weight)*1.3 < MAX(weight)
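
As a quick check of the MIN/MAX approach, the sketch below runs it on the question's sample rows plus one extra item. It assumes an in-memory SQLite database via Python's `sqlite3`; item 00000000003 is hypothetical, added only to show a group that stays within tolerance.

```python
import sqlite3

# (id, item_id_nm, height, width, length, weight)
ROWS = [
    (1, "00000000001", 1.0, 1.0, 1.0, 1.0),
    (2, "00000000001", 1.1, 1.0, 0.9, 1.1),
    (3, "00000000001", 2.0, 1.0, 1.0, 1.0),
    (4, "00000000002", 1.0, 1.0, 1.0, 1.0),
    (5, "00000000002", 1.0, 1.1, 1.1, 1.0),
    (6, "00000000002", 1.0, 1.0, 1.0, 2.0),
    # Hypothetical rows, not in the question: heights only 20% apart.
    (7, "00000000003", 1.0, 1.0, 1.0, 1.0),
    (8, "00000000003", 1.2, 1.0, 1.0, 1.0),
]

conn = sqlite3.connect(":memory:")  # assumption: SQLite stands in for the real database
conn.execute(
    "CREATE TABLE yourtable (id INTEGER PRIMARY KEY, item_id_nm TEXT, "
    "height REAL, width REAL, length REAL, weight REAL)"
)
conn.executemany("INSERT INTO yourtable VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# An item qualifies when, for any column, its largest value is more than
# 30% above its smallest value.
QUERY = """
SELECT item_id_nm
FROM yourtable
GROUP BY item_id_nm
HAVING MIN(height) * 1.3 < MAX(height)
    OR MIN(width)  * 1.3 < MAX(width)
    OR MIN(length) * 1.3 < MAX(length)
    OR MIN(weight) * 1.3 < MAX(weight)
ORDER BY item_id_nm
"""
flagged_items = [row[0] for row in conn.execute(QUERY)]
print(flagged_items)
```

This returns one row per item rather than the individual rows, so it answers "which items have a spread above 30%" and would need a join back to the table to list the ids involved.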
answered 2013-04-04T21:27:30.267
2
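-- Selects only the grouped column: SELECT * combined with GROUP BY is
-- rejected by most databases.
-- Note: this compares height and length with the width of the same row.
SELECT
    item_id_nm
FROM
    TableName
WHERE
   (height > 1.3 * width OR height < 0.7 * width) OR
   (length > 1.3 * width OR length < 0.7 * width)
GROUP BY
    item_id_nm
HAVING
    COUNT(item_id_nm) > 1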
answered 2013-04-04T21:38:04.150
0

I would use:

SELECT s1.id AS id1, s2.id AS id2
, s1.height AS h1, s2.height as h2
, s1.width as width1, s2.width as width2
, s1.length as l1, s2.length as l2
, s1.weight as weight1, s2.weight as weight2
FROM stack s1
INNER JOIN stack s2
ON s1.item_id_nm = s2.item_id_nm
WHERE s1.id != s2.id
AND s1.id < s2.id
AND (abs(100-((s2.height*100)/s1.height)) > 30
OR abs(100-((s2.width*100)/s1.width)) > 30
OR abs(100-((s2.length*100)/s1.length)) > 30
OR abs(100-((s2.weight*100)/s1.weight)) > 30)

Tested with PostgreSQL ( http://sqlfiddle.com/#!12/e5f25/15 ). This query does not return repeated rows: the s1.id < s2.id condition lists each pair only once.
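
For readers without access to the fiddle, the following sketch reruns the same comparison on the question's sample rows. It assumes an in-memory SQLite database through Python's `sqlite3` rather than PostgreSQL, and it selects only the two ids.

```python
import sqlite3

# Sample rows from the question: (id, item_id_nm, height, width, length, weight).
ROWS = [
    (1, "00000000001", 1.0, 1.0, 1.0, 1.0),
    (2, "00000000001", 1.1, 1.0, 0.9, 1.1),
    (3, "00000000001", 2.0, 1.0, 1.0, 1.0),
    (4, "00000000002", 1.0, 1.0, 1.0, 1.0),
    (5, "00000000002", 1.0, 1.1, 1.1, 1.0),
    (6, "00000000002", 1.0, 1.0, 1.0, 2.0),
]

conn = sqlite3.connect(":memory:")  # assumption: SQLite instead of PostgreSQL
conn.execute(
    "CREATE TABLE stack (id INTEGER PRIMARY KEY, item_id_nm TEXT, "
    "height REAL, width REAL, length REAL, weight REAL)"
)
conn.executemany("INSERT INTO stack VALUES (?, ?, ?, ?, ?, ?)", ROWS)

# s1.id < s2.id already implies s1.id != s2.id and keeps each pair once.
# Each percentage is measured relative to the s1 (lower id) row.
QUERY = """
SELECT s1.id, s2.id
FROM stack s1
INNER JOIN stack s2 ON s1.item_id_nm = s2.item_id_nm
WHERE s1.id < s2.id
  AND (abs(100 - ((s2.height * 100) / s1.height)) > 30
    OR abs(100 - ((s2.width  * 100) / s1.width))  > 30
    OR abs(100 - ((s2.length * 100) / s1.length)) > 30
    OR abs(100 - ((s2.weight * 100) / s1.weight)) > 30)
ORDER BY s1.id, s2.id
"""
pairs = [tuple(row) for row in conn.execute(QUERY)]
print(pairs)
```

Because the lower-id row is always the baseline, the result can depend on row order: 1.0 followed by 1.4 gives a 40% difference, while 1.4 followed by 1.0 gives about 28.6% and would not be flagged. The division also assumes no stored dimension is zero.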

answered 2013-04-04T23:26:26.053