
I'm using PHP and MySQL. Can anyone suggest an efficient way to filter out duplicate results based on priority?

Example:

I have a table:

ID  |  Priority 1  |  Priority 2  |  Priority 3  |  E-Mail
--------------------------------------------------------------
1   |  Apple       |  One         |  Low         | abc@abc.com
2   |  Banana      |  Two         |  Medium      | def@abc.com
3   |  Banana      |  Two         |  High        | def@abc.com
4   |  Banana      |  Two         |  High        | def@abc.com
5   |  Peach       |  Three       |  Low         | ghi@abc.com
6   |  Peach       |  Four        |  High        | ghi@abc.com

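In the example above, I'm looking for a way to get only rows 1, 3 (or 4), and 6.
That is, rows 2, 3, and 4 share one e-mail address, and rows 5 and 6 share another, so they are duplicate records. I want to pick one record per e-mail address based on priority.
If the duplicate records have the same Priority 1, move on to Priority 2. If that is also the same, move on to Priority 3. If that is the same too, it doesn't matter which record is picked.
If there is a difference, however, I pick the record with the higher priority. In the example above, the priority order (highest first) is: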

Peach -> Banana -> Apple
Four -> Three -> Two -> One
High -> Medium -> Low
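
To make the tie-breaking rule concrete, here is a minimal Python sketch of it, separate from any MySQL solution. The rank dictionaries and the `pick_best` helper are hypothetical names introduced only for illustration. It assumes the orders listed above go from highest to lowest priority, and that the lowest ID wins when all three priorities tie (the question says either row is acceptable).

```python
# Hypothetical rank tables: a lower number means a higher priority.
# The orders are taken from the lists above; the names are assumptions.
P1_RANK = {"Peach": 1, "Banana": 2, "Apple": 3}
P2_RANK = {"Four": 1, "Three": 2, "Two": 3, "One": 4}
P3_RANK = {"High": 1, "Medium": 2, "Low": 3}

# (ID, Priority 1, Priority 2, Priority 3, E-Mail), copied from the example table.
ROWS = [
    (1, "Apple", "One", "Low", "abc@abc.com"),
    (2, "Banana", "Two", "Medium", "def@abc.com"),
    (3, "Banana", "Two", "High", "def@abc.com"),
    (4, "Banana", "Two", "High", "def@abc.com"),
    (5, "Peach", "Three", "Low", "ghi@abc.com"),
    (6, "Peach", "Four", "High", "ghi@abc.com"),
]


def sort_key(row):
    """Rank tuple compared left to right; the ID breaks exact ties."""
    row_id, p1, p2, p3, _email = row
    return (P1_RANK[p1], P2_RANK[p2], P3_RANK[p3], row_id)


def pick_best(rows):
    """Keep the best-ranked row for each e-mail address."""
    best = {}
    for row in rows:
        email = row[4]
        if email not in best or sort_key(row) < sort_key(best[email]):
            best[email] = row
    return sorted(best.values())


kept_ids = [row[0] for row in pick_best(ROWS)]
print(kept_ids)
```

Comparing whole rank tuples means Priority 2 only matters when Priority 1 ties, and Priority 3 only when both tie, which is the cascade described above.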

I will then insert the results into a different database.

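So far I have a query that fetches the non-duplicates. I'm thinking of running a second query to handle the duplicates.
The first query handles about 20,000 records. The second query would handle about 5,000 records.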

However, I'm not sure what an efficient way to achieve this would be.

I would greatly appreciate any help.

Thank you.

Edit: fixed a typo. I want rows 1, 3/4, and 6 (not 1, 2, and 6).


1 Answer


This query should give you the result you need:

SELECT
  MIN(ID),
  EMail,
  MIN(Priority1),
  MIN(Priority2),
  MIN(Priority3)
FROM
  yourtable
WHERE
  (EMail, Priority1, Priority2, FIELD(Priority3, 'High', 'Medium', 'Low')) IN (
    SELECT
      EMail,
      MIN(Priority1),
      MIN(Priority2),
      MIN(FIELD(Priority3, 'High', 'Medium', 'Low')) MinP3
    FROM
      yourtable
    WHERE
      (EMail, Priority1, FIELD(Priority2, 'Four', 'Three', 'Two', 'One')) IN (
        SELECT
          EMail,
          MIN(Priority1),
          MIN(FIELD(Priority2, 'Four', 'Three', 'Two', 'One')) MinP2
        FROM
          yourtable
        WHERE
          (EMail, FIELD(Priority1, 'Peach', 'Banana', 'Apple')) IN
          (SELECT
             EMail, MIN(FIELD(Priority1, 'Peach', 'Banana', 'Apple')) MinP1
           FROM
             yourtable
           GROUP BY
             EMail)
        GROUP BY
          EMail)
    GROUP BY
      EMail)
GROUP BY
  EMail

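(It returns row 3 rather than row 2, but if I understand your question correctly, that should be right.) Please see the fiddle here. I suspect it won't be very fast, and I'm still wondering whether there is a way to speed it up.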
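
The query relies on MySQL's FIELD() returning the 1-based position of its first argument in the list that follows, so MIN(FIELD(...)) picks the value listed first. The Python sketch below only emulates that behaviour for illustration; `field` is a hypothetical helper, not a MySQL or Python API. It also shows one caveat: a value missing from the list yields 0, which MIN would treat as the best rank.

```python
def field(value, *candidates):
    """Emulate MySQL FIELD(): 1-based position of value, or 0 if absent."""
    for position, candidate in enumerate(candidates, start=1):
        if candidate == value:
            return position
    return 0


order = ("High", "Medium", "Low")

# Priority 3 values of rows 2, 3 and 4 from the example table.
ranks = [field(value, *order) for value in ("Medium", "High", "High")]
best = min(ranks)  # the smallest position corresponds to 'High'

# A value that is not in the list ranks as 0, i.e. ahead of 'High'.
unknown = field("Urgent", *order)

print(ranks, best, unknown)
```

If the real data can contain values outside the listed ones, it may be worth filtering them out or mapping them explicitly before relying on MIN.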

Edit

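You could try the following query. It uses different logic, and it relies on a priority lookup table with indexed columns, which should be much faster than the FIELD function, although the many joins may slow the query down somewhat.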

CREATE TABLE Priorities (
  Num INT,
  Des VARCHAR(10),
  Priority INT,
  PRIMARY KEY (Num, Des)
);

INSERT INTO Priorities VALUES
(1, 'Peach',  1),
(1, 'Banana', 2),
(1, 'Apple',  3),
(2, 'Four',   1),
(2, 'Three',  2),
(2, 'Two',    3),
(2, 'One',    4),
(3, 'High',   1),
(3, 'Medium', 2),
(3, 'Low',    3);

SELECT MIN(ID), yourtable.Email, MIN(Priority1) Priority1, MIN(Priority2) Priority2, MIN(Priority3) Priority3
FROM
  yourtable
  INNER JOIN Priorities p1 ON yourtable.Priority1=p1.Des AND p1.Num=1
  INNER JOIN Priorities p2 ON yourtable.Priority2=p2.Des AND p2.Num=2
  INNER JOIN Priorities p3 ON yourtable.Priority3=p3.Des AND p3.Num=3
  INNER JOIN (
    SELECT s1.EMail, MIN(MinP1) M1, MIN(MinP2) M2, MIN(MinP3) M3
    FROM (
      SELECT   EMail, MIN(p1.Priority) MinP1
      FROM     yourtable INNER JOIN Priorities p1
               ON yourtable.Priority1 = p1.Des AND p1.Num = 1
      GROUP BY EMail) s1
    INNER JOIN (
      SELECT   EMail, p1.Priority Pr1, MIN(p2.Priority) MinP2
      FROM     yourtable INNER JOIN Priorities p1
               ON yourtable.Priority1 = p1.Des AND p1.Num = 1
               INNER JOIN Priorities p2
               ON yourtable.Priority2 = p2.Des AND p2.Num = 2
      GROUP BY EMail, p1.Priority) s2
    ON s1.EMail=s2.EMail AND s1.MinP1=s2.Pr1
    INNER JOIN (
      SELECT   EMail, p1.Priority Pr1, p2.Priority Pr2, MIN(p3.Priority) MinP3
      FROM     yourtable INNER JOIN Priorities p1
               ON yourtable.Priority1 = p1.Des AND p1.Num = 1
               INNER JOIN Priorities p2
               ON yourtable.Priority2 = p2.Des AND p2.Num = 2
               INNER JOIN Priorities p3
               ON yourtable.Priority3 = p3.Des AND p3.Num = 3
      GROUP BY EMail, p1.Priority, p2.Priority) s3
    ON s1.Email=s3.Email AND s1.MinP1=s3.Pr1 AND s2.MinP2=s3.Pr2
  GROUP BY
    s1.EMail) s
  ON yourtable.EMail=s.Email
     AND p1.Priority=s.M1
     AND p2.Priority=s.M2
     AND p3.Priority=s.M3
GROUP BY
  yourtable.EMail

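Please see the fiddle here. If it is still too slow, we could try using my first query together with the second support table, or we could split the query into two parts.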
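
One possible way to split the work, sketched here under stated assumptions, is to let the database only sort the rows by their looked-up priorities and keep the first row per e-mail address in application code. This is a different technique from the two queries above. The sketch uses Python with an in-memory SQLite database purely as a stand-in for PHP and MySQL; the table contents are copied from the example, and the alias `t` and the variable names are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # SQLite stands in for MySQL here
conn.executescript("""
CREATE TABLE yourtable (
  ID INTEGER PRIMARY KEY, Priority1 TEXT, Priority2 TEXT,
  Priority3 TEXT, EMail TEXT);
INSERT INTO yourtable VALUES
  (1, 'Apple',  'One',   'Low',    'abc@abc.com'),
  (2, 'Banana', 'Two',   'Medium', 'def@abc.com'),
  (3, 'Banana', 'Two',   'High',   'def@abc.com'),
  (4, 'Banana', 'Two',   'High',   'def@abc.com'),
  (5, 'Peach',  'Three', 'Low',    'ghi@abc.com'),
  (6, 'Peach',  'Four',  'High',   'ghi@abc.com');
CREATE TABLE Priorities (
  Num INT, Des VARCHAR(10), Priority INT, PRIMARY KEY (Num, Des));
INSERT INTO Priorities VALUES
  (1, 'Peach', 1), (1, 'Banana', 2), (1, 'Apple', 3),
  (2, 'Four', 1), (2, 'Three', 2), (2, 'Two', 3), (2, 'One', 4),
  (3, 'High', 1), (3, 'Medium', 2), (3, 'Low', 3);
""")

# Part 1: the database sorts each e-mail group, best-ranked row first.
query = """
SELECT t.ID, t.EMail, t.Priority1, t.Priority2, t.Priority3
FROM yourtable t
JOIN Priorities p1 ON t.Priority1 = p1.Des AND p1.Num = 1
JOIN Priorities p2 ON t.Priority2 = p2.Des AND p2.Num = 2
JOIN Priorities p3 ON t.Priority3 = p3.Des AND p3.Num = 3
ORDER BY t.EMail, p1.Priority, p2.Priority, p3.Priority, t.ID
"""

# Part 2: the application keeps only the first row seen for each e-mail.
winners = []
seen = set()
for row in conn.execute(query):
    email = row[1]
    if email not in seen:
        seen.add(email)
        winners.append(row)

winner_ids = [row[0] for row in winners]
print(winner_ids)
```

Whether this beats the single-query versions on roughly 20,000 rows is not something the sketch establishes; it would need to be measured against the real data.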

Answered 2013-05-10T19:46:39.143