
I'm writing a query to assign users and their respective domains to IP addresses. No IP address may have duplicate users.

Here's what I have so far on SQL Fiddle: http://sqlfiddle.com/#!2/39c51/2/0

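I have one table that contains all of the current assignments (hundreds of thousands of them). A smaller-scale example looks like this: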

mysql> select * from test.usermap;
+-------------+-------+-------------------+
| vip         | user  | domain            |
+-------------+-------+-------------------+
| 100.50.20.1 | joe   | joesdomain.com    |
| 100.50.20.1 | bob   | joesdomain.com    |
| 100.50.20.2 | tom   | domain2.com       |
| 100.50.20.2 | fred  | domain2.com       |
| 100.50.20.2 | sally | domain2.com       |
| 100.50.20.3 | admin | athriddomain.com  |
| 100.50.20.4 | admin | numberfour.com    |
| 100.50.20.3 | sally | fivewithsally.com |
| 100.50.20.4 | jim   | thesix.com        |
| 100.50.20.1 | admin | seven.com         |
| 100.50.20.1 | sally | seven.com         |
| 100.50.20.1 | sue   | seven.com         |
| 100.50.20.5 |       |                   |
| 100.50.20.6 |       |                   |
+-------------+-------+-------------------+
14 rows in set (0.00 sec)

I have another table that contains the users that have not been assigned yet, again as a small-scale example:

mysql> select * from test.newusers;
+-------+-----------+
| user  | domain    |
+-------+-----------+
| jim   | eight.com |
| sally | eight.com |
| admin | nine.com  |
| james | ten.com   |
| jane  | ten.com   |
+-------+-----------+
5 rows in set (0.00 sec)

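The idea is to assign all users under eight.com to .5, because that is the lowest IP that has neither "jim" nor "sally" on it, and then to assign nine.com to .2 and ten.com to .1, based on their respective user conflicts (or lack thereof).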
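The assignment rule itself can be stated without SQL. Here is a minimal Python sketch of it, assuming that each new domain is checked only against the existing map (which is enough to reproduce the expected result) and that VIPs are ordered numerically, as `ORDER BY inet_aton(vip)` does. `assign_vips` is a hypothetical helper name, and blank users are represented as empty strings.

```python
import ipaddress

# (vip, user) pairs from the sample usermap table; "" marks a VIP with no users
USERMAP = [
    ("100.50.20.1", "joe"), ("100.50.20.1", "bob"),
    ("100.50.20.2", "tom"), ("100.50.20.2", "fred"), ("100.50.20.2", "sally"),
    ("100.50.20.3", "admin"), ("100.50.20.4", "admin"),
    ("100.50.20.3", "sally"), ("100.50.20.4", "jim"),
    ("100.50.20.1", "admin"), ("100.50.20.1", "sally"), ("100.50.20.1", "sue"),
    ("100.50.20.5", ""), ("100.50.20.6", ""),
]
NEWUSERS = [("jim", "eight.com"), ("sally", "eight.com"), ("admin", "nine.com"),
            ("james", "ten.com"), ("jane", "ten.com")]

def assign_vips(usermap, newusers):
    """Map each new domain to the numerically lowest VIP hosting none of its users."""
    users_on_vip = {}
    for vip, user in usermap:
        users_on_vip.setdefault(vip, set())
        if user:
            users_on_vip[vip].add(user)
    # Numeric order, like ORDER BY INET_ATON(vip); a string sort would put .10 before .2
    vips = sorted(users_on_vip, key=ipaddress.IPv4Address)

    users_in_domain = {}
    for user, domain in newusers:
        users_in_domain.setdefault(domain, set()).add(user)

    # First VIP whose user set is disjoint from the domain's users; None if there is none
    return {
        domain: next((v for v in vips if not users & users_on_vip[v]), None)
        for domain, users in users_in_domain.items()
    }

print(assign_vips(USERMAP, NEWUSERS))
# {'eight.com': '100.50.20.5', 'nine.com': '100.50.20.2', 'ten.com': '100.50.20.1'}
```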

The result I'm looking for looks like this:

+-------------+-------+-----------+
| vip         | user  | domain    |
+-------------+-------+-----------+
| 100.50.20.1 | james | ten.com   |
| 100.50.20.1 | jane  | ten.com   |
| 100.50.20.2 | admin | nine.com  |
| 100.50.20.5 | jim   | eight.com |
| 100.50.20.5 | sally | eight.com |
+-------------+-------+-----------+
5 rows in set (0.01 sec)

I can do this with a subquery nested inside a correlated subquery, like this:

mysql> select  
(
    select vip 
    from test.usermap
    where vip not in
    (
        select distinct vip 
        from test.usermap  
        where user in
        (
            select user 
            from test.newusers 
            where domain = n.domain
        )
    )
    order by inet_aton(vip) asc
    limit 1
) as vip, n.user, n.domain 
from test.newusers n
order by inet_aton(vip) asc;
+-------------+-------+-----------+
| vip         | user  | domain    |
+-------------+-------+-----------+
| 100.50.20.1 | james | ten.com   |
| 100.50.20.1 | jane  | ten.com   |
| 100.50.20.2 | admin | nine.com  |
| 100.50.20.5 | jim   | eight.com |
| 100.50.20.5 | sally | eight.com |
+-------------+-------+-----------+
5 rows in set (0.00 sec)

But this is very inefficient: my production usermap and newusers tables have 300k and 50k rows respectively, so this approach isn't feasible.

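I tried to make it more efficient by using a join instead of nested subqueries, so I replaced the inner query with a join and referenced the outer query's column in the ON clause, but that doesn't seem to be allowed: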

mysql> select 
(
    select distinct vip 
    from test.usermap u 
    join test.newusers r
        on r.domain = n.domain
        and r.user != u.user
    order by inet_aton(vip) asc limit 1
) as vip, n.user, n.domain
from test.newusers n;
ERROR 1054 (42S22): Unknown column 'n.domain' in 'on clause'
mysql> 

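The logic of the query itself seems to make sense, though, because replacing the outer query reference with the string constant it stands for works fine: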

mysql> select
(
    select distinct vip 
    from test.usermap u 
    join test.newusers r
        on r.domain = 'ten.com'
        and r.user != u.user
    order by inet_aton(vip) asc limit 1
) as vip, n.user, n.domain
from test.newusers n
where domain = 'ten.com';
+-------------+-------+---------+
| vip         | user  | domain  |
+-------------+-------+---------+
| 100.50.20.1 | james | ten.com |
| 100.50.20.1 | jane  | ten.com |
+-------------+-------+---------+
2 rows in set (0.00 sec)
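One caveat worth checking before optimizing this form: `r.user != u.user` is not an anti-join. It keeps every VIP that hosts at least one user whose name differs from some new user of the domain, so 100.50.20.1 (which hosts joe) qualifies for every domain here; the literal 'ten.com' test passes only because neither james nor jane is mapped anywhere yet. The sketch below runs both the `!=` join and a `NOT EXISTS` anti-join per domain in SQLite through Python's sqlite3 module. It is not MySQL: the `INET_ATON` used for ordering is a Python stand-in registered for the sketch.

```python
import ipaddress
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite has no INET_ATON; this stand-in converts a dotted quad to an integer
conn.create_function("INET_ATON", 1, lambda ip: int(ipaddress.IPv4Address(ip)))
conn.execute("CREATE TABLE usermap (vip TEXT, user TEXT, domain TEXT)")
conn.execute("CREATE TABLE newusers (user TEXT, domain TEXT)")
conn.executemany("INSERT INTO usermap VALUES (?, ?, ?)", [
    ("100.50.20.1", "joe", "joesdomain.com"), ("100.50.20.1", "bob", "joesdomain.com"),
    ("100.50.20.2", "tom", "domain2.com"), ("100.50.20.2", "fred", "domain2.com"),
    ("100.50.20.2", "sally", "domain2.com"), ("100.50.20.3", "admin", "athriddomain.com"),
    ("100.50.20.4", "admin", "numberfour.com"), ("100.50.20.3", "sally", "fivewithsally.com"),
    ("100.50.20.4", "jim", "thesix.com"), ("100.50.20.1", "admin", "seven.com"),
    ("100.50.20.1", "sally", "seven.com"), ("100.50.20.1", "sue", "seven.com"),
    ("100.50.20.5", "", ""), ("100.50.20.6", "", ""),
])
conn.executemany("INSERT INTO newusers VALUES (?, ?)", [
    ("jim", "eight.com"), ("sally", "eight.com"), ("admin", "nine.com"),
    ("james", "ten.com"), ("jane", "ten.com"),
])

# The question's pattern: "some mapped user differs from some new user"
NOT_EQUAL_JOIN = """
    SELECT u.vip FROM usermap u
    JOIN newusers r ON r.domain = ? AND r.user != u.user
    ORDER BY INET_ATON(u.vip) LIMIT 1"""

# Anti-join: keep a VIP only if none of its users belongs to the domain
ANTI_JOIN = """
    SELECT u.vip FROM usermap u
    WHERE NOT EXISTS (
        SELECT 1 FROM usermap x JOIN newusers r ON r.user = x.user
        WHERE x.vip = u.vip AND r.domain = ?)
    ORDER BY INET_ATON(u.vip) LIMIT 1"""

results = {}
for domain in ("eight.com", "nine.com", "ten.com"):
    wrong = conn.execute(NOT_EQUAL_JOIN, (domain,)).fetchone()[0]
    right = conn.execute(ANTI_JOIN, (domain,)).fetchone()[0]
    results[domain] = (wrong, right)
    print(domain, wrong, right)
```

Only ten.com agrees between the two; for eight.com and nine.com the `!=` join returns 100.50.20.1, which already hosts sally and admin respectively.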

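My question is: is there a way to reference a column from the outer query inside a join in the inner query? If not, what alternatives (if any) exist that don't rely on inefficiently nested subqueries?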

Again, I have a fiddle here: http://sqlfiddle.com/#!2/39c51/2/0


1 Answer


I'm not sure how much (if at all) this will improve efficiency, but the query can be rewritten without nesting multiple subqueries:

SELECT  INET_NTOA(MIN(INET_ATON(UserMap.VIP))) AS VIP,
        NewUsers.User, 
        NewUsers.Domain
FROM    NewUsers
        CROSS JOIN UserMap
        LEFT JOIN
        (   SELECT  u.Domain, m.VIP
            FROM    NewUsers u
                    INNER JOIN UserMap m
                        ON u.User = m.User
        ) ex
            ON ex.Domain = NewUsers.Domain
            AND ex.VIP = UserMap.VIP
WHERE   ex.Domain IS NULL
GROUP BY NewUsers.User, NewUsers.Domain
ORDER BY VIP ASC;   

Example on SQL Fiddle
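To check the rewritten query outside MySQL, here is a minimal sketch that runs it unchanged against the sample data in SQLite through Python's standard sqlite3 module. SQLite has no `INET_ATON` or `INET_NTOA`, so the sketch registers Python stand-ins built on the ipaddress module (an assumption of the sketch, not MySQL behaviour), and blank users in the sample are stored as empty strings.

```python
import ipaddress
import sqlite3

# Sample rows from the question; blank user/domain cells stored as ""
USERMAP = [
    ("100.50.20.1", "joe", "joesdomain.com"), ("100.50.20.1", "bob", "joesdomain.com"),
    ("100.50.20.2", "tom", "domain2.com"), ("100.50.20.2", "fred", "domain2.com"),
    ("100.50.20.2", "sally", "domain2.com"), ("100.50.20.3", "admin", "athriddomain.com"),
    ("100.50.20.4", "admin", "numberfour.com"), ("100.50.20.3", "sally", "fivewithsally.com"),
    ("100.50.20.4", "jim", "thesix.com"), ("100.50.20.1", "admin", "seven.com"),
    ("100.50.20.1", "sally", "seven.com"), ("100.50.20.1", "sue", "seven.com"),
    ("100.50.20.5", "", ""), ("100.50.20.6", "", ""),
]
NEWUSERS = [("jim", "eight.com"), ("sally", "eight.com"), ("admin", "nine.com"),
            ("james", "ten.com"), ("jane", "ten.com")]

def inet_aton(ip):
    # Stand-in for MySQL INET_ATON: dotted quad -> integer, NULL stays NULL
    return None if ip is None else int(ipaddress.IPv4Address(ip))

def inet_ntoa(n):
    # Stand-in for MySQL INET_NTOA: integer -> dotted quad, NULL stays NULL
    return None if n is None else str(ipaddress.IPv4Address(n))

conn = sqlite3.connect(":memory:")
conn.create_function("INET_ATON", 1, inet_aton)
conn.create_function("INET_NTOA", 1, inet_ntoa)
conn.execute("CREATE TABLE UserMap (VIP TEXT, User TEXT, Domain TEXT)")
conn.execute("CREATE TABLE NewUsers (User TEXT, Domain TEXT)")
conn.executemany("INSERT INTO UserMap VALUES (?, ?, ?)", USERMAP)
conn.executemany("INSERT INTO NewUsers VALUES (?, ?)", NEWUSERS)

# The answer's query; ex lists (domain, VIP) pairs that already host one of its users
ANSWER_QUERY = """
SELECT  INET_NTOA(MIN(INET_ATON(UserMap.VIP))) AS VIP,
        NewUsers.User,
        NewUsers.Domain
FROM    NewUsers
        CROSS JOIN UserMap
        LEFT JOIN
        (   SELECT  u.Domain, m.VIP
            FROM    NewUsers u
                    INNER JOIN UserMap m
                        ON u.User = m.User
        ) ex
            ON ex.Domain = NewUsers.Domain
            AND ex.VIP = UserMap.VIP
WHERE   ex.Domain IS NULL
GROUP BY NewUsers.User, NewUsers.Domain
ORDER BY VIP ASC
"""

rows = conn.execute(ANSWER_QUERY).fetchall()
for row in rows:
    print(row)
```

The rows match the expected result in the question; rows that share a VIP may come back in either order.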

Addendum

The query above does not return rows for which no VIP is available. For example, if you remove 100.50.20.5 and 100.50.20.6 from UserMap, your query will return:

VIP             USER    DOMAIN
NULL            jim     eight.com
NULL            sally   eight.com
100.50.20.1     james   ten.com
100.50.20.1     jane    ten.com
100.50.20.2     admin   nine.com

whereas the query I wrote only returns the rows where VIP is not null:

VIP             USER    DOMAIN
100.50.20.1     james   ten.com
100.50.20.1     jane    ten.com
100.50.20.2     admin   nine.com

To fix this, you can use a UNION:

SELECT  INET_NTOA(MIN(INET_ATON(a.VIP))) AS VIP,
        a.User, 
        a.Domain
FROM    (   SELECT  UserMap.VIP,
                    NewUsers.User, 
                    NewUsers.Domain
            FROM    NewUsers
                    CROSS JOIN UserMap
                    LEFT JOIN
                    (   SELECT  u.Domain, m.VIP
                        FROM    NewUsers u
                                INNER JOIN UserMap m
                                    ON u.User = m.User
                    ) ex
                        ON ex.Domain = NewUsers.Domain
                        AND ex.VIP = UserMap.VIP
            WHERE   ex.Domain IS NULL
            UNION ALL
            SELECT  NULL AS VIP,
                    NewUsers.User,
                    NewUsers.Domain
            FROM    NewUsers
        ) a
GROUP BY a.User, a.Domain
ORDER BY VIP ASC;

Revised example on SQL Fiddle
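The NULL-preserving behaviour of the UNION version can be checked the same way. This sketch again uses SQLite through Python's sqlite3 with stand-in `INET_ATON`/`INET_NTOA` functions (assumptions, not MySQL), and it leaves out the two empty VIPs 100.50.20.5 and 100.50.20.6 so that eight.com has nowhere to go. The stand-ins must pass NULL through, because the second half of the UNION feeds `NULL AS VIP` into `INET_ATON`.

```python
import ipaddress
import sqlite3

# Sample UserMap without the empty VIPs 100.50.20.5 and 100.50.20.6
USERMAP = [
    ("100.50.20.1", "joe", "joesdomain.com"), ("100.50.20.1", "bob", "joesdomain.com"),
    ("100.50.20.2", "tom", "domain2.com"), ("100.50.20.2", "fred", "domain2.com"),
    ("100.50.20.2", "sally", "domain2.com"), ("100.50.20.3", "admin", "athriddomain.com"),
    ("100.50.20.4", "admin", "numberfour.com"), ("100.50.20.3", "sally", "fivewithsally.com"),
    ("100.50.20.4", "jim", "thesix.com"), ("100.50.20.1", "admin", "seven.com"),
    ("100.50.20.1", "sally", "seven.com"), ("100.50.20.1", "sue", "seven.com"),
]
NEWUSERS = [("jim", "eight.com"), ("sally", "eight.com"), ("admin", "nine.com"),
            ("james", "ten.com"), ("jane", "ten.com")]

def inet_aton(ip):
    # Stand-in for MySQL INET_ATON; NULL in, NULL out
    return None if ip is None else int(ipaddress.IPv4Address(ip))

def inet_ntoa(n):
    # Stand-in for MySQL INET_NTOA; NULL in, NULL out
    return None if n is None else str(ipaddress.IPv4Address(n))

conn = sqlite3.connect(":memory:")
conn.create_function("INET_ATON", 1, inet_aton)
conn.create_function("INET_NTOA", 1, inet_ntoa)
conn.execute("CREATE TABLE UserMap (VIP TEXT, User TEXT, Domain TEXT)")
conn.execute("CREATE TABLE NewUsers (User TEXT, Domain TEXT)")
conn.executemany("INSERT INTO UserMap VALUES (?, ?, ?)", USERMAP)
conn.executemany("INSERT INTO NewUsers VALUES (?, ?)", NEWUSERS)

# The UNION version: MIN ignores the NULL row unless it is the only one in the group
UNION_QUERY = """
SELECT  INET_NTOA(MIN(INET_ATON(a.VIP))) AS VIP, a.User, a.Domain
FROM    (   SELECT  UserMap.VIP, NewUsers.User, NewUsers.Domain
            FROM    NewUsers
                    CROSS JOIN UserMap
                    LEFT JOIN
                    (   SELECT  u.Domain, m.VIP
                        FROM    NewUsers u
                                INNER JOIN UserMap m ON u.User = m.User
                    ) ex
                        ON ex.Domain = NewUsers.Domain
                        AND ex.VIP = UserMap.VIP
            WHERE   ex.Domain IS NULL
            UNION ALL
            SELECT  NULL AS VIP, NewUsers.User, NewUsers.Domain
            FROM    NewUsers
        ) a
GROUP BY a.User, a.Domain
ORDER BY VIP ASC
"""

rows = conn.execute(UNION_QUERY).fetchall()
for row in rows:
    print(row)
```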

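I'm not sure what your logic is for handling the case where no VIP is available, so I can't really suggest a solution for that part. But you can get the next VIP with: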

SELECT  INET_NTOA(MAX(INET_ATON(UserMap.VIP)) + 1) AS NextVIP
FROM    UserMap
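That query takes the highest VIP as an integer, adds one, and converts the result back to dotted form. A small Python sketch of the same arithmetic using the standard ipaddress module (`next_vip` is a hypothetical helper name):

```python
import ipaddress

def next_vip(vips):
    # Same idea as INET_NTOA(MAX(INET_ATON(VIP)) + 1): compare numerically, then add one
    return str(max(ipaddress.IPv4Address(v) for v in vips) + 1)

print(next_vip(["100.50.20.1", "100.50.20.6", "100.50.20.4"]))  # 100.50.20.7
print(next_vip(["100.50.20.255"]))  # 100.50.21.0 (carries into the next octet)
```

Comparing the addresses as integers matters: a plain string MAX would rank 100.50.20.9 above 100.50.20.10. Neither version checks whether the result is still inside the intended subnet.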

Another issue with your question is conflicts within NewUsers itself. For example, if your NewUsers table contains these records:

('jim','eight.com'),
('sally','eight.com'),
('jim','eleven.com'),
('sally','eleven.com');

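Both your query and mine would assign all of these to VIP 100.50.20.5. If this can happen, I think the best way to avoid it is to only insert usernames from one domain at a time. But it can also be done using only JOINs: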

To simplify the query, I created 2 views:

CREATE VIEW UsedVIP
AS
    SELECT  u.Domain, m.VIP
    FROM    NewUsers u
            INNER JOIN UserMap m
                ON u.User = m.User;

CREATE VIEW NewUserMap 
AS
    SELECT  UserMap.VIP,
            NewUsers.User, 
            NewUsers.Domain
    FROM    NewUsers
            CROSS JOIN UserMap
            LEFT JOIN UsedVIP ex
                ON ex.Domain = NewUsers.Domain
                AND ex.VIP = UserMap.VIP
    WHERE   ex.Domain IS NULL;

The final query is:

SELECT  INET_NTOA(MIN(INET_ATON(a.VIP))) AS VIP,
        a.User, 
        a.Domain
FROM    NewUserMap a
        LEFT JOIN NewUserMap b
            ON a.User = b.user
            AND a.VIP = b.VIP
            AND a.Domain > b.domain
        LEFT JOIN NewUserMap c
            ON a.User = c.user
            AND b.Domain = c.domain
            AND b.VIP < c.VIP
WHERE   c.user IS NULL
GROUP BY a.User, a.Domain
ORDER BY VIP ASC;

Which returns:

VIP             USER    DOMAIN
100.50.20.1     jane    ten.com
100.50.20.1     james   ten.com
100.50.20.2     admin   nine.com
100.50.20.5     sally   eight.com
100.50.20.5     jim     eight.com
100.50.20.6     jim     eleven.com
100.50.20.6     sally   eleven.com

Example on SQL Fiddle
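Both claims here, that the first query puts jim and sally on 100.50.20.5 twice and that the view-based query moves eleven.com to 100.50.20.6, can be checked with a minimal SQLite sketch via Python's sqlite3. The sketch assumes the question's sample data plus the two eleven.com rows, uses Python stand-ins for MySQL's `INET_ATON`/`INET_NTOA`, and expresses the first query as a plain GROUP BY over the `NewUserMap` view, which selects the same rows before grouping.

```python
import ipaddress
import sqlite3

USERMAP = [
    ("100.50.20.1", "joe", "joesdomain.com"), ("100.50.20.1", "bob", "joesdomain.com"),
    ("100.50.20.2", "tom", "domain2.com"), ("100.50.20.2", "fred", "domain2.com"),
    ("100.50.20.2", "sally", "domain2.com"), ("100.50.20.3", "admin", "athriddomain.com"),
    ("100.50.20.4", "admin", "numberfour.com"), ("100.50.20.3", "sally", "fivewithsally.com"),
    ("100.50.20.4", "jim", "thesix.com"), ("100.50.20.1", "admin", "seven.com"),
    ("100.50.20.1", "sally", "seven.com"), ("100.50.20.1", "sue", "seven.com"),
    ("100.50.20.5", "", ""), ("100.50.20.6", "", ""),
]
# The question's sample plus the two conflicting eleven.com rows
NEWUSERS = [("jim", "eight.com"), ("sally", "eight.com"), ("admin", "nine.com"),
            ("james", "ten.com"), ("jane", "ten.com"),
            ("jim", "eleven.com"), ("sally", "eleven.com")]

def inet_aton(ip):
    # Stand-in for MySQL INET_ATON; NULL in, NULL out
    return None if ip is None else int(ipaddress.IPv4Address(ip))

def inet_ntoa(n):
    # Stand-in for MySQL INET_NTOA; NULL in, NULL out
    return None if n is None else str(ipaddress.IPv4Address(n))

conn = sqlite3.connect(":memory:")
conn.create_function("INET_ATON", 1, inet_aton)
conn.create_function("INET_NTOA", 1, inet_ntoa)
conn.execute("CREATE TABLE UserMap (VIP TEXT, User TEXT, Domain TEXT)")
conn.execute("CREATE TABLE NewUsers (User TEXT, Domain TEXT)")
conn.executemany("INSERT INTO UserMap VALUES (?, ?, ?)", USERMAP)
conn.executemany("INSERT INTO NewUsers VALUES (?, ?)", NEWUSERS)
conn.executescript("""
CREATE VIEW UsedVIP AS
    SELECT u.Domain, m.VIP
    FROM NewUsers u INNER JOIN UserMap m ON u.User = m.User;
CREATE VIEW NewUserMap AS
    SELECT UserMap.VIP, NewUsers.User, NewUsers.Domain
    FROM NewUsers
         CROSS JOIN UserMap
         LEFT JOIN UsedVIP ex
             ON ex.Domain = NewUsers.Domain AND ex.VIP = UserMap.VIP
    WHERE ex.Domain IS NULL;
""")

# Lowest free VIP per (user, domain), ignoring conflicts between new domains
FIRST_QUERY = """
SELECT INET_NTOA(MIN(INET_ATON(a.VIP))) AS VIP, a.User, a.Domain
FROM NewUserMap a GROUP BY a.User, a.Domain"""

# The answer's final query over the same views
FINAL_QUERY = """
SELECT  INET_NTOA(MIN(INET_ATON(a.VIP))) AS VIP, a.User, a.Domain
FROM    NewUserMap a
        LEFT JOIN NewUserMap b
            ON a.User = b.user AND a.VIP = b.VIP AND a.Domain > b.domain
        LEFT JOIN NewUserMap c
            ON a.User = c.user AND b.Domain = c.domain AND b.VIP < c.VIP
WHERE   c.user IS NULL
GROUP BY a.User, a.Domain"""

def vip_by_user_domain(sql):
    # Index each result row by (user, domain) for easy comparison
    return {(user, domain): vip for vip, user, domain in conn.execute(sql)}

conflict = vip_by_user_domain(FIRST_QUERY)
final = vip_by_user_domain(FINAL_QUERY)
print(conflict[("jim", "eight.com")], conflict[("jim", "eleven.com")])
print(final[("jim", "eight.com")], final[("jim", "eleven.com")])
```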

answered 2013-05-09 at 22:33:22