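I'm writing a query to assign users, along with their respective domains, to IP addresses. No IP address may end up with duplicate users, i.e. the same username twice.
Here's what I have so far in SQL Fiddle: http://sqlfiddle.com/#!2/39c51/2/0
I have one table that holds all of the current assignments (several hundred thousand of them). A smaller-scale example looks like this: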
mysql> select * from test.usermap;
+-------------+-------+-------------------+
| vip         | user  | domain            |
+-------------+-------+-------------------+
| 100.50.20.1 | joe   | joesdomain.com    |
| 100.50.20.1 | bob   | joesdomain.com    |
| 100.50.20.2 | tom   | domain2.com       |
| 100.50.20.2 | fred  | domain2.com       |
| 100.50.20.2 | sally | domain2.com       |
| 100.50.20.3 | admin | athriddomain.com  |
| 100.50.20.4 | admin | numberfour.com    |
| 100.50.20.3 | sally | fivewithsally.com |
| 100.50.20.4 | jim   | thesix.com        |
| 100.50.20.1 | admin | seven.com         |
| 100.50.20.1 | sally | seven.com         |
| 100.50.20.1 | sue   | seven.com         |
| 100.50.20.5 |       |                   |
| 100.50.20.6 |       |                   |
+-------------+-------+-------------------+
14 rows in set (0.00 sec)
I have another table holding the users that haven't been assigned yet, again as a small-scale example:
mysql> select * from test.newusers;
+-------+-----------+
| user  | domain    |
+-------+-----------+
| jim   | eight.com |
| sally | eight.com |
| admin | nine.com  |
| james | ten.com   |
| jane  | ten.com   |
+-------+-----------+
5 rows in set (0.00 sec)
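To experiment without SQL Fiddle, you can load the two sample tables into an in-memory database. This is a minimal sketch with several assumptions:

* SQLite, through Python's `sqlite3`, stands in for MySQL.
* The tables are named `usermap` and `newusers` without the `test.` schema.
* The blank user/domain cells on .5 and .6 are taken to be empty strings rather than NULL.
* `inet_aton` is a hypothetical Python function registered to mimic MySQL's `INET_ATON()`.
* The two indexes are only a guess at useful ones.

```python
import ipaddress
import sqlite3

# Rows copied from the sample tables. ASSUMPTION: the blank cells are '' not NULL.
USERMAP = [
    ("100.50.20.1", "joe", "joesdomain.com"),
    ("100.50.20.1", "bob", "joesdomain.com"),
    ("100.50.20.2", "tom", "domain2.com"),
    ("100.50.20.2", "fred", "domain2.com"),
    ("100.50.20.2", "sally", "domain2.com"),
    ("100.50.20.3", "admin", "athriddomain.com"),
    ("100.50.20.4", "admin", "numberfour.com"),
    ("100.50.20.3", "sally", "fivewithsally.com"),
    ("100.50.20.4", "jim", "thesix.com"),
    ("100.50.20.1", "admin", "seven.com"),
    ("100.50.20.1", "sally", "seven.com"),
    ("100.50.20.1", "sue", "seven.com"),
    ("100.50.20.5", "", ""),
    ("100.50.20.6", "", ""),
]
NEWUSERS = [
    ("jim", "eight.com"),
    ("sally", "eight.com"),
    ("admin", "nine.com"),
    ("james", "ten.com"),
    ("jane", "ten.com"),
]


def inet_aton(ip: str) -> int:
    """Hypothetical stand-in for MySQL's INET_ATON(): dotted quad -> integer."""
    return int(ipaddress.IPv4Address(ip))


def make_db() -> sqlite3.Connection:
    """Build an in-memory SQLite copy of test.usermap and test.newusers."""
    conn = sqlite3.connect(":memory:")
    conn.create_function("inet_aton", 1, inet_aton)
    conn.execute("create table usermap (vip text, user text, domain text)")
    conn.execute("create table newusers (user text, domain text)")
    # Guessed indexes on the join/filter columns; check real benefit with EXPLAIN.
    conn.execute("create index usermap_user on usermap (user)")
    conn.execute("create index newusers_domain on newusers (domain)")
    conn.executemany("insert into usermap values (?, ?, ?)", USERMAP)
    conn.executemany("insert into newusers values (?, ?)", NEWUSERS)
    return conn


conn = make_db()
print(conn.execute("select count(*) from usermap").fetchone()[0])   # 14
print(conn.execute("select count(*) from newusers").fetchone()[0])  # 5
```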
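The idea is to assign all of the eight.com users to .5, because it's the lowest IP that has neither "jim" nor "sally". Then nine.com goes to .2, since its only user, admin, is already on .1. Finally ten.com goes to .1, since neither james nor jane is on any IP yet. In each case the choice follows from the domain's user conflicts, or lack thereof.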
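The rule in that paragraph can be written out in plain Python to double-check the expected result. This is only a sketch of the rule as I read it. It assumes each new domain is checked against the existing usermap only, not against domains placed earlier in the same batch, which is also what the correlated query below does. `assign_domains` is a hypothetical helper name.

```python
import ipaddress
from collections import defaultdict

# Existing (vip, user) pairs from test.usermap; the domain column is not needed here.
USERMAP = [
    ("100.50.20.1", "joe"), ("100.50.20.1", "bob"),
    ("100.50.20.2", "tom"), ("100.50.20.2", "fred"), ("100.50.20.2", "sally"),
    ("100.50.20.3", "admin"), ("100.50.20.4", "admin"),
    ("100.50.20.3", "sally"), ("100.50.20.4", "jim"),
    ("100.50.20.1", "admin"), ("100.50.20.1", "sally"), ("100.50.20.1", "sue"),
    ("100.50.20.5", ""), ("100.50.20.6", ""),
]
# (user, domain) pairs from test.newusers.
NEWUSERS = [
    ("jim", "eight.com"), ("sally", "eight.com"),
    ("admin", "nine.com"),
    ("james", "ten.com"), ("jane", "ten.com"),
]


def assign_domains(usermap, newusers):
    """Map each new domain to the lowest vip holding none of its users (None if no vip qualifies)."""
    users_on_vip = defaultdict(set)
    for vip, user in usermap:
        users_on_vip[vip].add(user)
    # Sort by numeric IP value, like ORDER BY INET_ATON(vip), not as strings.
    vips = sorted(users_on_vip, key=ipaddress.IPv4Address)

    users_in_domain = defaultdict(set)
    for user, domain in newusers:
        users_in_domain[domain].add(user)

    return {
        domain: next((v for v in vips if users.isdisjoint(users_on_vip[v])), None)
        for domain, users in users_in_domain.items()
    }


print(assign_domains(USERMAP, NEWUSERS))
# {'eight.com': '100.50.20.5', 'nine.com': '100.50.20.2', 'ten.com': '100.50.20.1'}
```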
The result I'm looking for is:
+-------------+-------+-----------+
| vip         | user  | domain    |
+-------------+-------+-----------+
| 100.50.20.1 | james | ten.com   |
| 100.50.20.1 | jane  | ten.com   |
| 100.50.20.2 | admin | nine.com  |
| 100.50.20.5 | jim   | eight.com |
| 100.50.20.5 | sally | eight.com |
+-------------+-------+-----------+
5 rows in set (0.01 sec)
I can do this with a subquery nested inside a correlated subquery, like so:
mysql> select
  (
    select vip
    from test.usermap
    where vip not in
    (
      select distinct vip
      from test.usermap
      where user in
      (
        select user
        from test.newusers
        where domain = n.domain
      )
    )
    order by inet_aton(vip) asc
    limit 1
  ) as vip, n.user, n.domain
from test.newusers n
order by inet_aton(vip) asc;
+-------------+-------+-----------+
| vip         | user  | domain    |
+-------------+-------+-----------+
| 100.50.20.1 | james | ten.com   |
| 100.50.20.1 | jane  | ten.com   |
| 100.50.20.2 | admin | nine.com  |
| 100.50.20.5 | jim   | eight.com |
| 100.50.20.5 | sally | eight.com |
+-------------+-------+-----------+
5 rows in set (0.00 sec)
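For local testing, the same correlated query runs essentially unchanged in SQLite. This sketch makes a few assumptions and changes:

* SQLite, through Python's `sqlite3`, stands in for MySQL.
* A hypothetical Python `inet_aton` replaces MySQL's `INET_ATON()`.
* The outer ORDER BY moves into a wrapping SELECT, with `user` added as a tie-breaker so the row order is deterministic.

```python
import ipaddress
import sqlite3

USERMAP = [
    ("100.50.20.1", "joe", "joesdomain.com"), ("100.50.20.1", "bob", "joesdomain.com"),
    ("100.50.20.2", "tom", "domain2.com"), ("100.50.20.2", "fred", "domain2.com"),
    ("100.50.20.2", "sally", "domain2.com"), ("100.50.20.3", "admin", "athriddomain.com"),
    ("100.50.20.4", "admin", "numberfour.com"), ("100.50.20.3", "sally", "fivewithsally.com"),
    ("100.50.20.4", "jim", "thesix.com"), ("100.50.20.1", "admin", "seven.com"),
    ("100.50.20.1", "sally", "seven.com"), ("100.50.20.1", "sue", "seven.com"),
    ("100.50.20.5", "", ""), ("100.50.20.6", "", ""),  # ASSUMED '' rather than NULL
]
NEWUSERS = [
    ("jim", "eight.com"), ("sally", "eight.com"), ("admin", "nine.com"),
    ("james", "ten.com"), ("jane", "ten.com"),
]

# Same nesting as the MySQL query: for every new user, the scalar subquery
# picks the lowest vip that holds none of the users of that user's domain.
CORRELATED = """
select vip, user, domain from (
    select (
        select vip
        from usermap
        where vip not in (
            select distinct vip
            from usermap
            where user in (
                select user from newusers where domain = n.domain
            )
        )
        order by inet_aton(vip) asc
        limit 1
    ) as vip, n.user as user, n.domain as domain
    from newusers n
)
order by inet_aton(vip) asc, user asc
"""

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for MySQL's INET_ATON().
conn.create_function("inet_aton", 1, lambda ip: int(ipaddress.IPv4Address(ip)))
conn.execute("create table usermap (vip text, user text, domain text)")
conn.execute("create table newusers (user text, domain text)")
conn.executemany("insert into usermap values (?, ?, ?)", USERMAP)
conn.executemany("insert into newusers values (?, ?)", NEWUSERS)

for row in conn.execute(CORRELATED):
    print(row)
```

This makes it easy to time the correlated version against alternatives on larger generated data, though SQLite timings won't necessarily predict MySQL's.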
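But this is very inefficient. My production usermap and newusers tables have 300k and 50k rows respectively, so this approach isn't feasible.
I tried to make it more efficient by using a join instead of nested subqueries. I replaced the inner queries with a join and referenced the outer query's column in the ON clause, but that doesn't seem to be possible: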
mysql> select
  (
    select distinct vip
    from test.usermap u
    join test.newusers r
      on r.domain = n.domain
      and r.user != u.user
    order by inet_aton(vip) asc limit 1
  ) as vip, n.user, n.domain
from test.newusers n;
ERROR 1054 (42S22): Unknown column 'n.domain' in 'on clause'
mysql>
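One way to avoid needing `n.domain` inside an ON clause at all is to turn the check around:

1. Compute, once, which (domain, vip) pairs conflict.
2. Anti-join every (domain, vip) candidate against that set with `LEFT JOIN ... WHERE ... IS NULL`.
3. Keep the lowest surviving vip per domain.

This swaps in an anti-join technique and is not the query above. It runs on SQLite through Python's `sqlite3` rather than MySQL, and `inet_aton` / `inet_ntoa` are hypothetical Python stand-ins for MySQL's `INET_ATON()` / `INET_NTOA()`. It uses only derived tables (no CTEs), so the SQL should carry over to older MySQL versions. Whether it is actually faster on 300k/50k rows is an open question. It depends on indexes (for example on usermap.user and newusers.domain), and the domains-by-vips cross join can itself get large, so it would need checking with EXPLAIN.

```python
import ipaddress
import sqlite3

USERMAP = [
    ("100.50.20.1", "joe", "joesdomain.com"), ("100.50.20.1", "bob", "joesdomain.com"),
    ("100.50.20.2", "tom", "domain2.com"), ("100.50.20.2", "fred", "domain2.com"),
    ("100.50.20.2", "sally", "domain2.com"), ("100.50.20.3", "admin", "athriddomain.com"),
    ("100.50.20.4", "admin", "numberfour.com"), ("100.50.20.3", "sally", "fivewithsally.com"),
    ("100.50.20.4", "jim", "thesix.com"), ("100.50.20.1", "admin", "seven.com"),
    ("100.50.20.1", "sally", "seven.com"), ("100.50.20.1", "sue", "seven.com"),
    ("100.50.20.5", "", ""), ("100.50.20.6", "", ""),  # ASSUMED '' rather than NULL
]
NEWUSERS = [
    ("jim", "eight.com"), ("sally", "eight.com"), ("admin", "nine.com"),
    ("james", "ten.com"), ("jane", "ten.com"),
]

# Domains with no conflict-free vip get no row at all (inner join on k).
ANTI_JOIN = """
select inet_ntoa(k.ipnum) as vip, n.user, n.domain
from newusers n
join (
    -- lowest vip per domain that survives the anti-join
    select d.domain, min(inet_aton(v.vip)) as ipnum
    from (select distinct domain from newusers) d
    cross join (select distinct vip from usermap) v
    left join (
        -- (domain, vip) pairs where one of the domain's users already sits on the vip
        select distinct r.domain, u.vip
        from newusers r
        join usermap u on u.user = r.user
    ) c on c.domain = d.domain and c.vip = v.vip
    where c.vip is null
    group by d.domain
) k on k.domain = n.domain
order by k.ipnum, n.user
"""

conn = sqlite3.connect(":memory:")
# Hypothetical stand-ins for MySQL's INET_ATON() / INET_NTOA().
conn.create_function("inet_aton", 1, lambda ip: int(ipaddress.IPv4Address(ip)))
conn.create_function("inet_ntoa", 1, lambda num: str(ipaddress.IPv4Address(num)))
conn.execute("create table usermap (vip text, user text, domain text)")
conn.execute("create table newusers (user text, domain text)")
conn.executemany("insert into usermap values (?, ?, ?)", USERMAP)
conn.executemany("insert into newusers values (?, ?)", NEWUSERS)

for row in conn.execute(ANTI_JOIN):
    print(row)
```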
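The logic of the query itself seems to make sense, though, since replacing the outer-query reference with the string constant it stands for returns the expected rows for ten.com: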
mysql> select
  (
    select distinct vip
    from test.usermap u
    join test.newusers r
      on r.domain = 'ten.com'
      and r.user != u.user
    order by inet_aton(vip) asc limit 1
  ) as vip, n.user, n.domain
from test.newusers n
where domain = 'ten.com';
+-------------+-------+---------+
| vip         | user  | domain  |
+-------------+-------+---------+
| 100.50.20.1 | james | ten.com |
| 100.50.20.1 | jane  | ten.com |
+-------------+-------+---------+
2 rows in set (0.00 sec)
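One thing worth checking before building on this: the constant-substituted query gets ten.com right only because neither james nor jane appears in usermap at all. `r.user != u.user` matches any usermap row whose user differs from at least one of the domain's users, so it does not exclude IPs that already hold one of them.

The sketch below reproduces this in SQLite, via Python's `sqlite3`, with a hypothetical `inet_aton` stand-in for MySQL's function and a hypothetical helper `lowest_vip_ne_join`. It shows the same query returns 100.50.20.1 for eight.com (expected .5, since sally is already on .1) and for nine.com (expected .2, since admin is already on .1).

```python
import ipaddress
import sqlite3

USERMAP = [
    ("100.50.20.1", "joe", "joesdomain.com"), ("100.50.20.1", "bob", "joesdomain.com"),
    ("100.50.20.2", "tom", "domain2.com"), ("100.50.20.2", "fred", "domain2.com"),
    ("100.50.20.2", "sally", "domain2.com"), ("100.50.20.3", "admin", "athriddomain.com"),
    ("100.50.20.4", "admin", "numberfour.com"), ("100.50.20.3", "sally", "fivewithsally.com"),
    ("100.50.20.4", "jim", "thesix.com"), ("100.50.20.1", "admin", "seven.com"),
    ("100.50.20.1", "sally", "seven.com"), ("100.50.20.1", "sue", "seven.com"),
    ("100.50.20.5", "", ""), ("100.50.20.6", "", ""),  # ASSUMED '' rather than NULL
]
NEWUSERS = [
    ("jim", "eight.com"), ("sally", "eight.com"), ("admin", "nine.com"),
    ("james", "ten.com"), ("jane", "ten.com"),
]

# The constant-substituted inner query, with the domain bound as a parameter.
# r.user != u.user keeps any usermap row whose user differs from SOME user of
# the domain, so vips that already hold one of the domain's users still match.
NE_JOIN = """
select vip from (
    select distinct vip
    from usermap u
    join newusers r
      on r.domain = ?
     and r.user != u.user
)
order by inet_aton(vip) asc
limit 1
"""

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for MySQL's INET_ATON().
conn.create_function("inet_aton", 1, lambda ip: int(ipaddress.IPv4Address(ip)))
conn.execute("create table usermap (vip text, user text, domain text)")
conn.execute("create table newusers (user text, domain text)")
conn.executemany("insert into usermap values (?, ?, ?)", USERMAP)
conn.executemany("insert into newusers values (?, ?)", NEWUSERS)


def lowest_vip_ne_join(domain):
    """Hypothetical helper: run NE_JOIN for one domain; None if nothing matches."""
    row = conn.execute(NE_JOIN, (domain,)).fetchone()
    return row[0] if row else None


print(lowest_vip_ne_join("ten.com"))    # 100.50.20.1 (correct)
print(lowest_vip_ne_join("nine.com"))   # 100.50.20.1 (expected 100.50.20.2)
print(lowest_vip_ne_join("eight.com"))  # 100.50.20.1 (expected 100.50.20.5)
```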
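My question is: is there a way to reference a column from the outer query inside a join in the inner query? If not, what alternatives (if any) are there that don't rely on inefficiently nested subqueries?
Again, I have a fiddle here: http://sqlfiddle.com/#!2/39c51/2/0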