
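When I run the following query in MySQL, I get a lot of duplicates. I believe I have been explicit that I only want distinct records, so I don't understand why rows are doubled. The duplicates seem to appear when I include the last table in the union, importorders, because most customers have the same address in both customers and orders. Can anyone help me understand why this happens?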

SELECT DISTINCT PostalCode, City, Region, Country
FROM 
(select distinct postalcode, city, region, country
from importemployees
UNION
select distinct postalcode, city, region, country
from importcustomers
UNION
select distinct postalcode, city, region, country
from importproducts
UNION
select distinct shippostalcode as postalcode, shipcity as city, shipregion as region, shipcountry as country
from importorders) T

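Query and result

As you can see, some rows are duplicated.

If I use INSERT IGNORE to insert importcustomers first and then importorders, it does manage to recognize the records as duplicates. Why doesn't the SELECT query do the same?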


1 Answer


A very curious problem. When I drop Country from the select list, the duplicates seem to go away.

SELECT DISTINCT PostalCode, City, Region

128 total, query took 0.0066 sec

SELECT DISTINCT PostalCode, City, Region, Country

209 total, query took 0.0002 sec

In addition, the behavior seems to affect only ImportCustomers and ImportOrders:

SELECT postalcode, city, region, country
FROM 
    (SELECT postalcode, city, region, country FROM importcustomers
    UNION
    SELECT shippostalcode, shipcity, shipregion, shipcountry FROM importorders) t

172 total, query took 0.0053 sec

SELECT postalcode
FROM 
    (SELECT postalcode FROM importcustomers
    UNION
    SELECT shippostalcode FROM importorders) t

91 total, query took 0.0050 sec

I then narrowed it down to the country column in importcustomers and importorders:

SELECT TRIM(country) AS country FROM importcustomers
UNION
SELECT TRIM(shipcountry) AS country FROM importorders
Argentina
Argentina
Austria
Austria
Belgium
Belgium
...

When I cast the column to BINARY:

SELECT BINARY country AS country FROM importcustomers
UNION
SELECT BINARY shipcountry AS country FROM importorders
Argentina
417267656e74696e610d
Austria
417573747269610d
Belgium
42656c6769756d0d
...

The ImportOrders table is the one causing the duplicates:

 SELECT BINARY shipcountry AS country FROM importorders
4765726d616e790d
5553410d
5553410d
4765726d616e790d
...

Looking at the dump you provided, the country values have an extra \r appended to the end (shown as 0d in the BINARY values above).

--
-- Dumping data for table `importorders`
--
INSERT INTO `importorders` VALUES
...'Germany\r'),
...'USA\r'),
...'USA\r'),
...'Germany\r'),
...'Mexico\r'),

In importcustomers, by contrast, country looks fine:

--
-- Dumping data for table `importcustomers`
--
INSERT INTO `importcustomers` VALUES
...'Germany', ... ,
...'Mexico', ... ,
...'Mexico', ... ,
...'UK', ... ,
...'Sweden', ... ,
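To see how many rows are affected before changing anything, a check along the following lines may help. It is only a sketch: it assumes the default sql_mode, in which '\r' inside a string literal is read as a carriage return, and it looks at the country columns only. Other columns could be checked the same way.

```sql
-- Count rows per table whose country value ends in a carriage return (0x0D).
SELECT 'importcustomers' AS source_table, COUNT(*) AS rows_with_cr
FROM importcustomers
WHERE country LIKE '%\r'
UNION ALL
SELECT 'importorders', COUNT(*)
FROM importorders
WHERE shipcountry LIKE '%\r';

-- Inspect a few affected values byte by byte; a trailing 0D is the carriage return.
SELECT shipcountry, HEX(shipcountry) AS hex_value
FROM importorders
WHERE shipcountry LIKE '%\r'
LIMIT 5;
```

UNION ALL is used in the first statement on purpose, so that both per-table counts are kept even if they happen to be equal.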

You can remove these \r characters (carriage returns) by running the following query:

UPDATE importorders SET ShipCountry = REPLACE(ShipCountry, '\r', '')
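A narrower variant is sketched here for cases where only a trailing carriage return should be stripped and untouched rows should not be rewritten. It assumes the default sql_mode, so that '\r' is read as a carriage return. Note that a plain TRIM(country) removes only spaces, which is why the TRIM() in the earlier query did not merge the duplicates.

```sql
-- Strip only trailing carriage returns, and only from rows that have one.
UPDATE importorders
SET ShipCountry = TRIM(TRAILING '\r' FROM ShipCountry)
WHERE ShipCountry LIKE '%\r';

-- Afterwards this count should be 0.
SELECT COUNT(*) AS remaining
FROM importorders
WHERE ShipCountry LIKE '%\r';
```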

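Then, if you run your original query, you will get the desired result set. FYI, DISTINCT is redundant if you are using UNION, because UNION already removes duplicate rows.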

SELECT PostalCode, City, Region, Country
FROM 
    (SELECT postalcode, city, region, country FROM importemployees
    UNION
    SELECT postalcode, city, region, country FROM importcustomers
    UNION
    SELECT postalcode, city, region, country FROM importproducts
    UNION
    SELECT shippostalcode as postalcode, shipcity as city, 
        shipregion as region, shipcountry as country FROM importorders) T
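The difference between UNION and UNION ALL, and the effect of a stray carriage return, can be reproduced without any tables. The literals below are illustrative only and assume the default sql_mode, so that '\r' is read as a carriage return.

```sql
-- UNION removes duplicate rows from the combined result: one row.
SELECT 'Germany' AS country
UNION
SELECT 'Germany';

-- UNION ALL keeps every row, duplicates included: two rows.
SELECT 'Germany' AS country
UNION ALL
SELECT 'Germany';

-- Values that differ by a trailing carriage return are different strings,
-- so UNION is expected to keep both, as seen with importorders.
SELECT HEX(country) AS hex_value
FROM (SELECT 'Germany' AS country
      UNION
      SELECT 'Germany\r') t;
```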
answered 2012-09-12T18:40:55.523