postgresql - Postgres LEFT JOIN creates more rows than in the left table

Question

I'm running Postgres 9.1.3 32-bit on Windows 7 x64. (I have to use 32-bit because there is no Windows version of PostGIS compatible with 64-bit Postgres.) (EDIT: As of PostGIS 2.0, it is compatible with 64-bit Postgres on Windows.)

I have a query that left joins a table (consistent.master) with a temp table, then inserts the resulting data into a third table (consistent.masternew).

Since this is a LEFT JOIN, the resulting table should have the same number of rows as the left table in the query. However, if I run this:

SELECT count(*)
FROM consistent.master

I get 2085343. But if I run this:

SELECT count(*)
FROM consistent.masternew

I get 2085703.

How can masternew have more rows than master? Shouldn't masternew have the same number of rows as master, the left table in the query?

Below is the query. The master and masternew tables should have the same structure.

--temporary table created here
--I am trying to locate where multiple tickets were written on
--a single traffic stop
WITH stops AS (
    SELECT citation_id,
           rank() OVER (ORDER BY offense_timestamp,
                     defendant_dl,
                     offense_street_number,
                     offense_street_name) AS stop
    FROM   consistent.master
    WHERE  citing_jurisdiction=1
)

--Here's the insert statement. Below you'll see it's
--pulling data from a select query
INSERT INTO consistent.masternew (arrest_id,
  citation_id,
  defendant_dl,
  defendant_dl_state,
  defendant_zip,
  defendant_race,
  defendant_sex,
  defendant_dob,
  vehicle_licenseplate,
  vehicle_licenseplate_state,
  vehicle_registration_expiration_date,
  vehicle_year,
  vehicle_make,
  vehicle_model,
  vehicle_color,
  offense_timestamp,
  offense_street_number,
  offense_street_name,
  offense_crossstreet_number,
  offense_crossstreet_name,
  offense_county,
  officer_id,
  offense_code,
  speed_alleged,
  speed_limit,
  work_zone,
  school_zone,
  offense_location,
  source,
  citing_jurisdiction,
  the_geom)

--Here's the select query that the insert statement is using.    
SELECT stops.stop,
  master.citation_id,
  defendant_dl,
  defendant_dl_state,
  defendant_zip,
  defendant_race,
  defendant_sex,
  defendant_dob,
  vehicle_licenseplate,
  vehicle_licenseplate_state,
  vehicle_registration_expiration_date,
  vehicle_year,
  vehicle_make,
  vehicle_model,
  vehicle_color,
  offense_timestamp,
  offense_street_number,
  offense_street_name,
  offense_crossstreet_number,
  offense_crossstreet_name,
  offense_county,
  officer_id,
  offense_code,
  speed_alleged,
  speed_limit,
  work_zone,
  school_zone,
  offense_location,
  source,
  citing_jurisdiction,
  the_geom
FROM consistent.master LEFT JOIN stops
ON stops.citation_id = master.citation_id

In case it matters, I have run a VACUUM FULL ANALYZE and reindexed both tables. (I'm not sure of the exact commands; it was done through pgAdmin III.)

score 11 · Accepted Answer

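A left join does not necessarily return the same number of rows as the left table. Basically, it works like a normal (inner) join, except that rows of the left table that would not appear in the normal join are added as well. So if more than one row in the right table matches a row in the left table, the result can contain more rows than the left table.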
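To make the row multiplication concrete, here is a minimal sketch using made-up inline data rather than the tables from the question. The names l and r and all values are hypothetical; the point is only that citation_id 2 has two matches on the right side.

```sql
-- Hypothetical data: the left side has 3 rows, and the right side
-- contains citation_id 2 twice.
WITH l(citation_id) AS (VALUES (1), (2), (3)),
     r(citation_id, stop) AS (VALUES (2, 10), (2, 11))
SELECT l.citation_id, r.stop
FROM l LEFT JOIN r ON r.citation_id = l.citation_id
ORDER BY l.citation_id, r.stop;
-- Returns 4 rows from a 3-row left side:
-- (1, NULL), (2, 10), (2, 11), (3, NULL)
```

Unmatched left rows are kept once with NULLs, while a left row with several matches is repeated once per match.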

To do what you want, you should use GROUP BY and count(*) to detect the multiples.

select master.citation_id
from stops join master on stops.citation_id = master.citation_id
group by master.citation_id
having count(*) > 1
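In the query from the question, stops is built from consistent.master with one row per source row, so a citation_id can only repeat in stops if it already repeats among the master rows with citing_jurisdiction = 1. A minimal sketch of a direct check on the base table, assuming citation_id was expected to be unique there (the alias copies is only illustrative):

```sql
-- List every citation_id that would appear more than once in stops.
SELECT citation_id, count(*) AS copies
FROM consistent.master
WHERE citing_jurisdiction = 1
GROUP BY citation_id
HAVING count(*) > 1
ORDER BY copies DESC;
```

Each id reported here matches more than one stops row, so every master row carrying that id is emitted more than once by the LEFT JOIN.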

score 4 · Accepted Answer

Sometimes you know there are multiples, but you don't care. You just want to take the first or top entry.
If so, you can use SELECT DISTINCT ON:

FROM consistent.master LEFT JOIN (SELECT DISTINCT ON (citation_id) * FROM stops) s
ON s.citation_id = master.citation_id

where citation_id is the column for which you want to keep the first (arbitrary) row of each group of matches.

You will probably want to make this deterministic by adding an ORDER BY that includes some other sortable column:

SELECT DISTINCT ON (citation_id) * FROM stops ORDER BY citation_id, created_at
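As a self-contained illustration of how DISTINCT ON picks one row per key, here is a sketch over hypothetical inline data. The created_at column and all values are assumptions for the example and are not part of the stops CTE in the question.

```sql
-- Hypothetical stand-in for stops, with a created_at column to order by.
WITH stops(citation_id, stop, created_at) AS (
    VALUES (2, 10, DATE '2012-01-01'),
           (2, 11, DATE '2012-01-02'),
           (3, 12, DATE '2012-01-03')
)
SELECT DISTINCT ON (citation_id) *
FROM stops
ORDER BY citation_id, created_at;
-- Keeps the earliest row per citation_id:
-- (2, 10, 2012-01-01) and (3, 12, 2012-01-03)
```

The DISTINCT ON expressions must match the leftmost ORDER BY expressions; the remaining ORDER BY columns decide which row of each group is kept.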
