
See important new findings 1 and 2 at the end of this post.

I'm running Postgres 9.1.3 and have run into a strange LEFT JOIN problem.

I have a table called consistent.master with over 2 million rows. It has a column called citation_id, and that column contains no null values. I can verify this with:

SELECT COUNT(*)
FROM consistent.master
WHERE citation_id IS NULL

That returns 0.

Here's where it gets strange: if I LEFT JOIN this table to a temporary table, I get an error saying I'm trying to insert a null value into the citation_id field:

ERROR:  null value in column "citation_id" violates not-null constraint
SQL state: 23502

Here is the query:

WITH stops AS (
    SELECT citation_id,
           rank() OVER (ORDER BY offense_timestamp,
                     defendant_dl,
                     offense_street_number,
                     offense_street_name) AS stop
    FROM   consistent.master
    WHERE  citing_jurisdiction=1
)

INSERT INTO consistent.masternew (arrest_id, citation_id, defendant_dl, defendant_dl_state, defendant_zip, defendant_race, defendant_sex, defendant_dob, vehicle_licenseplate, vehicle_licenseplate_state, vehicle_registration_expiration_date, vehicle_year, vehicle_make, vehicle_model, vehicle_color, offense_timestamp, offense_street_number, offense_street_name, offense_crossstreet_number, offense_crossstreet_name, offense_county, officer_id, offense_code, speed_alleged, speed_limit, work_zone, school_zone, offense_location, id, source, citing_jurisdiction, the_geom)

SELECT stops.stop, master.citation_id, defendant_dl, defendant_dl_state, defendant_zip, defendant_race, defendant_sex, defendant_dob, vehicle_licenseplate, vehicle_licenseplate_state, vehicle_registration_expiration_date, vehicle_year, vehicle_make, vehicle_model, vehicle_color, offense_timestamp, offense_street_number, offense_street_name, offense_crossstreet_number, offense_crossstreet_name, offense_county, officer_id, offense_code, speed_alleged, speed_limit, work_zone, school_zone, offense_location, id, source, citing_jurisdiction, the_geom
FROM consistent.master LEFT JOIN stops
ON stops.citation_id = master.citation_id

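I'm scratching my head over this one. If this is a LEFT JOIN, and consistent.master is the left table in the join, how can this query create null values in the citation_id column when there were none to begin with?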
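As a quick sanity check of that premise, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption; the three-row table and its values are hypothetical). It shows that a LEFT JOIN leaves every left-side value intact and returns NULL only in right-side columns, here stops.stop, for rows that have no match:

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL; LEFT JOIN semantics
# are the same here. Table contents are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master (citation_id TEXT NOT NULL, citing_jurisdiction INTEGER);
    INSERT INTO master VALUES ('A1', 1), ('B2', 2), ('C3', 1);
""")

# 'stops' only covers jurisdiction 1, so 'B2' has no match on the right side.
rows = conn.execute("""
    WITH stops AS (
        SELECT citation_id, 1 AS stop FROM master WHERE citing_jurisdiction = 1
    )
    SELECT m.citation_id, s.stop
    FROM master m
    LEFT JOIN stops s ON s.citation_id = m.citation_id
    ORDER BY m.citation_id
""").fetchall()

print(rows)  # [('A1', 1), ('B2', None), ('C3', 1)]
```

A NULL that reaches citation_id therefore has to come either from the source rows themselves or from a right-side column that ends up in that position of the INSERT.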

Here is the SQL I used to create the table:

CREATE TABLE consistent.masternew
(
  arrest_id character varying(20),
  citation_id character varying(20) NOT NULL,
  defendant_dl character varying(20),
  defendant_dl_state character varying(2),
  defendant_zip character varying(9),
  defendant_race character varying(10),
  defendant_sex character(1),
  defendant_dob date,
  vehicle_licenseplate character varying(10),
  vehicle_licenseplate_state character(2),
  vehicle_registration_expiration_date date,
  vehicle_year integer,
  vehicle_make character varying(20),
  vehicle_model character varying(20),
  vehicle_color character varying,
  offense_timestamp timestamp without time zone,
  offense_street_number character varying(10),
  offense_street_name character varying(30),
  offense_crossstreet_number character varying(10),
  offense_crossstreet_name character varying(30),
  offense_county character varying(10),
  officer_id character varying(20),
  offense_code integer,
  speed_alleged integer,
  speed_limit integer,
  work_zone bit(1),
  school_zone bit(1),
  offense_location point,
  id serial NOT NULL,
  source character varying(20), -- Where this citation came from--court, PD, etc.
  citing_jurisdiction integer,
  the_geom geometry,
  CONSTRAINT masternew_pkey PRIMARY KEY (id ),
  CONSTRAINT citing_jurisdiction FOREIGN KEY (citing_jurisdiction)
      REFERENCES consistent.jurisdictions (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT offenses FOREIGN KEY (offense_code)
      REFERENCES consistent.offenses (id) MATCH SIMPLE
      ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT enforce_dims_the_geom CHECK (st_ndims(the_geom) = 2),
  CONSTRAINT enforce_geotype_the_geom CHECK (geometrytype(the_geom) = 'POINT'::text OR the_geom IS NULL),
  CONSTRAINT enforce_srid_the_geom CHECK (st_srid(the_geom) = 3081)
)
WITH (
  OIDS=FALSE
);
ALTER TABLE consistent.masternew
  OWNER TO postgres;
COMMENT ON COLUMN consistent.masternew.source IS 'Where this citation came from--court, PD, etc.';

CREATE INDEX masternew_citation_id_idx
  ON consistent.masternew
  USING btree
  (citation_id COLLATE pg_catalog."default" );

CREATE INDEX masternew_citing_jurisdiction_idx
  ON consistent.masternew
  USING btree
  (citing_jurisdiction );

CREATE INDEX masternew_defendant_dl_idx
  ON consistent.masternew
  USING btree
  (defendant_dl COLLATE pg_catalog."default" );

CREATE INDEX masternew_id_idx
  ON consistent.masternew
  USING btree
  (id );

CREATE INDEX masternew_offense_street_name_idx
  ON consistent.masternew
  USING btree
  (offense_street_name COLLATE pg_catalog."default" );

CREATE INDEX masternew_offense_street_number_idx
  ON consistent.masternew
  USING btree
  (offense_street_number COLLATE pg_catalog."default" );

CREATE INDEX masternew_offense_timestamp_idx
  ON consistent.masternew
  USING btree
  (offense_timestamp );

CREATE INDEX masternew_the_geom_idx
  ON consistent.masternew
  USING gist
  (the_geom );

Important finding 1

I just discovered something interesting. This query:

SELECT COUNT(*)
FROM consistent.master
WHERE citation_id IS NOT NULL
UNION
SELECT COUNT(*)
FROM consistent.master
UNION
SELECT COUNT(*)
FROM consistent.master
WHERE citation_id IS NULL

returns:

2085344
2085343
0

How can I possibly explain that? How can a count WHERE citation_id IS NOT NULL be higher than the same query without the WHERE clause?
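Two properties of that query make the three numbers hard to read: UNION (unlike UNION ALL) removes duplicate rows, and without an ORDER BY nothing guarantees that the rows come back in the order the subqueries were written, so it is not safe to assume the first number belongs to the IS NOT NULL query. Labeling each count and using UNION ALL avoids both problems. The sketch below runs such a query in SQLite via Python's sqlite3 (a stand-in for PostgreSQL; the three-row table is hypothetical) and checks the invariant that the total must equal the non-null count plus the null count:

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master (id INTEGER, citation_id TEXT);
    INSERT INTO master VALUES (1, 'A1'), (2, 'B2'), (3, NULL);
""")

# Label each count and use UNION ALL so no row is merged away,
# then ORDER BY the label so the output order is deterministic.
query = """
    SELECT 'not_null' AS what, COUNT(*) AS n FROM master WHERE citation_id IS NOT NULL
    UNION ALL
    SELECT 'all_rows', COUNT(*) FROM master
    UNION ALL
    SELECT 'null', COUNT(*) FROM master WHERE citation_id IS NULL
    ORDER BY what
"""
counts = dict(conn.execute(query).fetchall())
print(counts)  # {'all_rows': 3, 'not_null': 2, 'null': 1}

# For any consistent table, these two sides must agree.
assert counts["all_rows"] == counts["not_null"] + counts["null"]
```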

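Important finding 2

OK, based on the comments below, I found that I had one row containing nothing but null values, even though the table has a serial id column and some NOT NULL constraints.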

I deleted the rogue row. Now I no longer get the null error. Instead, I get this:

ERROR:  duplicate key value violates unique constraint "masternew_pkey"
DETAIL:  Key (id)=(1583804) already exists.

********** Error **********

ERROR: duplicate key value violates unique constraint "masternew_pkey"
SQL state: 23505
Detail: Key (id)=(1583804) already exists.

So, to make sure, I ran this query:

SELECT COUNT(id)
FROM consistent.master
WHERE id=1583804;

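Guess what? There is only one such row in consistent.master! So, given that the left table of the LEFT JOIN contains only one instance of 1583804, and that the citation_id and id columns can only come from the left table, how can this error possibly happen? A LEFT JOIN like this shouldn't produce more rows in the final result than there are in the left table, right?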
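The assumption in that last sentence does not hold in general. A LEFT JOIN guarantees that every left row appears at least once, not exactly once: if the join key occurs more than once on the right side, each matching left row is repeated once per match, and so is its id, which is exactly what a primary key on id rejects. Whether citation_id is actually duplicated among the citing_jurisdiction = 1 rows of consistent.master is an assumption; the post doesn't say. The sketch below uses SQLite via Python's sqlite3 as a stand-in for PostgreSQL with a hypothetical three-row table, and also shows a GROUP BY ... HAVING query that could be used to look for such duplicates:

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; the data is hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master (id INTEGER PRIMARY KEY, citation_id TEXT, citing_jurisdiction INTEGER);
    -- Citation 'X9' appears twice (ids 2 and 3).
    INSERT INTO master VALUES (1, 'A1', 1), (2, 'X9', 1), (3, 'X9', 1);
""")

join_sql = """
    WITH stops AS (SELECT citation_id FROM master WHERE citing_jurisdiction = 1)
    SELECT m.id, m.citation_id
    FROM master m
    LEFT JOIN stops s ON s.citation_id = m.citation_id
    ORDER BY m.id
"""
joined = conn.execute(join_sql).fetchall()
print(len(joined))  # 5 rows from a 3-row left table
print(joined)       # [(1, 'A1'), (2, 'X9'), (2, 'X9'), (3, 'X9'), (3, 'X9')]

# Diagnostic: which join keys occur more than once on the right-hand side?
dupes = conn.execute("""
    SELECT citation_id, COUNT(*) FROM master
    WHERE citing_jurisdiction = 1
    GROUP BY citation_id
    HAVING COUNT(*) > 1
""").fetchall()
print(dupes)  # [('X9', 2)]
```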


2 Answers


With INSERT, especially a complex one, you should always define the target columns. So do that:

INSERT INTO consistent.masternew (citation_id, col1, col2, ...)

If anything goes wrong in the accompanying SELECT statement, like this:

the_geom geometry

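(renaming a column to a type name makes no sense; I assume it is unintentional), or if the underlying table definition changes, an INSERT statement without defined target columns can go wrong.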
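A minimal sketch of both failure modes, assuming SQLite (via Python's sqlite3) as a stand-in for PostgreSQL and hypothetical src/target tables: a missing AS silently turns the_geom geometry into a column alias rather than raising an error, and an explicit target list keeps working after a column is added to the target table. One difference worth knowing: SQLite rejects an INSERT without a column list when the SELECT supplies too few values, whereas PostgreSQL accepts it and fills the remaining columns with their defaults (or NULL).

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; tables and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE src (citation_id TEXT, the_geom TEXT)")
conn.execute("INSERT INTO src VALUES ('A1', 'POINT(0 0)')")

# Without AS, the second word silently becomes a column alias, not a type.
cur = conn.execute("SELECT citation_id, the_geom geometry FROM src")
aliases = [d[0] for d in cur.description]
print(aliases)  # ['citation_id', 'geometry']

conn.execute("CREATE TABLE target (citation_id TEXT NOT NULL, the_geom TEXT)")
# Hypothetical later schema change: a column is added to the target table.
conn.execute("ALTER TABLE target ADD COLUMN note TEXT")

# With an explicit target list, the INSERT still lines up after the change.
conn.execute("INSERT INTO target (citation_id, the_geom) SELECT citation_id, the_geom FROM src")

# Without one, the INSERT now depends on the table's full column list.
implicit_error = None
try:
    conn.execute("INSERT INTO target SELECT citation_id, the_geom FROM src")
except sqlite3.Error as exc:
    implicit_error = str(exc)
print("implicit list failed:", implicit_error)
```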

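PostgreSQL does not require the number of columns in the SELECT statement to match the number of columns in the target table. I quote the fine manual: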

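Each column not present in the explicit or implicit column list will be filled with a default value, either its declared default value or null if there is none.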

(Emphasis mine on "or null if there is none".) This can make NULL values "pop up" if your column lists don't match.
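A small illustration of that rule, assuming SQLite via Python's sqlite3 as a stand-in for PostgreSQL (both fill unlisted columns the same way when an explicit column list is given); the table masternew_demo and its columns are hypothetical:

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; masternew_demo is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE masternew_demo (
        citation_id TEXT NOT NULL,
        source      TEXT DEFAULT 'court',
        arrest_id   TEXT
    )
""")

# Only citation_id is listed; the other two columns receive no value.
conn.execute("INSERT INTO masternew_demo (citation_id) SELECT 'A1'")

row = conn.execute("SELECT citation_id, source, arrest_id FROM masternew_demo").fetchone()
print(row)  # ('A1', 'court', None): declared default for source, NULL for arrest_id
```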

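Also, the order of the expressions in the SELECT statement must match the order of the columns to be inserted. If the target columns are not spelled out, that is the column order of the table definition.
You seem to expect the columns to be matched by name automatically, but that is not the case. The column names in the SELECT statement are completely irrelevant to the final step of the INSERT; only their order from left to right matters.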
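A minimal demonstration of positional matching, assuming SQLite via Python's sqlite3 as a stand-in for PostgreSQL (INSERT ... SELECT maps columns by position in both); table t is hypothetical. The aliases claim the opposite order, and the INSERT ignores them:

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; table t is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (citation_id TEXT, arrest_id TEXT)")

# The first SELECT expression goes to the first target column, and so on;
# the AS aliases play no part in the mapping.
conn.execute("""
    INSERT INTO t (citation_id, arrest_id)
    SELECT 'STOP-7' AS arrest_id, 'A1' AS citation_id
""")

positional = conn.execute("SELECT citation_id, arrest_id FROM t").fetchone()
print(positional)  # ('STOP-7', 'A1'): matched by position, not by name
```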

Contrary to what others have implied, the WITH clause is perfectly legal here. I quote the manual on INSERT:

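It is possible for the query (SELECT statement) to also contain a WITH clause. In such a case both sets of with_query can be referenced within the query, but the second one takes precedence since it is more closely nested.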

Your statement could look like this:

WITH stops AS (
    SELECT citation_id
          ,rank() OVER (ORDER BY
                    offense_timestamp
                   ,defendant_dl
                   ,offense_street_number
                   ,offense_street_name) AS stop
    FROM   consistent.master
    WHERE  citing_jurisdiction = 1
    )
INSERT INTO consistent.masternew (citation_id, col1, col2, ...) -- add columns
SELECT m.citation_id -- order columns accordingly!
      ,s.stop
      ,m.defendant_dl
        -- 27 more columns
      ,m.citing_jurisdiction
      ,m.the_geom
FROM   consistent.master m
LEFT   JOIN stops s USING (citation_id);
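For a runnable miniature of that statement, here is a sketch against heavily trimmed, hypothetical versions of the two tables, using SQLite via Python's sqlite3 as a stand-in for PostgreSQL (rank() needs SQLite 3.25 or later). It maps s.stop to arrest_id as the question does, and uses an explicit ON condition, which is equivalent here to USING (citation_id):

```python
import sqlite3

# Assumption: SQLite (3.25+, for window functions) stands in for PostgreSQL;
# both tables are hypothetical, trimmed versions of the real ones.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master (
        citation_id TEXT NOT NULL, offense_timestamp TEXT,
        defendant_dl TEXT, citing_jurisdiction INTEGER
    );
    CREATE TABLE masternew (
        citation_id TEXT NOT NULL, arrest_id INTEGER,
        defendant_dl TEXT, citing_jurisdiction INTEGER
    );
    INSERT INTO master VALUES
        ('A1', '2012-01-02 10:00', 'D1', 1),
        ('B2', '2012-01-01 09:00', 'D2', 1),
        ('C3', '2012-01-03 08:00', 'D3', 2);
""")

conn.execute("""
    WITH stops AS (
        SELECT citation_id,
               rank() OVER (ORDER BY offense_timestamp, defendant_dl) AS stop
        FROM master
        WHERE citing_jurisdiction = 1
    )
    INSERT INTO masternew (citation_id, arrest_id, defendant_dl, citing_jurisdiction)
    SELECT m.citation_id,  -- same order as the target list above
           s.stop,
           m.defendant_dl,
           m.citing_jurisdiction
    FROM master m
    LEFT JOIN stops s ON s.citation_id = m.citation_id
""")

result = conn.execute(
    "SELECT citation_id, arrest_id FROM masternew ORDER BY citation_id"
).fetchall()
print(result)  # [('A1', 2), ('B2', 1), ('C3', None)]
```

C3 belongs to jurisdiction 2, so it has no match in stops and its arrest_id comes back NULL, while its citation_id, taken from the left table, stays populated.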
answered 2012-03-15T04:36:48.523

At a guess, I'd say you're inserting stops.stop, which can be null, into the citation_id column, but without knowing the table structure I can't say for sure :)

Edit: try @vol7ron's suggestion and name the columns...
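A sketch of the scenario this answer guesses at, using hypothetical two-column tables in SQLite via Python's sqlite3 as a stand-in for PostgreSQL (which reports the same situation as SQL state 23502). When the SELECT list is out of step with the target list, a right-side column that is NULL for unmatched rows lands in the NOT NULL column, and the whole statement fails:

```python
import sqlite3

# Assumption: SQLite stands in for PostgreSQL; tables and data are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE master (citation_id TEXT NOT NULL, citing_jurisdiction INTEGER);
    CREATE TABLE masternew (citation_id TEXT NOT NULL, arrest_id INTEGER);
    INSERT INTO master VALUES ('A1', 1), ('B2', 2);
""")

# The SELECT list is swapped relative to the target list, so stop lands in
# citation_id. 'B2' has no match in stops, so its stop is NULL.
swapped = """
    WITH stops AS (SELECT citation_id, 1 AS stop FROM master WHERE citing_jurisdiction = 1)
    INSERT INTO masternew (citation_id, arrest_id)
    SELECT s.stop, m.citation_id
    FROM master m
    LEFT JOIN stops s ON s.citation_id = m.citation_id
"""
error = None
try:
    conn.execute(swapped)
except sqlite3.IntegrityError as exc:
    error = str(exc)
print(error)  # NOT NULL constraint failed: masternew.citation_id
```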

answered 2012-03-15T03:27:42.523