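The list below contains people's names before and after marriage. Over time, some of them divorced and remarried and/or changed their names.
What I want to do is collect every name a person has had over their lifetime and add a new column holding a unique identifier for each person.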

This is the actual list, called Names:

Name_before                  Name_after         
Misti Gulick                 Misti Gulick Thibodeaux            
Faye Leaton                  Faye Leaton Hemby          
Arden Peck                   Arden Peck Mroz            
Carlton Kingsley             Carlton Kingsley Mcelveen          
Dolly Verhey                 Dolly Verhey Irish             
Gaynell Pasquale             Gaynell Pasquale Ayala             
Misti Gulick Thibodeaux      Misti Thibodeaux           
Faye Leaton Hemby            Faye Hemby         
Arden Peck Mroz              Arden Mroz         
Carlton Kingsley Mcelveen    Carlton Mcelveen           
Dolly Verhey Irish           Dolly Irish            
Gaynell Pasquale Ayala       Gaynell Ayala          
Misti Thibodeaux             Misti Trey Thibodeaux          
Faye Hemby                   Faye Barrett Hemby         
Arden Mroz                   Arden Justin Mroz          
Carlton Mcelveen             Carlton Tameka Mcelveen            
Dolly Irish                  Dolly Jeremiah Irish           
Gaynell Ayala                Gaynell Cherry Ayala           

The ideal list would look like this:

Name_before                 Name_after                  Identifier
Misti Gulick                Misti Gulick Thibodeaux     Misti Gulick 
Faye Leaton                 Faye Leaton Hemby           Faye Leaton 
Arden Peck                  Arden Peck Mroz             Arden Peck 
Carlton Kingsley            Carlton Kingsley Mcelveen   Carlton Kingsley 
Dolly Verhey                Dolly Verhey Irish          Dolly Verhey 
Gaynell Pasquale            Gaynell Pasquale Ayala      Gaynell Pasquale 
Misti Gulick Thibodeaux     Misti Thibodeaux            Misti Gulick 
Faye Leaton Hemby           Faye Hemby                  Faye Leaton 
Arden Peck Mroz             Arden Mroz                  Arden Peck 
Carlton Kingsley Mcelveen   Carlton Mcelveen            Carlton Kingsley 
Dolly Verhey Irish          Dolly Irish                 Dolly Verhey 
Gaynell Pasquale Ayala      Gaynell Ayala               Gaynell Pasquale 
Misti Thibodeaux            Misti Trey Thibodeaux       Misti Gulick 
Faye Hemby                  Faye Barrett Hemby          Faye Leaton 
Arden Mroz                  Arden Justin Mroz           Arden Peck 
Carlton Mcelveen            Carlton Tameka Mcelveen     Carlton Kingsley 
Dolly Irish                 Dolly Jeremiah Irish        Dolly Verhey 
Gaynell Ayala               Gaynell Cherry Ayala        Gaynell Pasquale 

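What I tried was to find the values of Name_after that also occur in Name_before, and to repeat this until there are no more matches.
Each time one of these tables is created, the number of names decreases.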

create table name_temp1 as 
   select * 
      from Names 
          where Name_after in (select distinct(Name_before) from Names)
           order by Name_before, Name_after;                    

create table name_temp2 as 
  select * 
     from name_temp1 
       where Name_after in (select distinct(Name_before) from name_temp1) 
           order by Name_before, Name_after;            


create table name_temp3 as 
   select * 
      from name_temp2 
         where Name_after in (select distinct(Name_before) from name_temp2) 
           order by Name_before, Name_after;

Then I would use a query with a CASE expression:

select *,
       case when n3.Name_before = n2.Name_after
            then case when n2.Name_before = n1.Name_after
                      then n1.Name_after
                      else n.Name_after
                 end
       end
  from Names n, name_temp1 n1, name_temp2 n2, name_temp3 n3;

I know this is not elegant at all and does not perform well. Could some of you help me improve it? Other suggestions are welcome too! Thanks,

2 Answers

Schema

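The goal of the whole process should be a normalized schema: a table person with a surrogate primary key person_id (since there is no obvious natural primary key). I suggest you use a serial column for that.
And a table person_name with a foreign key to person: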

CREATE TEMP TABLE person(
   person_id serial PRIMARY KEY  -- implicit primary key constraint
   -- probably more attributes belonging to the person
 );

CREATE TEMP TABLE person_name(
   person_name_id  serial PRIMARY KEY
  ,person_id       int NOT NULL REFERENCES person(person_id) -- foreign key
  ,name            text NOT NULL
  ,step            int DEFAULT 0
   -- possibly more attributes that belong to the person at this step only
 );

(person_id, name) cannot be UNIQUE, since the same person can use the same name more than once in a lifetime.

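To extract your data, I suggest a single query with a recursive CTE. However, your operation is bound to be ambiguous if the same name is ever used more than once (by two different people, or by one person returning to an earlier name). You may get nonsensical results or circular dependencies that cannot be resolved without additional information.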
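To make the ambiguity concrete, here is a minimal Python sketch of a pre-check that could be run on the (name_before, name_after) pairs before attempting the recursion. The function names `find_ambiguous_names` and `has_cycle` and the sample rows are made up for illustration; they are not part of the original data or of any library.

```python
from collections import Counter


def find_ambiguous_names(pairs):
    """Return names that occur more than once as name_before or as name_after."""
    before_counts = Counter(before for before, _ in pairs)
    after_counts = Counter(after for _, after in pairs)
    repeated_before = {name for name, count in before_counts.items() if count > 1}
    repeated_after = {name for name, count in after_counts.items() if count > 1}
    return repeated_before | repeated_after


def has_cycle(pairs):
    """Follow before -> after links; True if any chain returns to a name it already visited."""
    # Assumption: name_before is unique (check find_ambiguous_names first),
    # otherwise dict() silently keeps only the last pair for a repeated name.
    successor = dict(pairs)
    for start in successor:
        seen = {start}
        current = start
        while current in successor:
            current = successor[current]
            if current in seen:
                return True
            seen.add(current)
    return False


# Hypothetical sample rows, shaped like the question's table.
clean = [
    ("Misti Gulick", "Misti Gulick Thibodeaux"),
    ("Misti Gulick Thibodeaux", "Misti Thibodeaux"),
]
# Two different people who start out with the same name.
shared = clean + [("Misti Gulick", "Misti Gulick Hemby")]
# One person who eventually goes back to her first name.
reverted = clean + [("Misti Thibodeaux", "Misti Gulick")]
```

Note that the `reverted` case has no duplicates within either column, yet it forms a cycle: no row is left whose name_before never appears as a name_after, so a query that looks for such starting rows would find none for that person.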

The rows in person_name with step = 0 hold your "Identifier".

Query

For the purpose of this query I assume UNIQUE names (or it cannot work).

WITH RECURSIVE p_start AS (
   SELECT row_number() OVER (ORDER BY n.name_before) AS person_id, n.*
   FROM   names n
   LEFT   JOIN names n2 ON n2.name_after = n.name_before
   WHERE  n2.name_after IS NULL
   )
, pers AS (
   SELECT person_id, name_after AS name, 1 AS step
   FROM   p_start

   UNION  ALL
   SELECT p.person_id, n.name_after, p.step + 1
   FROM   pers p
   JOIN   names  n ON n.name_before = p.name
   -- WHERE  p.step < 10 -- If query doesn't finish, stop the infinite recursion
   )
SELECT person_id, name_before AS name, 0 AS step
FROM   p_start
UNION ALL
SELECT person_id, name, step
FROM   pers
ORDER  BY person_id, step

-> SQLfiddle demo.
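For readers without a PostgreSQL instance at hand, the same idea can be tried with Python's built-in sqlite3 module. This is only a sketch under assumptions: it uses SQLite rather than PostgreSQL, a shortened sample of the question's rows, and it carries the first name_before along as the identifier instead of generating a numeric person_id. The names `QUERY` and `label_chains` are made up for this example.

```python
import sqlite3

# A shortened sample of the question's data (two of the six people).
ROWS = [
    ("Misti Gulick", "Misti Gulick Thibodeaux"),
    ("Faye Leaton", "Faye Leaton Hemby"),
    ("Misti Gulick Thibodeaux", "Misti Thibodeaux"),
    ("Faye Leaton Hemby", "Faye Hemby"),
    ("Misti Thibodeaux", "Misti Trey Thibodeaux"),
    ("Faye Hemby", "Faye Barrett Hemby"),
]

QUERY = """
WITH RECURSIVE chain(identifier, name_before, name_after, step) AS (
    -- Start rows: a name_before that never appears as anybody's name_after.
    SELECT n.name_before, n.name_before, n.name_after, 0
    FROM   names n
    WHERE  NOT EXISTS (SELECT 1 FROM names p WHERE p.name_after = n.name_before)

    UNION ALL
    -- Next row of the same person: its name_before is the previous name_after.
    SELECT c.identifier, n.name_before, n.name_after, c.step + 1
    FROM   chain c
    JOIN   names n ON n.name_before = c.name_after
)
SELECT name_before, name_after, identifier
FROM   chain
ORDER  BY identifier, step
"""


def label_chains(rows):
    """Load rows into an in-memory SQLite table and tag each row with its identifier."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE names (name_before TEXT, name_after TEXT)")
    con.executemany("INSERT INTO names VALUES (?, ?)", rows)
    result = con.execute(QUERY).fetchall()
    con.close()
    return result


labelled = label_chains(ROWS)
```

Like the query above, this only works while names are unique; a repeated name would multiply rows, and a cycle would leave the person without a start row.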

One-stop shop

With the schema above in place, you can do everything in a single query: populate the new tables and return the result:

WITH RECURSIVE p_start AS (
   SELECT row_number() OVER (ORDER BY n.name_before) AS person_id, n.*
   FROM   names n
   LEFT   JOIN names n2 ON n2.name_after = n.name_before
   WHERE  n2.name_after IS NULL
   )
, pers AS (
   SELECT person_id, name_after AS name, 1 AS step
   FROM   p_start

   UNION  ALL
   SELECT p.person_id, n.name_after, p.step + 1
   FROM   pers p
   JOIN   names  n ON n.name_before = p.name
   -- WHERE  p.step < 10 -- If query doesn't finish, stop the infinite recursion
   )
, ins_person AS (
   INSERT INTO person(person_id)
   SELECT person_id FROM p_start
   )
INSERT INTO person_name(person_id, name, step)
SELECT person_id, name_before, 0 AS step
FROM   p_start
UNION ALL
SELECT person_id, name, step
FROM   pers
ORDER  BY person_id, step
RETURNING *

-> SQLfiddle demo.

Finally, initialize the sequence for person, so that you do not run into duplicate key violations later:

SELECT setval('person_person_id_seq', (SELECT max(person_id) FROM person))
answered 2013-09-27T17:22:53.747

DROP SCHEMA IF EXISTS tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;

  -- make some data
CREATE TABLE names_org
        ( name_id SERIAL NOT NULL PRIMARY KEY
        , name_org varchar
        , name_new varchar
        );
INSERT INTO names_org (name_org, name_new) VALUES
  ('Misti Gulick', 'Misti Gulick Thibodeaux')
, ('Faye Leaton', 'Faye Leaton Hemby')
, ('Arden Peck', 'Arden Peck Mroz')
, ('Carlton Kingsley', 'Carlton Kingsley Mcelveen')
, ('Dolly Verhey', 'Dolly Verhey Irish')
, ('Gaynell Pasquale', 'Gaynell Pasquale Ayala')
, ('Misti Gulick Thibodeaux', 'Misti Thibodeaux')
, ('Faye Leaton Hemby', 'Faye Hemby')
, ('Arden Peck Mroz', 'Arden Mroz')
, ('Carlton Kingsley Mcelveen', 'Carlton Mcelveen')
, ('Dolly Verhey Irish', 'Dolly Irish')
, ('Gaynell Pasquale Ayala', 'Gaynell Ayala')
, ('Misti Thibodeaux', 'Misti Trey Thibodeaux')
, ('Faye Hemby', 'Faye Barrett Hemby')
, ('Arden Mroz', 'Arden Justin Mroz')
, ('Carlton Mcelveen', 'Carlton Tameka Mcelveen')
, ('Dolly Irish', 'Dolly Jeremiah Irish')
, ('Gaynell Ayala', 'Gaynell Cherry Ayala')
;

SELECT * FROM names_org;

And the alterations and updates (done in separate steps, for clarity):

  --Add a few self-referencing fields
  --
ALTER TABLE names_org
        -- points to the **first** entry for this person
        ADD COLUMN canon_id INTEGER
                REFERENCES names_org (name_id)
        -- points to the **nearest previous** entry for this person
        , ADD COLUMN parent_id INTEGER
                REFERENCES names_org (name_id)
        ;

        -- Update from **the nearest** previous record; if any
UPDATE names_org dst
SET parent_id = src.name_id
FROM names_org src
   -- src is the previous row for this person
WHERE src.name_new = dst.name_org
AND src.name_id < dst.name_id
   -- The nearest: eliminate the middlemen
AND NOT EXISTS (SELECT *
        FROM names_org nx
        WHERE nx.name_new = dst.name_org
        AND nx.name_id < dst.name_id
        AND nx.name_id > src.name_id
        );

   -- Add the final newnames (at the end of the chains) to the table, too.
   -- These are the name strings that only occur in name_new,
   -- but never in name_org
INSERT INTO names_org (name_org, parent_id)
SELECT name_new, name_id
FROM names_org src
WHERE NOT EXISTS (
        SELECT *
        FROM names_org nx
        WHERE nx.parent_id = src.name_id
        );

        -- Find canonical parent (the head of the chain)
WITH RECURSIVE list AS (
        SELECT name_id AS canon_id
        , name_id AS this_id
        FROM names_org
        WHERE parent_id IS NULL
        UNION ALL
        SELECT list.canon_id AS canon_id
                , this.name_id AS this_id
        FROM list
        JOIN names_org this ON this.parent_id = list.this_id
        )
UPDATE names_org this
SET canon_id = list.canon_id
FROM list
WHERE  this.name_id = list.this_id
        ;

   -- Now we can drop the new name and rename the org name
ALTER TABLE names_org DROP COLUMN  name_new ;
ALTER TABLE names_org RENAME COLUMN  name_org TO current_name ;

SELECT * FROM names_org;

Results:

ALTER TABLE
UPDATE 12
INSERT 0 6
UPDATE 24
ALTER TABLE
ALTER TABLE
 name_id |       current_name        | canon_id | parent_id 
---------+---------------------------+----------+-----------
       1 | Misti Gulick              |        1 |          
       2 | Faye Leaton               |        2 |          
       3 | Arden Peck                |        3 |          
       4 | Carlton Kingsley          |        4 |          
       5 | Dolly Verhey              |        5 |          
       6 | Gaynell Pasquale          |        6 |          
       7 | Misti Gulick Thibodeaux   |        1 |         1
       8 | Faye Leaton Hemby         |        2 |         2
       9 | Arden Peck Mroz           |        3 |         3
      10 | Carlton Kingsley Mcelveen |        4 |         4
      11 | Dolly Verhey Irish        |        5 |         5
      12 | Gaynell Pasquale Ayala    |        6 |         6
      13 | Misti Thibodeaux          |        1 |         7
      14 | Faye Hemby                |        2 |         8
      15 | Arden Mroz                |        3 |         9
      16 | Carlton Mcelveen          |        4 |        10
      17 | Dolly Irish               |        5 |        11
      18 | Gaynell Ayala             |        6 |        12
      19 | Misti Trey Thibodeaux     |        1 |        13
      20 | Faye Barrett Hemby        |        2 |        14
      21 | Arden Justin Mroz         |        3 |        15
      22 | Carlton Tameka Mcelveen   |        4 |        16
      23 | Dolly Jeremiah Irish      |        5 |        17
      24 | Gaynell Cherry Ayala      |        6 |        18
(24 rows)

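Note: this awkward structure combines the canonical name/number (the head of a linked list) and the update chain (a backward-linked list) in one table.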
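As an illustration of what "backward-linked list" means here, this small Python sketch follows the parent_id pointers from any row back to the head of its chain, which is the same value the recursive UPDATE stores in canon_id. The function name `canonical_id` is made up; the ids are taken from the result table above (only Misti's chain and part of Faye's).

```python
def canonical_id(parent_of, name_id):
    """Walk parent links from name_id back to the head of its chain."""
    seen = set()
    while parent_of[name_id] is not None:
        if name_id in seen:
            # Guard for bad data; the table above contains no such loop.
            raise ValueError("cycle in parent links at id %d" % name_id)
        seen.add(name_id)
        name_id = parent_of[name_id]
    return name_id


# name_id -> parent_id, copied from the result table (None means "head of chain").
parent_of = {1: None, 7: 1, 13: 7, 19: 13, 2: None, 8: 2}

canon = {name_id: canonical_id(parent_of, name_id) for name_id in parent_of}
```

Walking every row separately like this repeats work for long chains; the recursive CTE goes the other way, from the heads outwards, and visits each row once.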

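The update steps could probably be combined into one statement, but I don't care. And, as Erwin commented, this process is very sensitive to typos, false hits, mismatches and missing records. Character set glitches in particular can be very painful.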

In most cases, some manual steps will be needed somewhere in the process.
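Some of the purely mechanical mismatches (stray whitespace, letter case, differently encoded accents) can be reduced by comparing a normalized key instead of the raw string. The sketch below is one possible approach and not part of the procedure above; `match_key` is a hypothetical helper, and it does nothing about real typos or missing records, which still need manual review.

```python
import unicodedata


def match_key(name):
    """Build a comparison key: composed Unicode form, single spaces, case-insensitive."""
    composed = unicodedata.normalize("NFC", name)  # "e" + combining accent -> one code point
    collapsed = " ".join(composed.split())         # trim and collapse runs of whitespace
    return collapsed.casefold()                    # aggressive lower-casing for comparison


# Made-up variants of names that should be treated as the same string.
same_spacing = match_key("Dolly  Verhey ") == match_key("dolly verhey")
same_accent = match_key("Jose\u0301 Mroz") == match_key("Jos\u00e9 Mroz")
```

The key would only be used for matching name_new against name_org; the names as originally written should stay in the table.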

And, to make things complete, a view that emulates the desired table:

CREATE VIEW triple_view AS
SELECT
        COALESCE(prev.current_name ,this.current_name) AS name_before
        , this.current_name AS name_after
        ,  abs.current_name AS identifier
FROM names_org this
JOIN names_org prev ON prev.name_id = this.parent_id
JOIN names_org abs ON abs.name_id = this.canon_id
        ;
SELECT * FROM triple_view;

Results from this view:

        name_before        |        name_after         |    identifier    
---------------------------+---------------------------+------------------
 Misti Gulick              | Misti Gulick Thibodeaux   | Misti Gulick
 Faye Leaton               | Faye Leaton Hemby         | Faye Leaton
 Arden Peck                | Arden Peck Mroz           | Arden Peck
 Carlton Kingsley          | Carlton Kingsley Mcelveen | Carlton Kingsley
 Dolly Verhey              | Dolly Verhey Irish        | Dolly Verhey
 Gaynell Pasquale          | Gaynell Pasquale Ayala    | Gaynell Pasquale
 Misti Gulick Thibodeaux   | Misti Thibodeaux          | Misti Gulick
 Faye Leaton Hemby         | Faye Hemby                | Faye Leaton
 Arden Peck Mroz           | Arden Mroz                | Arden Peck
 Carlton Kingsley Mcelveen | Carlton Mcelveen          | Carlton Kingsley
 Dolly Verhey Irish        | Dolly Irish               | Dolly Verhey
 Gaynell Pasquale Ayala    | Gaynell Ayala             | Gaynell Pasquale
 Misti Thibodeaux          | Misti Trey Thibodeaux     | Misti Gulick
 Faye Hemby                | Faye Barrett Hemby        | Faye Leaton
 Arden Mroz                | Arden Justin Mroz         | Arden Peck
 Carlton Mcelveen          | Carlton Tameka Mcelveen   | Carlton Kingsley
 Dolly Irish               | Dolly Jeremiah Irish      | Dolly Verhey
 Gaynell Ayala             | Gaynell Cherry Ayala      | Gaynell Pasquale
(18 rows)
answered 2013-09-27T18:31:06.277