My use case is the following: I have data coming from a CSV file and I need to load it into a table (so far so good, nothing new here). The same data may later be sent again with updated columns, in which case I would like to insert the rows and replace the existing ones when there is a duplicate.

So my table is as follows:

CREATE TABLE codes (
  code            TEXT NOT NULL,
  position_x      INT,
  position_y      INT,
  PRIMARY KEY (code)
);

And the incoming CSV file looks like this:

TEST01,1,1
TEST02,1,2
TEST03,1,3
TEST04,1,4

It might happen that sometime in the future I get another CSV file with:

TEST01,1,1000 <<<<< updated value
TEST05,1,5
TEST06,1,6
TEST07,1,7

Right now, loading the first file works fine, but when I load the second one I get an error:

2017-04-26T10:33:51.306000+01:00 ERROR Database error 23505: duplicate key value violates unique constraint "codes_pkey"
DETAIL: Key (code)=(TEST01) already exists.

I load data using:

pgloader csv.load

And my csv.load file looks like this:

LOAD CSV
     FROM 'codes.csv' (code, position_x, position_y)
     INTO postgresql://localhost:5432/codes?tablename=codes (code, position_x, position_y)

     WITH fields optionally enclosed by '"',
          fields terminated by ',';

Is what I'm trying to do possible with pgloader?

I also tried dropping the primary key constraint, but then I end up with duplicate entries in the table.

Thanks a lot for your help.

1 Answer

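No, you can't. According to the reference:

To work around that (load exceptions, e.g. PK violations), pgloader cuts the data into batches of 25000 rows each, so that when a problem occurs it only impacts that many rows of data.

The part in parentheses is mine...

The best you can do is load the CSV into a table with the same structure and then merge the data with the help of a query (EXCEPT, OUTER JOIN ... where null, etc.).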
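
A minimal sketch of that merge, assuming PostgreSQL and a staging table named codes_staging (a hypothetical name, not from the question). It also assumes the pgloader INTO clause has been pointed at ?tablename=codes_staging and that each code appears at most once per file:

```sql
-- Staging table with the same columns as codes but no primary key
-- (codes_staging is an assumed name). LIKE copies the column
-- definitions and NOT NULL, not the primary key constraint.
CREATE TABLE IF NOT EXISTS codes_staging (LIKE codes);

-- Empty it before each pgloader run, then load the CSV into it.
TRUNCATE codes_staging;

-- After pgloader has filled codes_staging, merge in one transaction.
BEGIN;

-- 1. Overwrite rows whose code already exists in codes.
UPDATE codes AS c
SET position_x = s.position_x,
    position_y = s.position_y
FROM codes_staging AS s
WHERE c.code = s.code;

-- 2. Insert rows whose code is not in codes yet
--    (the OUTER JOIN ... WHERE NULL pattern).
INSERT INTO codes (code, position_x, position_y)
SELECT s.code, s.position_x, s.position_y
FROM codes_staging AS s
LEFT OUTER JOIN codes AS c ON c.code = s.code
WHERE c.code IS NULL;

COMMIT;
```

On PostgreSQL 9.5 or later, the two statements can likely be collapsed into a single INSERT ... SELECT ... ON CONFLICT (code) DO UPDATE SET position_x = EXCLUDED.position_x, position_y = EXCLUDED.position_y. If one file can contain the same code twice, the staging rows would need deduplicating first (for example with SELECT DISTINCT ON (code)).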

answered 2017-04-26T11:41:28.897