
So, suppose I have CSV files that I have no control over:

a.csv
b.csv
c.csv

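They all have different header names. I dump all of the data into tables a, b and c. Now, if I get another a.csv with new values (same header fields), how can I insert only the rows from the new a.csv that are not already among the old values?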

For example, table a has the headers name and age:

'Bob'   25
'Mary'  50

I get a new a.csv that parses as:

'Bob'   25
'Susie' 60

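How can I add only the rows that are not already in the current table (e.g. add Susie but not Bob)? I don't have a specific unique ID for each person, so I can't use a primary key. There are also many header fields, so if I try to use all of them as the primary key, MySQL returns "Specified key was too long".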
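The "Specified key was too long" error comes from MySQL's maximum index key length (the exact limit depends on version, storage engine and character set). One common workaround is to store a fixed-length hash of the whole row in its own column and put the unique constraint on that column alone. The sketch below assumes Python's built-in `sqlite3` module as a stand-in for MySQL (SQLite spells `INSERT IGNORE` as `INSERT OR IGNORE`); the `row_hash` column and the `row_digest` / `insert_new_rows` helpers are hypothetical names, not part of the question.

```python
import hashlib
import sqlite3


def row_digest(row):
    """Hash every field of a row into one fixed-length (64-char) hex string.

    The fields are joined with a unit separator (\\x1f) so that ("ab", "c")
    and ("a", "bc") produce different digests.
    """
    joined = "\x1f".join(str(value) for value in row)
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()


def insert_new_rows(conn, rows):
    """Insert only rows whose digest is not already present in table a."""
    conn.executemany(
        "INSERT OR IGNORE INTO a (name, age, row_hash) VALUES (?, ?, ?)",
        [(name, age, row_digest((name, age))) for name, age in rows],
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
# Hypothetical row_hash column: the UNIQUE constraint covers only this short
# digest instead of every (possibly long) data column.
conn.execute("CREATE TABLE a (name TEXT, age INTEGER, row_hash CHAR(64) UNIQUE)")

insert_new_rows(conn, [("Bob", 25), ("Mary", 50)])   # first a.csv
insert_new_rows(conn, [("Bob", 25), ("Susie", 60)])  # second a.csv

print(conn.execute("SELECT name, age FROM a ORDER BY name").fetchall())
# [('Bob', 25), ('Mary', 50), ('Susie', 60)]
```

The digest must be computed from the same textual representation every time (e.g. always from the raw CSV strings), otherwise identical rows can hash differently.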

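I need to check whether the entire row is unique and, if it is, add it to the table. I tried INSERT IGNORE, but without a unique key I couldn't get it to work. Any suggestions? I'm happy to post any other information if it helps.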

Current attempt:

    cursor.execute("ALTER TABLE temp ADD PRIMARY KEY" + uniqueline)
    cursor.execute("INSERT IGNORE INTO " + tablename + " SELECT * FROM temp")

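where tablename is the name of the table, temp is the table the CSV data is loaded into, and uniqueline is the first 5 fields of the current table (field1, field2, field3, field4, field5). If there are fewer than 5 fields, it is all of them.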
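`INSERT IGNORE` only skips rows that would violate a unique or primary key on the table being inserted *into*, so a key on the staging table does not stop duplicates from reaching the target table. A minimal sketch of putting the constraint on the target instead, assuming `sqlite3` as a stand-in for MySQL (`INSERT OR IGNORE` in place of `INSERT IGNORE`) and hypothetical table names `target` and `staging` (the question's `tablename` and `temp`):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The UNIQUE constraint lives on the *target* table; staging has no keys.
conn.execute("CREATE TABLE target (name TEXT, age INTEGER, UNIQUE (name, age))")
conn.execute("CREATE TABLE staging (name TEXT, age INTEGER)")

conn.executemany("INSERT INTO target VALUES (?, ?)", [("Bob", 25), ("Mary", 50)])
# staging holds the freshly parsed CSV, including rows already in target.
conn.executemany("INSERT INTO staging VALUES (?, ?)", [("Bob", 25), ("Susie", 60)])

# ('Bob', 25) collides with target's unique key and is skipped;
# ('Susie', 60) is inserted.
conn.execute("INSERT OR IGNORE INTO target SELECT * FROM staging")
conn.commit()

print(conn.execute("SELECT name, age FROM target ORDER BY name").fetchall())
# [('Bob', 25), ('Mary', 50), ('Susie', 60)]
```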

Thanks!

Edit:

    cursor.execute("INSERT INTO " + tablename + " SELECT * FROM temp where " + uniqueline + " NOT IN (SELECT * FROM " + tablename + ")")

It works once (with empty tables), but if I run it again to test it, it essentially freezes and never finishes. Now I have these "phantom tables": if I try to drop one, it says "unknown table", but if I try to create it, it says "table already exists". I also can't add or delete anything in the table without it freezing. I'm going to try giving it a unique index again. Thanks for all your help, though!
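The `NOT IN` query above has two weak spots: the column list on the left must match the columns the subquery returns (`SELECT *` only lines up if `uniqueline` names every column), and with no index on the compared columns each incoming row can trigger a scan of the whole target table, which on a large table looks like a hang. One common alternative is a correlated `NOT EXISTS` anti-join backed by an ordinary (non-unique) index. The sketch assumes `sqlite3` as a stand-in for MySQL; `build_anti_join_insert` and `staging` are hypothetical names. In MySQL, a column-prefix index can be used if the full columns exceed the key-length limit.

```python
import sqlite3


def build_anti_join_insert(target, source, columns):
    """Build an INSERT ... SELECT copying only rows of `source` with no exact
    match in `target` on every listed column.

    Table and column names are pasted into the SQL text, so they must come
    from a trusted list, never from the CSV contents. NULL never equals NULL,
    so rows containing NULLs are always treated as new.
    """
    col_list = ", ".join(columns)
    match = " AND ".join(f"t.{c} = s.{c}" for c in columns)
    return (
        f"INSERT INTO {target} ({col_list}) "
        f"SELECT DISTINCT {col_list} FROM {source} AS s "  # DISTINCT drops repeats within the new CSV
        f"WHERE NOT EXISTS (SELECT 1 FROM {target} AS t WHERE {match})"
    )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (name TEXT, age INTEGER)")
conn.execute("CREATE TABLE staging (name TEXT, age INTEGER)")
# Plain index so each NOT EXISTS probe is an index lookup, not a full scan.
conn.execute("CREATE INDEX idx_a_name_age ON a (name, age)")

conn.executemany("INSERT INTO a VALUES (?, ?)", [("Bob", 25), ("Mary", 50)])
conn.executemany("INSERT INTO staging VALUES (?, ?)", [("Bob", 25), ("Susie", 60)])

conn.execute(build_anti_join_insert("a", "staging", ["name", "age"]))
conn.commit()

print(conn.execute("SELECT name, age FROM a ORDER BY name").fetchall())
# [('Bob', 25), ('Mary', 50), ('Susie', 60)]
```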


2 Answers


You may want to update your table to have a unique index:

       ALTER IGNORE TABLE MyTable ADD UNIQUE INDEX idx_name (name, age);

Once that is in place, duplicate rows are rejected on insertion: INSERT IGNORE skips them silently, while a plain INSERT raises a duplicate-key error that you will need to handle.

As a workaround, you could drop the index before loading all the CSV files. Once the data is uploaded, re-apply the index (with ALTER IGNORE) to drop the duplicate records.
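The load-then-deduplicate workaround can be sketched as below. `ALTER IGNORE TABLE` was deprecated in MySQL 5.6.17 and removed in 5.7.4, and SQLite has no equivalent, so this sketch (assuming `sqlite3` as a stand-in) emulates it: delete every copy of a row except the one with the lowest `rowid`, then create the unique index. `rowid` is SQLite-specific; in MySQL an auto-increment id column would play that role.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (name TEXT, age INTEGER)")

# Step 1: bulk-load every CSV with no unique index, duplicates and all.
for csv_rows in ([("Bob", 25), ("Mary", 50)], [("Bob", 25), ("Susie", 60)]):
    conn.executemany("INSERT INTO a VALUES (?, ?)", csv_rows)

# Step 2: keep the first copy (lowest rowid) of each (name, age) pair and
# delete the rest, which is what ALTER IGNORE ... ADD UNIQUE used to do.
conn.execute(
    "DELETE FROM a WHERE rowid NOT IN "
    "(SELECT MIN(rowid) FROM a GROUP BY name, age)"
)

# Step 3: re-apply the unique index now that no duplicates remain.
conn.execute("CREATE UNIQUE INDEX idx_name ON a (name, age)")
conn.commit()

print(conn.execute("SELECT name, age FROM a ORDER BY name").fetchall())
# [('Bob', 25), ('Mary', 50), ('Susie', 60)]
```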

answered 2013-06-17T16:13:46.353

How about:

    insert into MyTable select * from temp where (tempcolumn1, tempcolumn2, ..., tempcolumnn) not in (select * from MyTable)
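A concrete version of the row-value `NOT IN` query, assuming `sqlite3` as a stand-in for MySQL (row values need SQLite 3.15 or later) and a hypothetical `staging` table in place of `temp`. Spelling out the same columns on both sides avoids relying on `SELECT *` happening to line up with the row constructor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (name TEXT, age INTEGER)")
conn.execute("CREATE TABLE staging (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO MyTable VALUES (?, ?)", [("Bob", 25), ("Mary", 50)])
conn.executemany("INSERT INTO staging VALUES (?, ?)", [("Bob", 25), ("Susie", 60)])

# The row constructor on the left and the subquery's select list name the
# same columns in the same order. If MyTable holds NULLs in these columns,
# NOT IN evaluates to NULL and no row is inserted.
conn.execute(
    "INSERT INTO MyTable (name, age) "
    "SELECT name, age FROM staging "
    "WHERE (name, age) NOT IN (SELECT name, age FROM MyTable)"
)
conn.commit()

print(conn.execute("SELECT name, age FROM MyTable ORDER BY name").fetchall())
# [('Bob', 25), ('Mary', 50), ('Susie', 60)]
```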
answered 2013-06-17T16:22:29.963