hadoop - Sqoop export using update key

Question

I have to export an HDFS file into MySQL.
Suppose my HDFS file is:

1,abcd,23
2,efgh,24
3,ijkl,25
4,mnop,26
5,qrst,27

And say my MySQL table schema is:

+-----+-----+-------------+
| ID  | AGE |    NAME     |
+-----+-----+-------------+
|     |     |             |
+-----+-----+-------------+

When I insert using the following Sqoop command:

sqoop export \
--connect jdbc:mysql://localhost/DBNAME \
--username root \
--password root \
--export-dir /input/abc \
--table test \
--fields-terminated-by "," \
--columns "id,name,age"

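It works fine and inserts the rows into the database.

But when I need to update records that already exist, I have to use --update-key together with --columns.

Now, when I try to update the table using the following command: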

sqoop export \
--connect jdbc:mysql://localhost/DBNAME \
--username root \
--password root \
--export-dir /input/abc \
--table test \
--fields-terminated-by "," \
--columns "id,name,age" \
--update-key id

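The problem I am facing is that the data is not updated into the columns as specified in --columns.

Am I doing something wrong?

Can't we update the database this way? Does the HDFS file have to follow the MySQL schema's column order for an update to work?

Is there any other way to achieve this?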

score 9 · Accepted Answer

4b. Update data from HDFS into a table in a relational database

Create the emp table in the MySQL test database

create table emp
(
id int not null primary key,
name varchar(50)
);

vi emp --> create a file with the following content

1,Thiru
2,Vikram
3,Brij
4,Sugesh

Move the file to HDFS

hadoop fs -put emp <dir>

Run the following Sqoop job to export the data to MySQL

sqoop export --connect <jdbc connection> \
--username sqoop \
--password sqoop \
--table emp \
--export-dir <dir> \
--input-fields-terminated-by ',';

Verify the data in the MySQL table

mysql> select * from emp;

+----+--------+
| id | name   |
+----+--------+
|  1 | Thiru  |
|  2 | Vikram |
|  3 | Brij   |
|  4 | Sugesh |
+----+--------+

Update the emp file and move the updated file into HDFS. Contents of the updated file:

1,Thiru
2,Vikram
3,Sugesh
4,Brij
5,Sagar

Sqoop export for upsert: update if the key matches, else insert.

sqoop export --connect <jdbc connection> \
--username sqoop \
--password sqoop \
--table emp \
--update-mode allowinsert \
--update-key id \
--export-dir <dir> \
--input-fields-terminated-by ',';

Note: --update-mode <mode> accepts two values. "updateonly" (the default) updates the existing records whose update key matches.
If you want an upsert (if the row exists UPDATE, else INSERT), use "allowinsert" mode.
Example:
--update-mode updateonly \ --> for updates
--update-mode allowinsert \ --> for upsert
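
The two modes can be compared side by side with a small dry-run helper. This is only a sketch: the function name build_export_cmd, the JDBC URL and the export directory are placeholders made up for illustration, and the function prints the command instead of running it.

```shell
# Hypothetical helper: prints (does not run) a sqoop export command
# for the given update mode. JDBC URL and export dir are placeholders.
build_export_cmd() {
  mode="$1"
  case "$mode" in
    updateonly|allowinsert) ;;   # the two values described in the note
    *) echo "unknown update mode: $mode" >&2; return 1 ;;
  esac
  echo sqoop export \
    --connect "jdbc:mysql://localhost/test" \
    --username sqoop \
    --password sqoop \
    --table emp \
    --update-key id \
    --update-mode "$mode" \
    --export-dir /user/hypothetical/emp \
    --input-fields-terminated-by ","
}

build_export_cmd updateonly    # update matching rows only
build_export_cmd allowinsert   # update matching rows, insert the rest
```

Dropping the echo would run the export for real, assuming sqoop is on the PATH and the placeholders are replaced with actual values.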

Verify the result:

mysql> select * from emp;
+----+--------+
| id | name   |
+----+--------+
|  1 | Thiru  |
|  2 | Vikram |
|  3 | Sugesh |--> Previous value "Brij"
|  4 | Brij   |--> Previous value "Sugesh"
|  5 | Sagar  |--> new value inserted
+----+--------+

score 3 · Accepted Answer

Just try with --update-key primary_key

sqoop export --connect jdbc:mysql://localhost/DBNAME --username root --password root --export-dir /input/abc --table test --fields-terminated-by "," --update-key id

It worked for me. It updates all records whose primary key matches. (It may not insert new rows.)

Choose between --update-mode updateonly and --update-mode allowinsert carefully.
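
A note on column order, offered as a sketch rather than a confirmed fix: the command above drops --columns, and in that case the fields of each HDFS line are, to my understanding, matched to the table's columns in table order. The question's file is laid out as id,name,age while the table is ID, AGE, NAME, so one option is to rewrite the file into table order before exporting. The function name reorder_to_table_order is made up for illustration, and the awk one-liner assumes no field contains a comma.

```shell
# Hypothetical helper: reads "id,name,age" lines on stdin and
# writes them as "id,age,name" (the table's column order).
# Assumes plain comma-separated fields with no embedded commas.
reorder_to_table_order() {
  awk -F',' 'BEGIN { OFS = "," } { print $1, $3, $2 }'
}

# Sample lines taken from the question
printf '%s\n' '1,abcd,23' '2,efgh,24' '3,ijkl,25' | reorder_to_table_order
```

The reordered output would then be put back into HDFS (for example with hadoop fs -put) and used as the --export-dir input.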

score 1 · Accepted Answer

You might want to try --input-fields-terminated-by. Right now you are using --fields-terminated-by, which is meant for import.

score 0 · Accepted Answer

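I have actually tried this with Sqoop in several ways. --update-key can only update rows that already exist in the table and cannot insert new ones, unless you also set --update-mode to allowinsert (which not all databases support). If you do run an update with --update-key, it updates the rows whose key matches the column named in --update-key.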
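
To picture this behaviour, each input line in updateonly mode can be read as one UPDATE statement keyed on the --update-key column. The awk sketch below prints those statements for the emp example from the first answer. It only illustrates the semantics and is not the SQL Sqoop literally generates; the function name to_update_sql is made up for illustration.

```shell
# Hypothetical helper: turns "id,name" lines into the UPDATE statements
# they conceptually correspond to in updateonly mode.
# \047 is the octal escape for a single quote inside the awk string.
to_update_sql() {
  awk -F',' '{ printf "UPDATE emp SET name = \047%s\047 WHERE id = %s;\n", $2, $1 }'
}

# Rows from the updated emp file in the first answer
printf '%s\n' '3,Sugesh' '4,Brij' '5,Sagar' | to_update_sql
```

With the four-row table from the first answer, the statement for id 5 matches no row, so updateonly leaves the table without that record, whereas allowinsert would insert it.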
