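I am new to the Cloudera environment and am trying to use Sqoop to import data from an RDBMS. I need to apply some transformations to the data during the import; specifically, I need to encrypt some fields before they are stored on HDFS. To do this I am trying the codegen command, which generates an ORM Java class that I can modify.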

Suppose I have a table "products" in a MySQL database, and I want to import it into HDFS with Sqoop while encrypting the "brand" field. First I ran this command:

sqoop codegen \
--connect jdbc:mysql://localhost/test \
--username username --password password \
--table products

This generates the files products.java, products.jar and products.class in the folder /tmp/sqoop-training/compile/fc8868dda33ef703ad126583cf77477f.

I then modified the readFields method in products.java as follows:

// WARNING: This class is AUTO-GENERATED. Modify at your own risk.
//
// Debug information:
// Generated date: Thu Nov 16 06:55:13 PST 2017
// For connector: org.apache.sqoop.manager.MySQLManager
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.lib.db.DBWritable;
import com.cloudera.sqoop.lib.JdbcWritableBridge;
import com.cloudera.sqoop.lib.DelimiterSet;
import com.cloudera.sqoop.lib.FieldFormatter;
import com.cloudera.sqoop.lib.RecordParser;
import com.cloudera.sqoop.lib.BooleanParser;
import com.cloudera.sqoop.lib.BlobRef;
import com.cloudera.sqoop.lib.ClobRef;
import com.cloudera.sqoop.lib.LargeObjectLoader;
import com.cloudera.sqoop.lib.SqoopRecord;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.sql.Date;
import java.sql.Time;
import java.sql.Timestamp;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class products extends SqoopRecord  implements DBWritable, Writable {

    // [...]

    public void readFields(ResultSet __dbResults) throws SQLException {
        this.__cur_result_set = __dbResults;
        this.prod_id = JdbcWritableBridge.readInteger(1, __dbResults);
        this.brand = encrypt(JdbcWritableBridge.readString(2, __dbResults));
        this.name = JdbcWritableBridge.readString(3, __dbResults);
        this.price = JdbcWritableBridge.readInteger(4, __dbResults);
        this.cost = JdbcWritableBridge.readInteger(5, __dbResults);
        this.shipping_wt = JdbcWritableBridge.readInteger(6, __dbResults);
    }

    // [...]

}
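
The excerpt above calls encrypt(...) without showing it. Purely as an illustration, here is a minimal sketch of what such a helper could look like. Everything in it is an assumption rather than part of Sqoop or of the generated class: the class name FieldEncryptor, the choice of AES-GCM, and passing the key in as raw bytes. The result is Base64 text, so it still fits the String field brand and contains no commas or newlines that would clash with Sqoop's default text delimiters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper for illustration only; not generated by Sqoop.
final class FieldEncryptor {
    private static final int IV_BYTES = 12;   // recommended IV size for GCM
    private static final int TAG_BITS = 128;  // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    private final SecretKeySpec key;

    // rawKey must be 16, 24 or 32 bytes long (AES-128/192/256).
    FieldEncryptor(byte[] rawKey) {
        this.key = new SecretKeySpec(rawKey, "AES");
    }

    // Returns Base64(iv || ciphertext || tag); a null input (SQL NULL) stays null.
    String encrypt(String plain) {
        if (plain == null) {
            return null;
        }
        try {
            byte[] iv = new byte[IV_BYTES];
            RANDOM.nextBytes(iv);  // fresh random IV for every value
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(plain.getBytes(StandardCharsets.UTF_8));
            byte[] out = ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array();
            return Base64.getEncoder().encodeToString(out);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("encryption failed", e);
        }
    }

    // Inverse of encrypt, shown so the round trip can be checked.
    String decrypt(String encoded) {
        if (encoded == null) {
            return null;
        }
        try {
            byte[] in = Base64.getDecoder().decode(encoded);
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, key,
                    new GCMParameterSpec(TAG_BITS, in, 0, IV_BYTES));
            byte[] plain = cipher.doFinal(in, IV_BYTES, in.length - IV_BYTES);
            return new String(plain, StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("decryption failed", e);
        }
    }
}
```

With something like this, readFields could call encrypt on a shared FieldEncryptor instance. The helper class would have to be packaged into the same jar as products.class, and how the key reaches the map tasks is left open here; java.util.Base64 also assumes Java 8 or later.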

I have two questions:
1) How do I recompile products.java to get updated versions of products.class and products.jar? I tried

javac products.java

but javac reports 82 errors; it does not seem to find the packages in the hadoop and cloudera namespaces:

error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.BytesWritable;
                           ^
products.java:8: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.Text;
                           ^
products.java:9: error: package org.apache.hadoop.io does not exist
import org.apache.hadoop.io.Writable;
                           ^
products.java:10: error: package org.apache.hadoop.mapred.lib.db does not exist
import org.apache.hadoop.mapred.lib.db.DBWritable;
                                      ^
products.java:11: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.JdbcWritableBridge;
                             ^
products.java:12: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.DelimiterSet;
                             ^
products.java:13: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.FieldFormatter;
                             ^
products.java:14: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.RecordParser;
                             ^
products.java:15: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.BooleanParser;
                             ^
products.java:16: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.BlobRef;
                             ^
products.java:17: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.ClobRef;
                             ^
products.java:18: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.LargeObjectLoader;
                             ^
products.java:19: error: package com.cloudera.sqoop.lib does not exist
import com.cloudera.sqoop.lib.SqoopRecord;


2) Once products.java compiles, how do I get Sqoop to use my custom ORM class when importing the data into HDFS?



Thanks in advance!

1 Answer

Regarding the first question:

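Add the Hadoop and Sqoop jars to the classpath:

export CLASSPATH="$(hadoop classpath):/opt/cloudera/parcels/CDH/lib/sqoop/*:/opt/cloudera/parcels/CDH/lib/sqoop/lib/*"

and then try again. The /* wildcards matter: a bare directory on the classpath only contributes loose .class files, not the jars inside it. The com.cloudera.sqoop.lib classes live in the Sqoop jar itself, which in a parcel install normally sits directly under /opt/cloudera/parcels/CDH/lib/sqoop; adjust the paths if your layout differs.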

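P.S. A small architectural comment on "specifically, I need to encrypt some fields before they are stored on HDFS": why not use HDFS transparent encryption? https://www.cloudera.com/documentation/enterprise/latest/topics/cdh_sg_hdfs_encryption.html It achieves the same goal without any coding.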

answered 2017-11-16T21:42:20.787