
I am trying to export some data from HDFS to MySQL using Sqoop. The problem is that when I export an uncompressed file it works correctly, but if I try to export the same file compressed with LZO, the Sqoop job fails. I am trying this in a standard Cloudera CDH4 VM environment. The columns in the file are tab-separated, and nulls are represented as '\N'.

File contents:

[cloudera@localhost ~]$ cat dipayan-test.txt
dipayan koramangala 29
raju    marathahalli    32
raju    marathahalli    32
raju    \N  32
raju    marathahalli    32
raju    \N  32
raju    marathahalli    32
raju    marathahalli    \N
raju    marathahalli    \N

MySQL table description:

mysql> describe sqooptest;
+---------+--------------+------+-----+---------+-------+
| Field   | Type         | Null | Key | Default | Extra |
+---------+--------------+------+-----+---------+-------+
| name    | varchar(100) | YES  |     | NULL    |       |
| address | varchar(100) | YES  |     | NULL    |       |
| age     | int(11)      | YES  |     | NULL    |       |
+---------+--------------+------+-----+---------+-------+
3 rows in set (0.01 sec)

File in HDFS:

[cloudera@localhost ~]$ hadoop fs -ls /user/cloudera/dipayan-test
Found 1 items
-rw-r--r--   3 cloudera cloudera        138 2014-02-16 23:18 /user/cloudera/dipayan-test/dipayan-test.txt.lzo

Sqoop command:

sqoop export --connect "jdbc:mysql://localhost/bigdata" --username "root" --password "XXXXXX" --driver "com.mysql.jdbc.Driver" --table sqooptest --export-dir /user/cloudera/dipayan-test/ --input-fields-terminated-by '\t' -m 1 --input-null-string '\\N' --input-null-non-string '\\N'

Error:

[cloudera@localhost ~]$ sqoop export --connect "jdbc:mysql://localhost/bigdata" --username "root" --password "mysql" --driver "com.mysql.jdbc.Driver" --table sqooptest --export-dir /user/cloudera/dipayan-test/ --input-fields-terminated-by '\t' -m 1 --input-null-string '\\N' --input-null-non-string '\\N'
14/02/16 23:19:26 WARN tool.BaseSqoopTool: Setting your password on the command-line is insecure. Consider using -P instead.
14/02/16 23:19:26 WARN sqoop.ConnFactory: Parameter --driver is set to an explicit driver however appropriate connection manager is not being set (via --connection-manager). Sqoop is going to fall back to org.apache.sqoop.manager.GenericJdbcManager. Please specify explicitly which connection manager should be used next time.
14/02/16 23:19:26 INFO manager.SqlManager: Using default fetchSize of 1000
14/02/16 23:19:26 INFO tool.CodeGenTool: Beginning code generation
14/02/16 23:19:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM sqooptest AS t WHERE 1=0
14/02/16 23:19:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM sqooptest AS t WHERE 1=0
14/02/16 23:19:27 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-0.20-mapreduce
14/02/16 23:19:27 INFO orm.CompilationManager: Found hadoop core jar at: /usr/lib/hadoop-0.20-mapreduce/hadoop-core.jar
Note: /tmp/sqoop-cloudera/compile/676bc185f1efffa3b0de0a924df4a02d/sqooptest.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
14/02/16 23:19:29 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-cloudera/compile/676bc185f1efffa3b0de0a924df4a02d/sqooptest.jar
14/02/16 23:19:29 INFO mapreduce.ExportJobBase: Beginning export of sqooptest
14/02/16 23:19:30 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM sqooptest AS t WHERE 1=0
14/02/16 23:19:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
14/02/16 23:19:31 INFO input.FileInputFormat: Total input paths to process : 1
14/02/16 23:19:31 INFO input.FileInputFormat: Total input paths to process : 1
14/02/16 23:19:31 INFO mapred.JobClient: Running job: job_201402162201_0013
14/02/16 23:19:32 INFO mapred.JobClient:  map 0% reduce 0%
14/02/16 23:19:41 INFO mapred.JobClient: Task Id : attempt_201402162201_0013_m_000000_0, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.util.NoSuchElementException
    at java.util.AbstractList$Itr.next(AbstractList.java:350)
    at sqooptest.__loadFromFields(sqooptest.java:225)
    at sqooptest.parse(sqooptest.java:174)
    at org.apach
14/02/16 23:19:48 INFO mapred.JobClient: Task Id : attempt_201402162201_0013_m_000000_1, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.util.NoSuchElementException
    at java.util.AbstractList$Itr.next(AbstractList.java:350)
    at sqooptest.__loadFromFields(sqooptest.java:225)
    at sqooptest.parse(sqooptest.java:174)
    at org.apach
14/02/16 23:19:55 INFO mapred.JobClient: Task Id : attempt_201402162201_0013_m_000000_2, Status : FAILED
java.io.IOException: Can't export data, please check task tracker logs
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:112)
    at org.apache.sqoop.mapreduce.TextExportMapper.map(TextExportMapper.java:39)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:140)
    at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:672)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:330)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:396)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1408)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
Caused by: java.util.NoSuchElementException
    at java.util.AbstractList$Itr.next(AbstractList.java:350)
    at sqooptest.__loadFromFields(sqooptest.java:225)
    at sqooptest.parse(sqooptest.java:174)
    at org.apach
14/02/16 23:20:04 INFO mapred.JobClient: Job complete: job_201402162201_0013
14/02/16 23:20:04 INFO mapred.JobClient: Counters: 7
14/02/16 23:20:04 INFO mapred.JobClient:   Job Counters 
14/02/16 23:20:04 INFO mapred.JobClient:     Failed map tasks=1
14/02/16 23:20:04 INFO mapred.JobClient:     Launched map tasks=4
14/02/16 23:20:04 INFO mapred.JobClient:     Data-local map tasks=4
14/02/16 23:20:04 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=29679
14/02/16 23:20:04 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
14/02/16 23:20:04 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
14/02/16 23:20:04 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
14/02/16 23:20:04 WARN mapreduce.Counters: Group FileSystemCounters is deprecated. Use org.apache.hadoop.mapreduce.FileSystemCounter instead
14/02/16 23:20:04 INFO mapreduce.ExportJobBase: Transferred 0 bytes in 33.5335 seconds (0 bytes/sec)
14/02/16 23:20:04 WARN mapreduce.Counters: Group org.apache.hadoop.mapred.Task$Counter is deprecated. Use org.apache.hadoop.mapreduce.TaskCounter instead
14/02/16 23:20:04 INFO mapreduce.ExportJobBase: Exported 0 records.
14/02/16 23:20:04 ERROR tool.ExportTool: Error during export: Export job failed!

This works perfectly if the file is uncompressed and I point Sqoop directly at the dipayan-test.txt file.

I need help resolving this, and I would also like to know whether I am missing something when working with LZO files.
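To see why a compressed file trips up a text-oriented export, consider what the mapper receives if no codec decompresses the input: raw compressed bytes rather than "name<TAB>address<TAB>age" lines. A minimal Python sketch of that effect, using gzip as a stand-in for LZO (Python's standard library has no LZO codec):

```python
import gzip

# A tab-separated record like the ones in dipayan-test.txt.
record = "raju\tmarathahalli\t32\n"

# Stand-in for LZO compression; the point is only that compressed
# bytes are opaque to a line-oriented, delimiter-based parser.
compressed = gzip.compress(record.encode("utf-8"))

# Without decompression, splitting the raw bytes on tabs does not
# recover the original three columns.
raw = compressed.decode("utf-8", errors="replace")
print(raw.split("\t"))  # not the original columns

# Once the codec is applied, the columns come back cleanly.
fields = gzip.decompress(compressed).decode("utf-8").strip().split("\t")
print(fields)  # ['raju', 'marathahalli', '32']
```

This is only an analogy for the decompression step, not Sqoop's actual code path.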


2 Answers


An export failure can have a number of causes:

* Loss of connectivity from the Hadoop cluster to the database (either due to hardware fault, or server software crashes)
* Attempting to INSERT a row which violates a consistency constraint (for example, inserting a duplicate primary key value)
* Attempting to parse an incomplete or malformed record from the HDFS source data
* Attempting to parse records using incorrect delimiters
* Capacity issues (such as insufficient RAM or disk space) 

Taken from here.
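The third and fourth causes above are exactly what the stack trace points to: the generated __loadFromFields walks the list of parsed fields and asks for one value per table column, and when a record yields too few fields the iterator throws NoSuchElementException. A rough Python analogy (hypothetical names; Sqoop's generated code is Java):

```python
def load_from_fields(fields, num_columns=3):
    """Mimic the generated parser: consume exactly one field per column."""
    it = iter(fields)
    row = []
    for _ in range(num_columns):
        row.append(next(it))  # raises StopIteration if fields run out
    return row

# A record split on the correct delimiter parses fine:
print(load_from_fields("raju\tmarathahalli\t32".split("\t")))
# → ['raju', 'marathahalli', '32']

# A record split on the wrong delimiter yields one big field,
# and the second next() call fails (NoSuchElementException in Java):
try:
    load_from_fields("raju\tmarathahalli\t32".split(","))
except StopIteration:
    print("parse failed: too few fields")
```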

In my case, I got the same NoSuchElementException, and setting the correct field terminator with --fields-terminated-by '\t' solved the problem.

When none is specified, Sqoop assumes MySQL's default terminators: ',' as the field terminator and '\n' as the line terminator.
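Combined with the '\N' null marker from the question, the full per-record parse can be sketched as: split on the tab terminator, then map the marker (the value passed to --input-null-string / --input-null-non-string) to SQL NULL. An illustrative Python sketch, not Sqoop's actual implementation:

```python
NULL_MARKER = "\\N"  # the literal two characters backslash-N

def parse_record(line, field_sep="\t"):
    """Split one line on the field terminator and map '\\N' to NULL."""
    return [None if f == NULL_MARKER else f
            for f in line.rstrip("\n").split(field_sep)]

print(parse_record("raju\t\\N\t32\n"))
# → ['raju', None, '32']
```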

Answered 2014-10-04T09:16:58.133

Your table may not have the correct columns. You can always open the .java file that Sqoop generated for you and debug from there: sqooptest.java:225

Answered 2014-02-19T20:08:41.207