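I'm having trouble importing a large dataset (around 5 million records) into HBase with sqoop. The mapreduce job starts but stalls at about 30%, then returns the error message shown below.
I looked around, found this link, and adjusted my command by adding -D mapred.task.timeout=0 and -m just to give it a try. The end result is the same, although the job now stalls at about 90% instead.
The sqoop import command looks like this. Am I missing any parameters, or is there something I need to add to the hbase-site or zoo.cfg configuration files?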
> ./sqoop import -D mapred.task.timeout=0 --connect 'jdbc:sqlserver://192.168.4.1:1433;database=dbname;user=sa;password=password' --table user --hbase-table newtable --column-family cf1 --hbase-row-key id --hbase-create-table --split-by id -m 14
13/10/24 15:06:29 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
13/10/24 15:06:29 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 3388@cloudera
13/10/24 15:06:29 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x141e977a64e0004, negotiated timeout = 40000
13/10/24 15:06:29 INFO zookeeper.ClientCnxn: EventThread shut down
13/10/24 15:06:29 INFO zookeeper.ZooKeeper: Session: 0x141e977a64e0004 closed
13/10/24 15:06:29 INFO mapreduce.HBaseImportJob: Creating missing HBase table ai
13/10/24 15:06:30 INFO zookeeper.ZooKeeper: Initiating client connection, connectString=localhost:2181 sessionTimeout=180000 watcher=catalogtracker-on-org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@11d1284a
13/10/24 15:06:30 INFO zookeeper.ClientCnxn: Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
13/10/24 15:06:30 INFO zookeeper.ClientCnxn: Socket connection established to localhost/127.0.0.1:2181, initiating session
13/10/24 15:06:30 INFO zookeeper.RecoverableZooKeeper: The identifier of this process is 3388@cloudera
13/10/24 15:06:30 INFO zookeeper.ClientCnxn: Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x141e977a64e0005, negotiated timeout = 40000
13/10/24 15:06:30 INFO zookeeper.ZooKeeper: Session: 0x141e977a64e0005 closed
13/10/24 15:06:30 INFO zookeeper.ClientCnxn: EventThread shut down
13/10/24 15:06:31 INFO db.DataDrivenDBInputFormat: BoundingValsQuery: SELECT MIN([AIIDX]), MAX([AIIDX]) FROM [ai_view]
13/10/24 15:06:32 INFO mapred.JobClient: Running job: job_201310241455_0001
13/10/24 15:06:33 INFO mapred.JobClient: map 0% reduce 0%
13/10/24 15:08:24 INFO mapred.JobClient: map 7% reduce 0%
13/10/24 15:08:50 INFO mapred.JobClient: map 14% reduce 0%
13/10/24 15:10:11 INFO mapred.JobClient: map 21% reduce 0%
13/10/24 15:10:51 INFO mapred.JobClient: map 28% reduce 0%
13/10/24 15:12:16 INFO mapred.JobClient: map 35% reduce 0%
13/10/24 15:12:57 INFO mapred.JobClient: map 42% reduce 0%
13/10/24 15:14:12 INFO mapred.JobClient: map 50% reduce 0%
13/10/24 15:14:55 INFO mapred.JobClient: map 57% reduce 0%
13/10/24 15:16:35 INFO mapred.JobClient: map 64% reduce 0%
13/10/24 15:17:28 INFO mapred.JobClient: map 71% reduce 0%
13/10/24 15:18:42 INFO mapred.JobClient: map 78% reduce 0%
13/10/24 15:19:24 INFO mapred.JobClient: map 85% reduce 0%
13/10/24 15:20:44 INFO mapred.JobClient: map 92% reduce 0%
13/10/24 16:28:28 INFO mapred.JobClient: Task Id : attempt_201310241455_0001_m_000013_0, Status : FAILED
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:708)
at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:511)
at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:481)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupConnection(HBaseClient.java:390)
at org.apache.hadoop.hbase.ipc.HBaseClient$Connection.setupIOstreams(HBaseClient.java:436)
at org.apache.hadoop.hbase.ipc.HBaseClient.getConnection(HBaseClient.java:1133)
at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:980)
at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Invoker.invoke(WritableRpcEngine.java:86)
at com.sun.proxy.$Proxy7.getClosestRowBefore(Unknown Source)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1137)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1000)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.relocateRegion(HConnectionManager.java:975)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1214)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:961)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:1678)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatch(HConnectionManager.java:1563)
at org.apache.hadoop.hbase.client.HTable.flushCommits(HTable.java:990)
at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:846)
at org.apache.hadoop.hbase.client.HTable.put(HTable.java:822)
at org.apache.sqoop.hbase.HBasePutProcessor.accept(HBasePutProcessor.java:150)
at org.apache.sqoop.mapreduce.DelegatingOutputFormat$DelegatingRecordWriter.write(DelegatingOutputFormat.java:128)
at org.apache.sqoop.mapreduce.DelegatingOutputFormat$DelegatingRecordWriter.write(DelegatingOutputFormat.java:92)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.write(MapTask.java:639)
at org.apache.hadoop.mapreduce.TaskInputOutputContext.write(TaskInputOutputContext.java:80)
at org.apache.sqoop.mapreduce.HBaseImportMapper.map(HBaseImportMapper.java:38)
at org.apache.sqoop.mapreduce.HBaseImportMapper.map(HBaseImportMapper.java:31)
at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
at org.apache.sqoop.mapreduce.AutoProgressMapper.run(AutoProgressMapper.java:64)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
13/10/24 16:33:12 INFO mapred.JobClient: Task Id : attempt_201310241455_0001_m_000013_1, Status : FAILED
java.lang.RuntimeException: Could not access HBase table ai
at org.apache.sqoop.hbase.HBasePutProcessor.setConf(HBasePutProcessor.java:122)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.sqoop.mapreduce.DelegatingOutputFormat$DelegatingRecordWriter.<init>(DelegatingOutputFormat.java:107)
at org.apache.sqoop.mapreduce.DelegatingOutputFormat.getRecordWriter(DelegatingOutputFormat.java:82)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:628)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:753)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for ai,,99999999999999 after 14 tries.
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1095)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1000)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1102)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:961)
at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:251)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:155)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:129)
at org.apache.sqoop.hbase.HBasePutProcessor.setConf(HBasePutProcessor.java:120)
... 12 more
13/10/24 16:37:58 INFO mapred.JobClient: Task Id : attempt_201310241455_0001_m_000013_2, Status : FAILED
java.lang.RuntimeException: Could not access HBase table ai
at org.apache.sqoop.hbase.HBasePutProcessor.setConf(HBasePutProcessor.java:122)
at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:62)
at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:117)
at org.apache.sqoop.mapreduce.DelegatingOutputFormat$DelegatingRecordWriter.<init>(DelegatingOutputFormat.java:107)
at org.apache.sqoop.mapreduce.DelegatingOutputFormat.getRecordWriter(DelegatingOutputFormat.java:82)
at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.<init>(MapTask.java:628)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:753)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:415)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Caused by: org.apache.hadoop.hbase.client.NoServerForRegionException: Unable to find region for ai,,99999999999999 after 14 tries.
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1095)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1000)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:1102)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:1004)
at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:961)
at org.apache.hadoop.hbase.client.HTable.finishSetup(HTable.java:251)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:155)
at org.apache.hadoop.hbase.client.HTable.<init>(HTable.java:129)
at org.apache.sqoop.hbase.HBasePutProcessor.setConf(HBasePutProcessor.java:120)
... 12 more
13/10/24 16:42:44 INFO mapred.JobClient: Job complete: job_201310241455_0001
13/10/24 16:42:44 INFO mapred.JobClient: Counters: 18
13/10/24 16:42:44 INFO mapred.JobClient: Job Counters
13/10/24 16:42:44 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=6610795
13/10/24 16:42:44 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
13/10/24 16:42:44 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
13/10/24 16:42:44 INFO mapred.JobClient: Launched map tasks=17
13/10/24 16:42:44 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=0
13/10/24 16:42:44 INFO mapred.JobClient: Failed map tasks=1
13/10/24 16:42:44 INFO mapred.JobClient: File Output Format Counters
13/10/24 16:42:44 INFO mapred.JobClient: Bytes Written=0
13/10/24 16:42:44 INFO mapred.JobClient: FileSystemCounters
13/10/24 16:42:44 INFO mapred.JobClient: HDFS_BYTES_READ=1498
13/10/24 16:42:44 INFO mapred.JobClient: FILE_BYTES_WRITTEN=1089897
13/10/24 16:42:44 INFO mapred.JobClient: File Input Format Counters
13/10/24 16:42:44 INFO mapred.JobClient: Bytes Read=0
13/10/24 16:42:44 INFO mapred.JobClient: Map-Reduce Framework
13/10/24 16:42:44 INFO mapred.JobClient: Map input records=4782546
13/10/24 16:42:44 INFO mapred.JobClient: Physical memory (bytes) snapshot=2150453248
13/10/24 16:42:44 INFO mapred.JobClient: Spilled Records=0
13/10/24 16:42:44 INFO mapred.JobClient: CPU time spent (ms)=313010
13/10/24 16:42:44 INFO mapred.JobClient: Total committed heap usage (bytes)=1125842944
13/10/24 16:42:44 INFO mapred.JobClient: Virtual memory (bytes) snapshot=13256167424
13/10/24 16:42:44 INFO mapred.JobClient: Map output records=4782546
13/10/24 16:42:44 INFO mapred.JobClient: SPLIT_RAW_BYTES=1498
13/10/24 16:42:44 INFO mapreduce.ImportJobBase: Transferred 0 bytes in 5,773.4138 seconds (0 bytes/sec)
13/10/24 16:42:44 INFO mapreduce.ImportJobBase: Retrieved 4782546 records.
13/10/24 16:42:44 ERROR tool.ImportTool: Error during import: Import job failed!
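To make the failures in the output above easier to scan, here is a rough sketch that prints each failed task attempt together with the first exception line that follows it. It assumes the console output was saved to a file. The function name `failed_attempts` is my own and is not part of Sqoop or Hadoop, and the field position relies on the JobClient line layout shown in the log above.

```shell
#!/bin/sh
# failed_attempts FILE
# Hypothetical helper (not part of Sqoop/Hadoop): for every JobClient line that
# reports "Status : FAILED", print the attempt id and the next log line,
# which in the output above is the top-level exception.
failed_attempts() {
    awk '
        /Status : FAILED/ {
            id = $8              # 8th whitespace-separated field: "attempt_..._N,"
            sub(/,$/, "", id)    # drop the trailing comma
            want = 1             # remember to print the following line
            next
        }
        want {
            print id ": " $0
            want = 0
        }
    ' "$1"
}
```

With the output saved as, say, `sqoop-import.log` (a made-up file name), running `failed_attempts sqoop-import.log` lists one line per failed attempt. That makes it easier to see that all three failures belong to the same map task, m_000013.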