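In a CDH4 ecosystem, I am trying to get a MapReduce job to write its output to an HBase table. For some reason it fails during the addDependencyJars call made while the job is being configured.

As far as I can tell, the HBase configuration is not picking up the Hadoop configuration (see the warnings in the job output). I have included hdfs-site.xml, the job configuration, the job output with its stack trace, and the file permissions.

Any help or insight into how to debug this further would be greatly appreciated.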

hdfs-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <!-- replication configuration -->
    <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.permissions.superusergroup</name>
        <value>hadoop</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/var/hadoop/namenode</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/var/hadoop/datanode</value>
    </property>
</configuration>

Job configuration

Configuration conf = HBaseConfiguration.create(); 
Job job = new Job(conf);
job.setJarByClass(LocalCsvCdrHbaseJob.class); 
job.setJobName("Local CVS CDR Venue Session Analysis to hbase"); 
job.setMapOutputKeyClass(IntWritable.class); 
job.setMapOutputValueClass(VenueSession.class); 
job.setMapperClass(VenueMapper.class); 
job.setReducerClass(VenueSessionCountHbaseReducer.class); 
job.setInputFormatClass(TextInputFormat.class); 
job.setOutputFormatClass(TableOutputFormat.class); 
FileInputFormat.setInputPaths(job, new Path(args[0])); 
TableMapReduceUtil.initTableReducerJob("venue_session", VenueSessionCountHbaseReducer.class, job); 
TableMapReduceUtil.addDependencyJars(job); 
job.waitForCompletion(true);

The hbase classpath definitely includes the Hadoop conf directory (etc/hadoop/conf).

:~ # sudo -u mapred HADOOP_CLASSPATH=`hbase classpath` hadoop jar /home/mapred/cdr-hadoop-0.0.0-SNAPSHOT.jar net.thecloud.bi.cdr.jobs.LocalCsvCdrHbaseJob /cdr-venue-sessions/2013-05-22.cdr.csv
13/08/08 11:03:12 WARN conf.Configuration: dfs.df.interval is deprecated. Instead, use fs.df.interval
13/08/08 11:03:12 WARN conf.Configuration: dfs.max.objects is deprecated. Instead, use dfs.namenode.max.objects
13/08/08 11:03:12 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
13/08/08 11:03:12 WARN conf.Configuration: dfs.data.dir is deprecated. Instead, use dfs.datanode.data.dir
13/08/08 11:03:12 WARN conf.Configuration: dfs.name.dir is deprecated. Instead, use dfs.namenode.name.dir
13/08/08 11:03:12 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
13/08/08 11:03:12 WARN conf.Configuration: fs.checkpoint.dir is deprecated. Instead, use dfs.namenode.checkpoint.dir
13/08/08 11:03:12 WARN conf.Configuration: dfs.block.size is deprecated. Instead, use dfs.blocksize
13/08/08 11:03:12 WARN conf.Configuration: dfs.access.time.precision is deprecated. Instead, use dfs.namenode.accesstime.precision
13/08/08 11:03:12 WARN conf.Configuration: dfs.replication.min is deprecated. Instead, use dfs.namenode.replication.min
13/08/08 11:03:12 WARN conf.Configuration: dfs.name.edits.dir is deprecated. Instead, use dfs.namenode.edits.dir
13/08/08 11:03:12 WARN conf.Configuration: dfs.replication.considerLoad is deprecated. Instead, use dfs.namenode.replication.considerLoad
13/08/08 11:03:12 WARN conf.Configuration: dfs.balance.bandwidthPerSec is deprecated. Instead, use dfs.datanode.balance.bandwidthPerSec
13/08/08 11:03:12 WARN conf.Configuration: dfs.safemode.threshold.pct is deprecated. Instead, use dfs.namenode.safemode.threshold-pct
13/08/08 11:03:12 WARN conf.Configuration: dfs.http.address is deprecated. Instead, use dfs.namenode.http-address
13/08/08 11:03:12 WARN conf.Configuration: dfs.name.dir.restore is deprecated. Instead, use dfs.namenode.name.dir.restore
13/08/08 11:03:12 WARN conf.Configuration: dfs.https.client.keystore.resource is deprecated. Instead, use dfs.client.https.keystore.resource
13/08/08 11:03:12 WARN conf.Configuration: dfs.backup.address is deprecated. Instead, use dfs.namenode.backup.address
13/08/08 11:03:12 WARN conf.Configuration: dfs.backup.http.address is deprecated. Instead, use dfs.namenode.backup.http-address
13/08/08 11:03:12 WARN conf.Configuration: dfs.permissions is deprecated. Instead, use dfs.permissions.enabled
13/08/08 11:03:12 WARN conf.Configuration: dfs.safemode.extension is deprecated. Instead, use dfs.namenode.safemode.extension
13/08/08 11:03:12 WARN conf.Configuration: dfs.datanode.max.xcievers is deprecated. Instead, use dfs.datanode.max.transfer.threads
13/08/08 11:03:12 WARN conf.Configuration: dfs.https.need.client.auth is deprecated. Instead, use dfs.client.https.need-auth
13/08/08 11:03:12 WARN conf.Configuration: dfs.https.address is deprecated. Instead, use dfs.namenode.https-address
13/08/08 11:03:12 WARN conf.Configuration: dfs.replication.interval is deprecated. Instead, use dfs.namenode.replication.interval
13/08/08 11:03:12 WARN conf.Configuration: fs.checkpoint.edits.dir is deprecated. Instead, use dfs.namenode.checkpoint.edits.dir
13/08/08 11:03:12 WARN conf.Configuration: dfs.write.packet.size is deprecated. Instead, use dfs.client-write-packet-size
13/08/08 11:03:12 WARN conf.Configuration: dfs.permissions.supergroup is deprecated. Instead, use dfs.permissions.superusergroup
13/08/08 11:03:12 WARN conf.Configuration: topology.script.number.args is deprecated. Instead, use net.topology.script.number.args
13/08/08 11:03:12 WARN conf.Configuration: dfs.umaskmode is deprecated. Instead, use fs.permissions.umask-mode
13/08/08 11:03:12 WARN conf.Configuration: dfs.secondary.http.address is deprecated. Instead, use dfs.namenode.secondary.http-address
13/08/08 11:03:12 WARN conf.Configuration: fs.checkpoint.period is deprecated. Instead, use dfs.namenode.checkpoint.period
13/08/08 11:03:12 WARN conf.Configuration: topology.node.switch.mapping.impl is deprecated. Instead, use net.topology.node.switch.mapping.impl
13/08/08 11:03:12 WARN conf.Configuration: io.bytes.per.checksum is deprecated. Instead, use dfs.bytes-per-checksum
Exception in thread "main" java.io.IOException: java.lang.RuntimeException: java.io.IOException: Permission denied
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.findOrCreateJar(TableMapReduceUtil.java:598)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.addDependencyJars(TableMapReduceUtil.java:549)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.addDependencyJars(TableMapReduceUtil.java:513)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableReducerJob(TableMapReduceUtil.java:456)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableReducerJob(TableMapReduceUtil.java:393)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableReducerJob(TableMapReduceUtil.java:363)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.initTableReducerJob(TableMapReduceUtil.java:346)
    at net.thecloud.bi.cdr.jobs.LocalCsvCdrHbaseJob.main(LocalCsvCdrHbaseJob.java:46)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.RuntimeException: java.io.IOException: Permission denied
    at org.apache.hadoop.util.JarFinder.getJar(JarFinder.java:164)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.mapreduce.TableMapReduceUtil.findOrCreateJar(TableMapReduceUtil.java:595)
    ... 12 more
Caused by: java.io.IOException: Permission denied
    at java.io.UnixFileSystem.createFileExclusively(Native Method)
    at java.io.File.checkAndCreate(File.java:1704)
    at java.io.File.createTempFile(File.java:1792)
    at org.apache.hadoop.util.JarFinder.getJar(JarFinder.java:156)
    ... 17 more

File permissions

:~ # ls -l /var/hadoop/
total 12
drwxrwxrwx 2 hdfs   hdfs   4096 Aug  8 09:23 datanode
drwxrwxrwx 3 mapred hadoop 4096 Aug  8 09:41 mapred
drwxrwxrwx 3 hdfs   hdfs   4096 Aug  8 09:59 namenode

HDFS permissions

:~ # hdfs dfs -ls -R /
drwxrwxrwx   - hdfs  hadoop          0 2013-08-08 09:36 /cdr-venue-sessions
-rw-rw-rw-   3 hdfs  hadoop   27014304 2013-08-08 09:36 /cdr-venue-sessions/2013-05-22.cdr.csv
drwxrwxrwx   - hbase hadoop          0 2013-08-08 10:10 /hbase
drwxrwxrwx   - hbase hadoop          0 2013-08-08 10:07 /hbase/.logs
drwxrwxrwx   - hbase hadoop          0 2013-08-08 10:06 /hbase/.oldlogs
drwxrwxrwx   - hbase hadoop          0 2013-08-08 10:10 /hbase/.tmp
-rw-rw-rw-   3 hbase hadoop         38 2013-08-08 10:06 /hbase/hbase.id
-rw-rw-rw-   3 hbase hadoop          3 2013-08-08 10:06 /hbase/hbase.version
drwxrwxrwx   - hbase hadoop          0 2013-08-08 10:10 /hbase/venue_session
-rw-rw-rw-   3 hbase hadoop        711 2013-08-08 10:10 /hbase/venue_session/.tableinfo.0000000001
drwxrwxrwx   - hbase hadoop          0 2013-08-08 10:10 /hbase/venue_session/.tmp
drwxrwxrwx   - hbase hadoop          0 2013-08-08 10:10 /hbase/venue_session/5cd64eee2dea6b1464023f24eee3daf0
-rw-rw-rw-   3 hbase hadoop        246 2013-08-08 10:10 /hbase/venue_session/5cd64eee2dea6b1464023f24eee3daf0/.regioninfo
drwxrwxrwx   - hbase hadoop          0 2013-08-08 10:10 /hbase/venue_session/5cd64eee2dea6b1464023f24eee3daf0/values
drwxrwxrwt   - hdfs  hadoop          0 2013-08-08 09:41 /tmp
drwxrwxrwx   - mapred hadoop          0 2013-08-08 09:41 /tmp/hadoop-mapred
drwxrwxrwx   - mapred hadoop          0 2013-08-08 09:41 /tmp/hadoop-mapred/mapred
drwxrwxrwx   - mapred hadoop          0 2013-08-08 10:06 /tmp/hadoop-mapred/mapred/system
-rw-rw-rw-   3 mapred hadoop          4 2013-08-08 10:06 /tmp/hadoop-mapred/mapred/system/jobtracker.info
drwxrwxrwx   - hdfs   hadoop          0 2013-08-08 09:30 /user-venue-types
drwxrwxrwx   - hdfs   hadoop          0 2013-08-08 09:28 /var
drwxrwxrwx   - hdfs   hadoop          0 2013-08-08 09:28 /var/hadoop
drwxrwxrwx   - mapred hadoop          0 2013-08-08 09:28 /var/hadoop/mapred
drwxrwxrwx   - hdfs   hadoop          0 2013-08-08 09:27 /var/lib
drwxrwxrwx   - hdfs   hadoop          0 2013-08-08 09:27 /var/lib/hadoop-hdfs
drwxrwxrwx   - hdfs   hadoop          0 2013-08-08 09:27 /var/lib/hadoop-hdfs/cache
drwxrwxrwx   - mapred hadoop          0 2013-08-08 09:27 /var/lib/hadoop-hdfs/cache/mapred
drwxrwxrwx   - mapred hadoop          0 2013-08-08 09:27 /var/lib/hadoop-hdfs/cache/mapred/mapred
drwxrwxrwt   - mapred hadoop          0 2013-08-08 09:27 /var/lib/hadoop-hdfs/cache/mapred/mapred/staging
drwxrwxrwx   - hdfs   hadoop          0 2013-08-08 09:30 /venues
1 Answer

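Permissions in Hadoop are often not straightforward. A few debugging pointers:

  • Check which user you are running the job as, and which user the Hadoop cluster actually "sees".
  • Check what happens inside the failing method and which files it is trying to modify.
  • Make sure the required permissions are granted. If they are not, you can either disable HDFS permissions or "impersonate" a different user on the Hadoop cluster.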
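
As one way to act on the first two points: the stack trace in the question ends in File.createTempFile, called from JarFinder.getJar. That is a local-filesystem call, so the directory being denied is probably on the local disk rather than in HDFS. Below is a minimal sketch, using only the JDK, that prints the effective user and probes whether a temp file can be created in a few candidate directories. The class name TempDirProbe and the list of candidates are illustrative assumptions; which directory JarFinder actually writes to depends on the Hadoop version.

```java
import java.io.File;
import java.io.IOException;

// Hypothetical helper, not part of Hadoop or HBase.
final class TempDirProbe {

    /** Returns true if the current OS user can create a temp file in dir. */
    static boolean canCreateTempFile(File dir) {
        try {
            // Same JDK call that fails at the bottom of the stack trace.
            File probe = File.createTempFile("probe-", ".tmp", dir);
            probe.delete(); // clean up the probe file
            return true;
        } catch (IOException | SecurityException e) {
            // Missing directory or insufficient permissions.
            return false;
        }
    }

    public static void main(String[] args) {
        // The user the JVM runs as (e.g. after "sudo -u mapred").
        System.out.println("user.name = " + System.getProperty("user.name"));

        // Candidate directories (an assumption): the JVM temp dir
        // and the current working directory.
        String[] candidates = {
            System.getProperty("java.io.tmpdir"),
            System.getProperty("user.dir"),
        };
        for (String path : candidates) {
            File dir = new File(path).getAbsoluteFile();
            String result = canCreateTempFile(dir) ? "writable" : "NOT writable";
            System.out.println(dir + " -> " + result);
        }
    }
}
```

Running it as the same user and from the same working directory as the failing hadoop jar command keeps the result comparable with the job's environment.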

These questions may be useful to you:

Answered 2014-04-24T00:52:57.383