I enabled permission management in my Hadoop cluster, but I'm having a problem submitting jobs with Pig. This is the scenario:

1 - I have a hadoop/hadoop user

2 - I have a myuserapp/myuserapp user that runs the Pig script.

3 - We set up the path /myapp to be owned by myuserapp

4 - We set pig.temp.dir to /myapp/pig/tmp

But when Pig tries to run the jobs, we get the following error:

job_201303221059_0009    all_actions,filtered,raw_data    DISTINCT    Message: Job failed! Error - Job initialization failed: org.apache.hadoop.security.AccessControlException: org.apache.hadoop.security.AccessControlException: Permission denied: user=realtime, access=EXECUTE, inode="system":hadoop:supergroup:rwx------

The Hadoop JobTracker requires this permission to start up its server.

My hadoop policy looks like:

<property>
<name>security.client.datanode.protocol.acl</name>
<value>hadoop,myuserapp supergroup,myuserapp</value>
</property>
<property>
<name>security.inter.tracker.protocol.acl</name>
<value>hadoop,myuserapp supergroup,myuserapp</value>
</property>
<property>
<name>security.job.submission.protocol.acl</name>
<value>hadoop,myuserapp supergroup,myuserapp</value>
</property>

My hdfs-site.xml:

<property>
<name>dfs.permissions</name>
<value>true</value>
</property>

<property>
 <name>dfs.datanode.data.dir.perm</name>
 <value>755</value>
</property>

<property>
 <name>dfs.web.ugi</name>
 <value>hadoop,supergroup</value>
</property>

My core-site.xml:

...
<property>
<name>hadoop.security.authorization</name>
<value>true</value>
</property>
...

And finally my mapred-site.xml:

...
<property>
 <name>mapred.local.dir</name>
 <value>/tmp/mapred</value>
</property>

<property>
 <name>mapreduce.jobtracker.jobhistory.location</name>
 <value>/opt/logs/hadoop/history</value>
</property>

Is there a missing configuration? How can I deal with multiple users running jobs in a restricted HDFS cluster?

2 Answers

Your problem is probably the staging directory. Try adding this property to mapred-site.xml:

<property>
    <name>mapreduce.jobtracker.staging.root.dir</name>
    <value>/user</value>
</property>

Then make sure the submitting user (e.g. 'realtime') has a home directory (e.g. '/user/realtime') and that they own it.

answered 2013-04-20T03:14:40.767

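The Fair Scheduler is designed to run MapReduce jobs as the submitting user. It creates separate pools for users/groups, but with shared resources. At first glance, this scheduler seems to have some issues related to permissions on certain directories, which prevent other users from executing/writing in the locations a job needs in order to run.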

So one solution is to use the Capacity Scheduler:

<property>
 <name>mapred.jobtracker.taskScheduler</name>
 <value>org.apache.hadoop.mapred.CapacityTaskScheduler</value>
</property>

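The Capacity Scheduler uses multiple named queues, each with a configurable number of map and reduce slots. One benefit of the Capacity Scheduler is the ability to limit the percentage of running tasks per user, so that users share the cluster under quotas.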
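
As a rough sketch of the queues and per-user limit described above, assuming the Hadoop 1.x property names and a hypothetical queue called myapp (neither the queue name nor the percentages come from the original setup), the configuration might look something like this:

```xml
<!-- mapred-site.xml: declare the queues (the queue name "myapp" is hypothetical) -->
<property>
 <name>mapred.queue.names</name>
 <value>default,myapp</value>
</property>

<!-- capacity-scheduler.xml: share of cluster slots per queue (illustrative values) -->
<property>
 <name>mapred.capacity-scheduler.queue.default.capacity</name>
 <value>70</value>
</property>
<property>
 <name>mapred.capacity-scheduler.queue.myapp.capacity</name>
 <value>30</value>
</property>

<!-- With 50, once two or more users compete for the queue,
     no single user can take more than about half of its slots -->
<property>
 <name>mapred.capacity-scheduler.queue.myapp.minimum-user-limit-percent</name>
 <value>50</value>
</property>
```

A job would then be directed to a queue by setting mapred.job.queue.name at submission time (for example to the hypothetical myapp queue); jobs that do not set it should go to the default queue.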

answered 2013-03-28T13:10:54.173