
I am trying to use subprocess in a Python script that I call from an Oozie shell action. The subprocess should read a file stored in Hadoop's HDFS.

I am using hadoop-1.2.1 and oozie-3.3.2 in pseudo-distributed mode.

Here is the Python script, named connected_subprocess.py:

#!/usr/bin/python

import subprocess
import networkx as nx

liste=subprocess.check_output("hadoop fs -cat /user/root/output-data/calcul-proba/final.txt",shell=True).split('\n')
G=nx.DiGraph()
f=open("/home/rlk/liste_strongly_connected.txt","wb")
for item in liste:
    try:
        app1,app2=item.split('\t')
        G.add_edge(app1,app2)
    except:
        pass
liste_connected=nx.strongly_connected_components(G)
for item in liste_connected:
    if len(item)>1:
        f.write('{}\n'.format('\t'.join(item)))
f.close()

The corresponding shell action in Oozie's workflow.xml is:

 <action name="final">
        <shell xmlns="uri:oozie:shell-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <configuration>
                <property>
                    <name>mapred.job.queue.name</name>
                    <value>${queueName}</value>
                </property>
            </configuration>
            <exec>connected_subprocess.py</exec>
            <file>connected_subprocess.py</file>
         </shell>
         <ok to="end" />
         <error to="kill" />
    </action>

When I run the Oozie job, the tasktracker logs show these errors:

Error: Could not find or load main class org.apache.hadoop.fs.FsShell
Traceback (most recent call last):
  File "./connected_subprocess.py", line 6, in <module>
    liste=subprocess.check_output("hadoop fs -cat /user/root/output-data/calcul-proba/final.txt",shell=True).split('\n')
  File "/usr/lib64/python2.7/subprocess.py", line 575, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command 'hadoop fs -cat /user/root/output-data/calcul-proba/final.txt' returned non-zero exit status 1
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.ShellMain], exit code [1]

It seems that I cannot run a shell command from within my Python script when the script is embedded in an Oozie action, because everything works fine when I run the same script from an interactive shell.

Is there any way to get around this limitation?


1 Answer


I wonder if your script just doesn't have access to your PATH environment variable when it is executed through Oozie and is therefore having trouble locating the "hadoop" command. Can you try modifying your Python script's subprocess.check_output call to use the full path to the hadoop executable?
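For example, something along these lines; the install path below is only a guess, so substitute whatever "which hadoop" prints on the task nodes (or $HADOOP_HOME/bin/hadoop):

#!/usr/bin/python

import subprocess

# Assumed location of the hadoop binary -- replace with the actual path
# on the nodes where the Oozie launcher runs.
HADOOP_BIN = "/usr/local/hadoop-1.2.1/bin/hadoop"

# Same call as before, but with an absolute path so it no longer depends on PATH.
liste = subprocess.check_output(
    "{0} fs -cat /user/root/output-data/calcul-proba/final.txt".format(HADOOP_BIN),
    shell=True).split('\n')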

answered 2013-09-03T19:28:28.180