python - Embedding Pig in CPython

Question

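Does anyone know of a way to embed Pig in a CPython script, similar to what is available for RDBMS? I have searched, but with no luck.

I would rather not use Jython, because I am trying to process the data with various CPython libraries that are not available in Jython.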

score 1 · Accepted Answer

If by "similar to what is available for RDBMS" you mean an API, you could build out an object model using subprocess. I have used something like the following in the past.

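import subprocess

def execute(command):
    """Run a shell command, echo its output, and return its exit code."""
    print(command + "\n")
    # Merge stderr into stdout so Pig's log output is captured as well;
    # without a pipe for stderr, communicate() returns None for it.
    p = subprocess.Popen(command, stdout=subprocess.PIPE,
                         stderr=subprocess.STDOUT, shell=True,
                         universal_newlines=True)
    stdout, _ = p.communicate()
    print(stdout)
    return p.returncode

# input_dir and output_dir must be set beforehand to your data locations
# (renamed from input/output to avoid shadowing the input() builtin).
command = "pig.9 -p input=" + input_dir + "/* -p output=" + output_dir + " -f my.pig"
execute(command)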
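
A possible variation on the same idea, as a minimal sketch rather than a tested recipe: passing the arguments as a list avoids shell=True and the quoting problems that come with building one long string. The helper names build_pig_command and run_pig are hypothetical, the executable name "pig" is an assumption (substitute whatever launcher you use, such as pig.9 above), and subprocess.run needs Python 3.5 or later.

```python
import subprocess


def build_pig_command(script, params, pig_cmd="pig"):
    """Return the argv list for running `script` with -p key=value parameters."""
    argv = [pig_cmd]
    # Sort the parameters so the generated command line is deterministic
    for key, value in sorted(params.items()):
        argv += ["-p", "{}={}".format(key, value)]
    argv += ["-f", script]
    return argv


def run_pig(script, params, pig_cmd="pig"):
    """Run a Pig script and return (exit code, combined stdout/stderr text)."""
    proc = subprocess.run(
        build_pig_command(script, params, pig_cmd),
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # Pig logs mostly to stderr, so merge it in
        universal_newlines=True,   # decode the output to str
    )
    return proc.returncode, proc.stdout
```

Calling run_pig("my.pig", {"input": "/data/in/*", "output": "/data/out"}) (the paths are placeholders) would hand the glob to Pig unchanged, since no shell is involved to expand it.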

score 1 · Accepted Answer

Support for CPython was recently added in Pig 0.12: http://blog.mortardata.com/post/62334142398/hadoop-python-pig-trunk
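
To make that concrete, here is a rough sketch of what a CPython UDF might look like, based on my reading of the Pig documentation for the streaming_python mode. Note that this lets Pig call CPython code rather than embedding Pig in a CPython script. The function text_length, the file name my_udfs.py, and the relation and field names in the comments are made up for illustration; the try/except fallback is my own addition so the module can be imported outside Pig.

```python
try:
    # pig_util is supplied by Pig when the script runs under streaming_python
    from pig_util import outputSchema
except ImportError:
    # Fallback no-op decorator so the module can be imported and tested outside Pig
    def outputSchema(schema):
        def decorator(func):
            return func
        return decorator


@outputSchema("length:int")
def text_length(text):
    """Return the number of characters in text, or None for a null input."""
    if text is None:
        return None
    return len(text)


# Assumed usage from Pig Latin (file, relation, and field names are hypothetical):
#   REGISTER 'my_udfs.py' USING streaming_python AS my_udfs;
#   lengths = FOREACH records GENERATE my_udfs.text_length(line);
```

Because the function body runs in a regular CPython interpreter, it should be able to import CPython-only libraries, which is the part Jython cannot offer.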

score 1 · Accepted Answer

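Jython seems to be the most popular option, for example here, here, and here, but you may find this thread helpful, even though it also focuses on Jython. The effort around Python UDFs is clearly centered on Jython, so unless you absolutely need CPython libraries, you might consider biting the bullet and sticking with it. Another thing to consider is that Jython 2.7 (source) has matured, although this may not be practical for your needs.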
