python - Parameter substitution when executing a Hive script from Python

Question

I have to run the following query against Hive from Python:

SELECT * FROM user WHERE age > ${hiveconf:AGE}

So far I have the following working code snippet:

import pyhs2
with pyhs2.connect(host='localhost',
                   port=60850,
                   authMechanism="PLAIN",
                   user='hduser',
                   database='default') as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM user WHERE age > ?", 10)

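So I can pass parameters to the query with PyHs2. But how can I perform variable substitution from Python without changing the original query, i.e. cleanly replace ${hiveconf:AGE} with some value?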
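One way to keep the query text exactly as written is to replace the ${hiveconf:NAME} placeholders in Python before sending the string to Hive. The sketch below is an illustration, not a Hive or pyhs2 API: `substitute_hiveconf` is a hypothetical helper, and the placeholder-name pattern (letters, digits, underscores and dots) is an assumption. Values are pasted in verbatim, not escaped, so this is only suitable for trusted input.

```python
import re

# Assumed shape of a Hive placeholder: ${hiveconf:NAME}, where NAME may
# contain letters, digits, underscores and dots (e.g. AGE or a.b.c)
HIVECONF_PATTERN = re.compile(r"\$\{hiveconf:([A-Za-z_][A-Za-z0-9_.]*)\}")


def substitute_hiveconf(query, values):
    """Hypothetical helper: replace each ${hiveconf:NAME} with str(values[NAME]).

    A placeholder without a value raises KeyError, so a missing variable
    is not silently sent to Hive as literal text.
    """
    def replace(match):
        return str(values[match.group(1)])

    return HIVECONF_PATTERN.sub(replace, query)


query = "SELECT * FROM user WHERE age > ${hiveconf:AGE}"
print(substitute_hiveconf(query, {"AGE": 10}))
# SELECT * FROM user WHERE age > 10
```

The resulting string can then be passed to `cur.execute(...)` as in the snippet above.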

score 2 · Accepted Answer

Something like this?

def get_sql(substitution="${hiveconf:AGE}"):
    sql = "select * from bla where blub > {variable}"
    sql = sql.format(variable=substitution)
    return sql

Result:

>>> get_sql()
'select * from bla where blub > ${hiveconf:AGE}'

>>> get_sql("test")
'select * from bla where blub > test'

For more details on the format syntax, see: https://docs.python.org/2/library/string.html#format-string-syntax
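Note that `str.format` requires rewriting the query with `{variable}`. If the goal is to leave `${hiveconf:AGE}` untouched, the standard library's `string.Template` already understands `${...}` placeholders; it only needs its identifier pattern widened to allow the `hiveconf:` prefix. A minimal sketch, where `HiveconfTemplate` is a hypothetical subclass name and the identifier pattern is an assumption:

```python
import string


class HiveconfTemplate(string.Template):
    # The default idpattern does not allow ':', so widen it to accept
    # names such as "hiveconf:AGE" (Template matches case-insensitively
    # by default, so upper-case names like AGE are accepted).
    idpattern = r"hiveconf:[_a-z][_a-z0-9]*"


query = "SELECT * FROM user WHERE age > ${hiveconf:AGE}"

# substitute() raises KeyError for a missing value;
# safe_substitute() leaves unknown placeholders in place instead.
print(HiveconfTemplate(query).substitute({"hiveconf:AGE": 10}))
print(HiveconfTemplate(query).safe_substitute({}))
```

The mapping keys include the `hiveconf:` prefix. One caveat of this design: any other `$` in the query that is not followed by a valid placeholder makes `substitute()` raise `ValueError`, while `safe_substitute()` leaves it alone.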

score 1 · Accepted Answer

You can use `subprocess` in Python. Store the SQL in a separate file and run it as shown below. You can add more variables the same way.

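import subprocess

value1 = "your_value"  # placeholder: the value to substitute, as a str
# Passing a list instead of one shell string avoids shell=True,
# so the value is not interpreted by a shell
p = subprocess.Popen(["hive", "-f", "/sql/file/location/script.hql",
                      "--hiveconf", "variable1=" + value1],
                     stdout=subprocess.PIPE,
                     stderr=subprocess.PIPE)
out, err = p.communicate()

# With stderr=PIPE, err is the captured text and never None
# (hive may also write log output to stderr on success), so check the exit code
if p.returncode == 0:
    print("successful")
else:
    print("not successful")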
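To pass several variables, the command line can be built from a dict. The sketch below assumes Python 3.5+ for `subprocess.run`; `build_hive_command` and `run_hive_script` are hypothetical helpers, and the script is assumed to reference each variable as `${hiveconf:NAME}`, e.g. `${hiveconf:AGE}` from the question.

```python
import subprocess


def build_hive_command(script_path, variables):
    """Hypothetical helper: argument list for `hive -f SCRIPT --hiveconf K=V ...`."""
    cmd = ["hive", "-f", script_path]
    for name, value in variables.items():
        cmd += ["--hiveconf", "{}={}".format(name, value)]
    return cmd


def run_hive_script(script_path, variables):
    """Hypothetical helper: run the script, return stdout, raise on a non-zero exit."""
    result = subprocess.run(
        build_hive_command(script_path, variables),
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # decode output to str
    )
    if result.returncode != 0:
        raise RuntimeError("hive failed:\n" + result.stderr)
    return result.stdout


print(build_hive_command("/sql/file/location/script.hql", {"AGE": 10}))
```

Building the list separately from running it makes the command easy to inspect without a Hive installation.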

Alternatively, if you want to run it through pyhs2, this is the format for executing the statement:

 cur.execute("SELECT * FROM user WHERE age > %d" % 10)
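A side effect of using `%d` rather than `%s` is that Python only accepts a number there, so a string cannot be pasted into the HiveQL text by accident. A minimal sketch, where `age_query` is a hypothetical helper whose result would be passed to `cur.execute`:

```python
def age_query(age):
    # "%d" only accepts numbers: a str such as "10; DROP TABLE user"
    # raises TypeError instead of ending up in the query text.
    # Note that a float is truncated, e.g. 10.9 -> 10.
    return "SELECT * FROM user WHERE age > %d" % age


print(age_query(10))  # SELECT * FROM user WHERE age > 10
# with pyhs2: cur.execute(age_query(10))
```

This is a type check rather than real parameter binding, so it only helps for numeric values.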
