Trying to use impyla.
Here is a sample of the code I wrote in Python:
import datetime
import os

from impala.dbapi import connect

targets = ...  # targets is a dictionary of objects of a specific class

yesterday = datetime.date.today() - datetime.timedelta(days=1)
log_datetime = datetime.datetime.now()

# Partition values and timestamps are interpolated up front;
# the per-row values are left as %s placeholders for impyla to bind.
query = """
INSERT INTO my_database.mytable
PARTITION (year={year}, month={month}, day={day})
VALUES ('{yesterday}', '{log_ts}', %s, %s, %s, 1, 1)
""".format(yesterday=yesterday, log_ts=log_datetime,
           year=yesterday.year, month=yesterday.month,
           day=yesterday.day)
print(query)

# One parameter tuple per target
rows = tuple([tuple([i.campaign, i.adgroup, i.adwordsid])
              for i in targets.values()])

connection = connect(host=os.environ["HADOOP_IP"],
                     port=10000,
                     user=os.environ["HADOOP_USER"],
                     password=os.environ["HADOOP_PASSWD"],
                     auth_mechanism="PLAIN")
cursor = connection.cursor()
cursor.execute("SET hive.exec.dynamic.partition.mode=nonstrict")
cursor.executemany(query, rows)
Interestingly, even though I issue a single executemany call, impyla still turns it into multiple MapReduce jobs. In fact, I can see as many MapReduce jobs being launched as there are tuples inside the tuple of tuples that I pass to executemany.
Do you see anything wrong with this? To give you an idea, after more than an hour it had only written about 350 rows.
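My guess (an assumption on my part, based on typical DB-API behaviour rather than anything I have verified in impyla's source or docs) is that executemany amounts to re-running the statement once per parameter tuple, roughly like the sketch below, which would explain one MapReduce job per tuple. It reuses query, rows and cursor from the snippet above.

# Assumed equivalent of the executemany call above (not impyla's actual
# implementation): the statement is executed once per parameter tuple,
# so every tuple in `rows` becomes its own INSERT and its own MapReduce job.
for params in rows:
    cursor.execute(query, params)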