
I'm running a very simple test example with Azure HDInsight and Hive/Python. Hive does not appear to be loading the Python script.

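  • Hive contains a small test table with a field named "dob", which I am trying to transform with a Python script via map-reduce.
  • The Python script is blank and is located at asv:///mapper_test.py. I left it blank because I first want to isolate the problem of Hive accessing the script.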

Hive code:

ADD FILE asv:///mapper_test.py;
SELECT
TRANSFORM (dob)
USING 'python asv:///mapper_test.py' AS (dob)
FROM test_table;

Error:

Hive history file=c:\apps\dist\hive-0.9.0\logs/hive_job_log_RD00155DD090CC$_201308202117_1738335083.txt
Logging initialized using configuration in file:/C:/apps/dist/hive-0.9.0/conf/hive-log4j.properties
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_201308201542_0025, Tracking URL = http://jobtrackerhost:50030/jobdetails.jsp?jobid=job_201308201542_0025
Kill Command = c:\apps\dist\hadoop-1.1.0-SNAPSHOT\bin\hadoop.cmd job -Dmapred.job.tracker=jobtrackerhost:9010 -kill job_201308201542_0025
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2013-08-20 21:18:04,911 Stage-1 map = 0%, reduce = 0%
2013-08-20 21:19:05,175 Stage-1 map = 0%, reduce = 0%
2013-08-20 21:19:32,292 Stage-1 map = 100%, reduce = 100%
Ended Job = job_201308201542_0025 with errors
Error during job, obtaining debugging information...
Examining task ID: task_201308201542_0025_m_000002 (and more) from job job_201308201542_0025
Exception in thread "Thread-24" java.lang.RuntimeException: Error while reading from task log url
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getStackTraces(TaskLogProcessor.java:242)
at org.apache.hadoop.hive.ql.exec.JobDebugger.showJobFailDebugInfo(JobDebugger.java:227)
at org.apache.hadoop.hive.ql.exec.JobDebugger.run(JobDebugger.java:92)
at java.lang.Thread.run(Thread.java:722)
Caused by: java.io.IOException: Server returned HTTP response code: 400 for URL: http://workernode1:50060/tasklog?taskid=attempt_201308201542_0025_m_000000_7&start=-8193
at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1616)
at java.net.URL.openStream(URL.java:1035)
at org.apache.hadoop.hive.ql.exec.errors.TaskLogProcessor.getStackTraces(TaskLogProcessor.java:193)
... 3 more
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched: 
Job 0: Map: 1 HDFS Read: 0 HDFS Write: 0 FAIL
Total MapReduce CPU Time Spent: 0 msec

1 Answer


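I had a similar experience using Pig + Python on Azure HDInsight version 2.0. One thing I found was that Python was available only on the head node, not on all nodes in the cluster. A similar question has also been asked about this.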

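You can log in remotely to the cluster's head node, find the IP address of a Task Tracker node from there, log in remotely to any Task Tracker node, and check whether Python is installed on that node.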
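If an interpreter turns out to be present on a node but it is unclear whether a bare `python` command (as used in `USING 'python ...'`) can be found, a small PATH lookup can help. The sketch below is only an illustration: `find_on_path` is a hypothetical helper, it would need to be launched with the interpreter's full path (which is not given here), and the PATH seen by your login session may differ from the one the Task Tracker service uses.

```python
import os


def find_on_path(command, path_value=None, extensions=None):
    """Return the first file matching `command` in the PATH directories, or None.

    This is a plain existence check; it does not verify execute permissions.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    if extensions is None:
        # Windows resolves bare commands by trying each PATHEXT suffix;
        # where PATHEXT is unset, only the bare name is tried.
        extensions = os.environ.get("PATHEXT", "").split(os.pathsep)
    candidates = [command] + [command + ext for ext in extensions if ext]
    for directory in path_value.split(os.pathsep):
        if not directory:
            continue
        for candidate in candidates:
            full_path = os.path.join(directory, candidate)
            if os.path.isfile(full_path):
                return full_path
    return None


if __name__ == "__main__":
    location = find_on_path("python")
    if location:
        print("python resolves to " + location)
    else:
        print("python was not found on PATH for this account")
```

A result of "not found" on a worker node would be consistent with the behaviour described in this answer, though it does not by itself prove that this is why the Hive job failed.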

This issue has been fixed in HDInsight version 2.1 clusters. However, Python is still not added to the PATH; you may need to do that yourself.
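Once `python` resolves on every node, one way to confirm that Hive is really invoking the script is to swap the blank file for a pass-through mapper. The following is a minimal sketch, not the asker's actual script: the names `passthrough` and `main` are made up for illustration, and it relies only on the fact that Hive's TRANSFORM feeds rows to the script on stdin as tab-separated text, one row per line, and reads rows back from stdout in the same form.

```python
import sys


def passthrough(lines):
    """Yield each input row unchanged, minus its trailing newline."""
    for line in lines:
        yield line.rstrip("\r\n")


def main(stdin=sys.stdin, stdout=sys.stdout):
    for row in passthrough(stdin):
        # A real transform would split the row on "\t" and rewrite the
        # dob column here; this sketch just echoes the row back.
        stdout.write(row + "\n")


if __name__ == "__main__":
    main()
```

If the query then returns the original dob values, the script is being found and executed. Separately, the examples in Hive's own documentation refer to a script shipped with ADD FILE by its bare file name in the USING clause (for instance `USING 'python mapper_test.py'`), which may be worth trying; whether that makes a difference on this cluster is untested here.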

answered 2013-11-23T07:11:32.077