I have an Orion Context Broker connected to Cosmos via Cygnus.
It works fine, in the sense that I send new entities to the Context Broker and Cygnus forwards them to Cosmos, where they are saved in files.
The problem appears when I try to run some queries.
I start Hive and see that tables have been created for the files Cosmos stored, so I launch a few queries.
A simple one works fine:
select * from Table_name;
Hive does not launch any MapReduce job for it.
But when I try to filter, join, count, or select only some fields, this is what happens:
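For reference, these are the kinds of queries that force Hive to launch a MapReduce job, unlike a plain SELECT *. The table and column names below are just placeholders, not the real schema:

```sql
-- Each of these compiles into at least one MapReduce job:
SELECT COUNT(*) FROM table_name;
SELECT field1, field2 FROM table_name WHERE field1 = 'value';
SELECT a.field1, b.field2
FROM table_a a JOIN table_b b ON (a.id = b.id);
```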
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=&lt;number&gt;
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=&lt;number&gt;
In order to set a constant number of reducers:
  set mapred.reduce.tasks=&lt;number&gt;
Starting Job = JOB_NAME, Tracking URL = JOB_DETAILS_URL
Kill Command = /usr/lib/hadoop-0.20/bin/hadoop job -kill JOB_NAME
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2015-07-08 14:35:12,723 Stage-1 map = 0%, reduce = 0%
2015-07-08 14:35:38,943 Stage-1 map = 100%, reduce = 100%
Ended Job = JOB_NAME with errors
Error during job, obtaining debugging information...
Examining task ID: TASK_NAME (and more) from job JOB_NAME
Task with the most failures(4):
-----
Task ID:
task_201409031055_6337_m_000000
URL: TASK_DETAIL_URL
-----
FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched:
Job 0: Map: 1 Reduce: 1 HDFS Read: 0 HDFS Write: 0 FAIL
I have found that the files created by Cygnus differ from the other files: in the Cygnus case they must be deserialized with a jar.
So my question is whether, in these cases, I have to write some MapReduce code myself, or whether there is already a general way to do this.
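One possible direction, assuming the deserialization jar mentioned above is available on the Hive machine, is to register it in the Hive session before running the queries that launch MapReduce jobs; ADD JAR is standard Hive syntax, but the jar path below is only a placeholder, not a path from my setup:

```sql
-- Register the deserializer jar in the current Hive session (placeholder path)
ADD JAR /path/to/the/deserializer.jar;

-- With the jar registered, map/reduce tasks can load the SerDe classes
-- when executing queries that previously failed:
SELECT COUNT(*) FROM table_name;
```

This only helps if the tables were defined with a SerDe whose classes live in that jar; if they were created as plain text tables, the table definition itself would need to reference the SerDe.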