
I have an Orion Context Broker connected to Cosmos through Cygnus.

It works fine, in the sense that when I send new entities to the Context Broker, Cygnus forwards them to Cosmos and persists them in files.

The problem arises when I try to run some queries.

I start Hive and I can see tables corresponding to the files Cosmos created, so I launch a few queries.

A simple one works fine:

select * from Table_name;

Hive does not launch any MapReduce job for it.

But when I try to filter, join, count, or select only some of the fields, this is what happens:

Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
  set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
  set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
  set mapred.reduce.tasks=<number>
Starting Job = JOB_NAME, Tracking URL = JOB_DETAILS_URL
Kill Command = /usr/lib/hadoop-0.20/bin/hadoop job  -kill JOB_NAME
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2015-07-08 14:35:12,723 Stage-1 map = 0%,  reduce = 0%
2015-07-08 14:35:38,943 Stage-1 map = 100%,  reduce = 100%
Ended Job = JOB_NAME with errors
Error during job, obtaining debugging information...
Examining task ID: TASK_NAME (and more) from job JOB_NAME

Task with the most failures(4): 
-----
Task ID:
  task_201409031055_6337_m_000000

URL: TASK_DETAIL_URL
-----

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask
MapReduce Jobs Launched: 
Job 0: Map: 1  Reduce: 1   HDFS Read: 0 HDFS Write: 0 FAIL

I have found that the files created by Cygnus differ from the other files: in the Cygnus case they have to be deserialized with a jar.

So I wonder whether in these cases I have to write some custom MapReduce code, or whether there is already a generic way to do this.
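For context, the tables that Cygnus builds over its JSON files are typically declared with a JSON SerDe, along these lines (a sketch only: the table name, column names, and HDFS path are illustrative placeholders, and the SerDe class assumes the Hive-JSON-Serde shipped as the json-serde-*.jar mentioned below):

```sql
-- Hypothetical external table over Cygnus-written JSON files in HDFS.
-- Column names and the LOCATION path are placeholders, not the real schema.
CREATE EXTERNAL TABLE my_cygnus_table (
  recvTime   STRING,
  entityId   STRING,
  entityType STRING,
  attrName   STRING,
  attrType   STRING,
  attrValue  STRING
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
LOCATION '/user/myuser/mydataset/';
```

Any query over such a table that needs a MapReduce job (filters, joins, counts, projections) requires the SerDe jar on the tasks' classpath, which is why those jobs fail when the jar has not been added to the session.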


1 Answer


Before executing any Hive statement, run the following:

hive> add jar /usr/local/hive-0.9.0-shark-0.8.0-bin/lib/json-serde-1.1.9.3-SNAPSHOT.jar;

If you use Hive through JDBC, execute it like any other statement:

Connection con = ...
Statement stmt = con.createStatement();
// "add jar" produces no result set, so use execute() rather than executeQuery()
stmt.execute("add jar /usr/local/hive-0.9.0-shark-0.8.0-bin/lib/json-serde-1.1.9.3-SNAPSHOT.jar");
stmt.close();
stmt = con.createStatement();
ResultSet rs = stmt.executeQuery("select ...");
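The jar only stays registered for the current Hive session, so the add jar command has to be repeated every time. One way to avoid that (a sketch, reusing the jar path from above) is to put the statement in the ~/.hiverc file, which the Hive CLI executes automatically at session start:

```sql
-- ~/.hiverc: statements here run automatically when the Hive CLI starts
add jar /usr/local/hive-0.9.0-shark-0.8.0-bin/lib/json-serde-1.1.9.3-SNAPSHOT.jar;
```

Note this only covers the interactive CLI; JDBC clients still need to issue the statement themselves, as shown above.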
answered 2015-07-09T10:12:31.283