hadoop - MapReduce on Hive tables using HCatalog

Question

I am trying to write a map-reduce job that computes the distribution of field values in a Hive table (Hadoop 2.2.0.2.0.6.0-101). For example:

Input Hive table "ATable":

+-------+--------+
| name  | rating |
+-------+--------+
| Bond  | 7      |
| Megre | 2      |
| Holms | 11     |
| Puaro | 7      |
| Holms | 1      |
| Puaro | 7      |
| Megre | 2      |
| Puaro | 7      |
+-------+--------+

The map-reduce job should produce the following output table, also in Hive:

+--------+-------+--------+
| Field  | Value |  Count |
+--------+-------+--------+
| name   | Bond  |   1    |
| name   | Puaro |   3    |
| name   | Megre |   2    |
| name   | Holms |   2    |
| rating | 7     |   4    |
| rating | 11    |   1    |
| rating | 1     |   1    |
| rating | 2     |   2    |
+--------+-------+--------+
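
To make the intended computation concrete, here is a minimal plain-Java sketch of the counting logic only. It deliberately uses no Hadoop or HCatalog classes: the list of field names is a hypothetical stand-in for the table schema, and the class and method names (`FieldValueDistribution`, `mapRow`, `count`, `sampleRows`) are made up for illustration. In a real job the "map" step would live in a Mapper and the "reduce" step in a Reducer.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java stand-in for the job; not the HCatalog API.
class FieldValueDistribution {

    // "Map" step: for one row, emit one "field<TAB>value" key per column.
    // fieldNames plays the role of the table schema (an assumption).
    static List<String> mapRow(List<String> fieldNames, List<String> row) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < fieldNames.size(); i++) {
            keys.add(fieldNames.get(i) + "\t" + row.get(i));
        }
        return keys;
    }

    // "Reduce" step: sum the number of times each key was emitted.
    static Map<String, Integer> count(List<String> fieldNames,
                                      List<List<String>> rows) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> row : rows) {
            for (String key : mapRow(fieldNames, row)) {
                counts.merge(key, 1, Integer::sum);
            }
        }
        return counts;
    }

    // The rows of "ATable" from the question.
    static List<List<String>> sampleRows() {
        return Arrays.asList(
            Arrays.asList("Bond", "7"),
            Arrays.asList("Megre", "2"),
            Arrays.asList("Holms", "11"),
            Arrays.asList("Puaro", "7"),
            Arrays.asList("Holms", "1"),
            Arrays.asList("Puaro", "7"),
            Arrays.asList("Megre", "2"),
            Arrays.asList("Puaro", "7"));
    }

    public static void main(String[] args) {
        List<String> fields = Arrays.asList("name", "rating");
        // Print one "field<TAB>value<TAB>count" line per distinct pair.
        count(fields, sampleRows())
            .forEach((key, n) -> System.out.println(key + "\t" + n));
    }
}
```

Using a composite "field, value" key means a single reducer pass can count all columns at once, which is why the schema (the field names) has to be available inside the map step.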

To get the field names and values I need access to the HCatalog metadata, so that I can use them in the map method (org.apache.hadoop.mapreduce.Mapper). For this I tried to adapt the example from: http://java.dzone.com/articles/mapreduce-hive-tables-using

The code from this example compiles, but produces a lot of deprecation warnings:

protected void map(WritableComparable key, HCatRecord value,
 org.apache.hadoop.mapreduce.Mapper.Context context)
 throws IOException, InterruptedException {

 // Get table schema
 HCatSchema schema = HCatBaseInputFormat.getTableSchema(context);

 Integer year = new Integer(value.getString("year", schema));
 Integer month = new Integer(value.getString("month", schema));
 Integer DayofMonth = value.getInteger("dayofmonth", schema);

 context.write(new IntWritable(month), new IntWritable(DayofMonth));
}

Deprecation warnings:

HCatRecord
HCatSchema 
HCatBaseInputFormat.getTableSchema

Where can I find a similar example of using HCatalog in map-reduce with the latest, non-deprecated interfaces?

Thanks!

score 0 · Accepted Answer

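I used the example given in one of the Cloudera examples, and the framework given in this blog to compile my code. I also had to add the hcatalog Maven dependency to my pom.xml (shown below). This example uses the new mapreduce API rather than the deprecated mapred API. Hope this helps.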

        <dependency>
            <groupId>org.apache.hcatalog</groupId>
            <artifactId>hcatalog-core</artifactId>
            <version>0.11.0</version>
        </dependency>
