
I'm processing a stock trading log file. Each line represents a trade transaction with 20 tab-separated values. I'm using Hadoop to process this file and run some benchmark calculations on the trades. For each line I have to perform a separate benchmark calculation, so no reduce function is really needed in my map-reduce job. To perform the benchmark calculation for a line, I have to query a Sybase database to get some standard values corresponding to that line. The database is indexed on two values from each line [trade ID and stock ID]. My question is: should I use tradeId and StockId as the key in my MapReduce program, or should I pick some other value/[combination of values] as my key?


1 Answer


So, for each line of input, you're going to query a database and then perform the benchmark calculation for that line separately. Once the benchmark calculation is done, you're going to output each line with its benchmark value.

In this case, you can either not use a reducer at all, or use an identity reducer.
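
If you drop the reducer entirely, you can tell Hadoop to run a map-only job in the driver. A minimal sketch, assuming the new mapreduce API (on older releases new Job(conf, ...) works the same way as Job.getInstance); BenchmarkDriver and BenchmarkMapper are placeholder names, not something from your setup:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class BenchmarkDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "trade benchmark");
        job.setJarByClass(BenchmarkDriver.class);
        job.setMapperClass(BenchmarkMapper.class);  // placeholder name for your mapper class
        job.setNumReduceTasks(0);                   // map-only job: mapper output is written straight to HDFS
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

With zero reduce tasks, whatever the mapper emits is the final output, so the identity reducer is only needed if you want the output sorted by key.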

So your map function will read in a line, fire a query to the Sybase database for the standard values, and then perform the benchmark calculation. Since you want to output each line with its benchmark value, you could have the map function emit the line as the key and the benchmark value as the value, i.e. <line, benchmark value>.

Your map function would look something like this (I'm assuming your benchmark value is an integer):

// assuming your mapper extends Mapper<LongWritable, Text, Text, IntWritable>
@Override
public void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {

    String line = value.toString();   // this will be your key in the final output

    /*
        Parse the 20 tab-separated values of the line
        (e.g. to extract the tradeId and stockId).
    */

    /*
        standardValues = <return value from the Sybase query>;
    */

    /* Perform the benchmark calculation and obtain the benchmark value */
    int benchmarkValue = 0;   // replace with the computed benchmark

    context.write(new Text(line), new IntWritable(benchmarkValue));
}
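
For the Sybase lookup itself, one option is plain JDBC from inside the mapper. A rough sketch, assuming a Sybase JDBC (jConnect) driver is available on the task classpath; the connection URL, table and column names below are made-up placeholders, not your schema:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

// Hypothetical helper for fetching the standard value for one trade/stock pair.
public class StandardValueLookup {
    private final Connection conn;

    public StandardValueLookup(String url, String user, String password) throws Exception {
        // Open one connection per mapper, not one per record.
        conn = DriverManager.getConnection(url, user, password);
    }

    public double fetchStandardValue(String tradeId, String stockId) throws Exception {
        String sql = "SELECT standard_value FROM standard_values WHERE trade_id = ? AND stock_id = ?";
        try (PreparedStatement ps = conn.prepareStatement(sql)) {
            ps.setString(1, tradeId);
            ps.setString(2, stockId);
            try (ResultSet rs = ps.executeQuery()) {
                return rs.next() ? rs.getDouble(1) : Double.NaN;
            }
        }
    }
}

You'd create this once in the mapper's setup() method and reuse it for every record, closing the connection in cleanup(), so that you aren't opening a new database connection for each line.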
answered 2013-07-11T21:24:15.483