
I'm working with Hadoop MapReduce and have a question. Currently, my mapper's input KV type is LongWritable, LongWritable, and its output KV type is also LongWritable, LongWritable. The InputFileFormat is SequenceFileInputFormat. Basically, what I want to do is convert a txt file into SequenceFileFormat so that I can feed it to my mapper.

What I want to do is this:

The input file looks like this:

1\t2 (key = 1, value = 2)

2\t3 (key = 2, value = 3)

and so on...

I looked at this thread, How to convert a .txt file to Hadoop's sequence file format, but TextInputFormat only supports Key = LongWritable and Value = Text.

Is there any way to take the txt file and produce a sequence file with KV = LongWritable, LongWritable?


1 Answer


Sure, basically the same way I described in the other thread you linked. But you have to implement your own Mapper.

Just to give you a quick sketch:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.SequenceFileOutputFormat;

public class LongLongMapper extends
    Mapper<LongWritable, Text, LongWritable, LongWritable> {

  @Override
  protected void map(LongWritable key, Text value,
      Mapper<LongWritable, Text, LongWritable, LongWritable>.Context context)
      throws IOException, InterruptedException {

    // assuming that your line contains key and value separated by \t
    String[] split = value.toString().split("\t");

    context.write(new LongWritable(Long.valueOf(split[0])), new LongWritable(
        Long.valueOf(split[1])));

  }

  public static void main(String[] args) throws IOException,
      InterruptedException, ClassNotFoundException {

    Configuration conf = new Configuration();
    Job job = new Job(conf);
    job.setJobName("Convert Text");
    job.setJarByClass(LongLongMapper.class);

    job.setMapperClass(LongLongMapper.class);
    // identity reducer; unused while the number of reduce tasks is zero
    job.setReducerClass(Reducer.class);

    // increase if you need sorting or a special number of files
    job.setNumReduceTasks(0);

    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(LongWritable.class);

    job.setOutputFormatClass(SequenceFileOutputFormat.class);
    job.setInputFormatClass(TextInputFormat.class);

    FileInputFormat.addInputPath(job, new Path("/input"));
    FileOutputFormat.setOutputPath(job, new Path("/output"));

    // submit and wait for completion
    job.waitForCompletion(true);
  }
}

Each call to the map function receives one line of input, so we just split it at the separator (a tab) and parse each part into a long.
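To illustrate just that split-and-parse step outside of Hadoop, here is a minimal standalone sketch (the class and method names are hypothetical, not part of the answer's code), assuming each line holds a key and value separated by a tab:

```java
public class SplitDemo {

    // Split a "key\tvalue" line and parse both parts into longs,
    // mirroring what the mapper does with each input line.
    static long[] parseLine(String line) {
        String[] split = line.split("\t");
        return new long[] {
            Long.parseLong(split[0]),
            Long.parseLong(split[1])
        };
    }

    public static void main(String[] args) {
        long[] kv = parseLine("1\t2");
        System.out.println(kv[0] + " -> " + kv[1]); // prints "1 -> 2"
    }
}
```

In the real job, the two parsed longs are wrapped in LongWritable instances and written to the context instead of being printed.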

That's it.

Answered 2012-09-03T17:45:42.347