java - Hadoop (YARN): Setting the mapper input separator?

Question

I want to be able to set a different separator for the key/value pairs that the map function of my MR job receives.

For example, my text file might contain:

John-23
Mary-45
Scott-13

In my map function, I want the key of each element to be John and the value to be 23, and so on.

Then, if I set the output separator with

conf.set("mapreduce.textoutputformat.separator", "-");

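will the reducer pick up the key as everything up to the first '-' and the value as everything after it? Or do I also need to make changes to the reducer?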

Thanks

score 1 · Accepted Answer

Reading

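If you use org.apache.hadoop.mapreduce.lib.input.TextInputFormat, the mapper receives each line's byte offset as the key and the whole line as the value, so you can simply call String#split on the value in the Mapper: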

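 @Override
 public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

     // Split at the first '-' only, so a value that contains '-' stays intact
     String[] keyValue = value.toString().split("-", 2);
     if (keyValue.length < 2) {
         return; // skip lines that have no '-' separator
     }
     // would emit John -> 23 as a text
     context.write(new Text(keyValue[0]), new Text(keyValue[1]));
 }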
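The split step can be tried outside Hadoop. Below is a minimal sketch on plain Java strings (no `Text` or `Context`); the class `SeparatorSplitSketch` and its methods are hypothetical names for illustration. It uses `split("-", 2)` so that only the first '-' separates key from value, and it treats lines without a '-' as malformed and skips them.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of Hadoop): reproduces the mapper's split
// on plain Strings so the logic can be checked without a cluster.
class SeparatorSplitSketch {

    // Splits a line at the FIRST '-' only (limit 2), so "a-b-c" -> ["a", "b-c"].
    // Returns null when the line has no '-', i.e. it is not a key/value pair.
    static String[] splitLine(String line) {
        String[] parts = line.split("-", 2);
        return parts.length == 2 ? parts : null;
    }

    // Parses every well-formed line into a {key, value} pair and skips the
    // rest, much as the mapper would skip malformed records.
    static List<String[]> parseAll(List<String> lines) {
        List<String[]> pairs = new ArrayList<>();
        for (String line : lines) {
            String[] kv = splitLine(line);
            if (kv != null) {
                pairs.add(kv);
            }
        }
        return pairs;
    }
}
```

For the sample file, `splitLine("Mary-45")` yields `{"Mary", "45"}`, and a line such as `"bad"` is dropped by `parseAll`.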

Writing

If you output it this way:

Text key = new Text("John");
LongWritable value = new LongWritable(23);
// of course key and value can come from the reduce method itself,
// I just want to illustrate the types
context.write(key, value);

then yes, TextOutputFormat takes care of writing it in the format you want:

John-23
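To make the layout concrete, here is a minimal plain-Java sketch of how I understand TextOutputFormat writes each record: key, separator, value, newline, with the separator left out when the key or value is null (or a `NullWritable`). It does not use Hadoop classes; `TextLineSketch` and `formatRecord` are hypothetical names, and a Java `null` stands in for a null or `NullWritable` field.

```java
// Hypothetical stand-in for TextOutputFormat's per-record layout.
// Plain Java objects replace Hadoop Writables; null plays the role of a
// null key/value or NullWritable.
class TextLineSketch {

    static String formatRecord(Object key, Object value, String separator) {
        boolean nullKey = (key == null);
        boolean nullValue = (value == null);
        if (nullKey && nullValue) {
            return ""; // nothing is written for an all-null record
        }
        StringBuilder sb = new StringBuilder();
        if (!nullKey) {
            sb.append(key.toString());
        }
        if (!nullKey && !nullValue) {
            sb.append(separator); // separator only between two present fields
        }
        if (!nullValue) {
            sb.append(value.toString());
        }
        sb.append('\n'); // one record per line
        return sb.toString();
    }
}
```

Here `formatRecord("John", 23L, "-")` produces `"John-23\n"`. Because the separator is only applied when the final record is written, it does not change how keys and values travel from the mapper to the reducer: the reducer still receives the key and value objects the mapper emitted, not re-parsed text.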

The only gotcha, which I ran into in Hadoop 2.x (YARN) and have answered here, is that the property was renamed to mapreduce.output.textoutputformat.separator.
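A minimal sketch of why the key name matters, assuming a plain `Map` in place of Hadoop's `Configuration` (`SeparatorConfigSketch` and `resolveSeparator` are hypothetical names). It also assumes, per the answer, that Hadoop 2.x only reads the renamed key, and that TextOutputFormat falls back to a tab when that key is unset. Under those assumptions, setting the key from the question, `mapreduce.textoutputformat.separator`, leaves the output tab-separated.

```java
import java.util.Map;

// Hypothetical sketch: a Map stands in for Hadoop's Configuration.
class SeparatorConfigSketch {

    // The renamed property that Hadoop 2.x reads (per the answer above)
    static final String SEPARATOR_KEY = "mapreduce.output.textoutputformat.separator";

    // Mirrors the idea of conf.get(key, "\t"): a tab when the key is not set
    static String resolveSeparator(Map<String, String> conf) {
        return conf.getOrDefault(SEPARATOR_KEY, "\t");
    }
}
```

In a real job, the equivalent is `conf.set("mapreduce.output.textoutputformat.separator", "-");` on the `Configuration` before passing it to `Job.getInstance(conf)`, since the job works on a copy of that configuration.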
