java - Error when running basic Hadoop code

Question

I am running Hadoop code whose job uses a custom partitioner class. However, when I run the command

hadoop jar Sort.jar SecondarySort inputdir outputdir

I get a runtime error that says

class KeyPartitioner not org.apache.hadoop.mapred.Partitioner.

I have made sure that the KeyPartitioner class extends the Partitioner class, so why am I getting this error?

Here is the driver code:

    JobConf conf = new JobConf(getConf(), SecondarySort.class);
    conf.setJobName(SecondarySort.class.getName());

    conf.setJarByClass(SecondarySort.class);

    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    conf.setMapOutputKeyClass(StockKey.class);
    conf.setMapOutputValueClass(Text.class);

    conf.setPartitionerClass((Class<? extends Partitioner<StockKey, DoubleWritable>>) KeyPartitioner.class);

    conf.setMapperClass((Class<? extends Mapper<LongWritable, Text, StockKey, DoubleWritable>>) StockMapper.class);
    conf.setReducerClass((Class<? extends Reducer<StockKey, DoubleWritable, Text, Text>>) StockReducer.class);

Here is the code for the partitioner class:

public class KeyPartitioner extends Partitioner<StockKey, Text> {

    @Override
    public int getPartition(StockKey arg0, Text arg1, int arg2) {

        int partition = arg0.name.hashCode() % arg2;

        return partition;
    }
}

score 1 · Accepted Answer

Note that Hadoop has two Partitioner types:

org.apache.hadoop.mapreduce.Partitioner
org.apache.hadoop.mapred.Partitioner

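Your driver uses JobConf, which belongs to the old org.apache.hadoop.mapred API, so setPartitionerClass expects the second type. Make sure your KeyPartitioner class implements the second one (an interface) rather than extending the first one (an abstract class). That interface also extends JobConfigurable, so the class needs a configure(JobConf) method as well; it can be empty.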

Edit: you also have to set the input and output directories:

FileInputFormat.addInputPath(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));
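
One more thing that may be worth checking, separate from the error above: in Java, hashCode() can return a negative int and % keeps the sign of the dividend, so arg0.name.hashCode() % arg2 can come out negative. A common workaround is to mask off the sign bit before taking the remainder. The sketch below shows only that arithmetic on plain ints so it runs without Hadoop; the class and method names are made up for illustration and are not part of the question's code.

```java
// Hypothetical helper class, used only to illustrate the arithmetic.
class PartitionSketch {

    // Same arithmetic as the question's getPartition: the result is
    // negative whenever the hash is negative and not a multiple of
    // numPartitions.
    static int naivePartition(int hash, int numPartitions) {
        return hash % numPartitions;
    }

    // Clearing the sign bit first keeps the result in [0, numPartitions).
    static int safePartition(int hash, int numPartitions) {
        return (hash & Integer.MAX_VALUE) % numPartitions;
    }

    public static void main(String[] args) {
        int hash = -7; // stand-in for a negative hashCode() value
        System.out.println(naivePartition(hash, 4)); // prints -3
        System.out.println(safePartition(hash, 4));  // prints 1
    }
}
```

In the partitioner itself, the same expression would take arg0.name.hashCode() as the hash and arg2 as the number of partitions.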
