hadoop - mapreduce: custom data types

Question

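While writing a MapReduce program, I ran into a case where the key is a tuple (A, B), where A and B are both sets of integers. How can I define a custom data type for this?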

public static class MapClass extends Mapper<Object,Text,Tuple,Tuple>....

public class Tuple implements WritableComparable<Tuple>{ 


        @Override
        public void readFields(DataInput arg0) throws IOException {
            // TODO Auto-generated method stub

        }

        @Override
        public void write(DataOutput arg0) throws IOException {
            // TODO Auto-generated method stub

        }

        @Override
        public int compareTo(Tuple o) {
            // TODO Auto-generated method stub
            return 0;
        }
    }

score 3 · Accepted Answer

You're almost there. Just add fields for A and B, then complete the serialization methods and compareTo:

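import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;

import org.apache.hadoop.io.WritableComparable;

public class Tuple implements WritableComparable<Tuple> {
    // TreeSet iterates in ascending order, so the serialized form is deterministic.
    public Set<Integer> a = new TreeSet<Integer>();
    public Set<Integer> b = new TreeSet<Integer>();

    @Override
    public void readFields(DataInput in) throws IOException {
        // Hadoop reuses key objects, so drop any state from a previous record.
        a.clear();
        b.clear();

        int count = in.readInt();
        while (count-- > 0) {
            a.add(in.readInt());
        }

        count = in.readInt();
        while (count-- > 0) {
            b.add(in.readInt());
        }
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Each set is written as its size followed by its elements.
        out.writeInt(a.size());
        for (int v : a) {
            out.writeInt(v);
        }
        out.writeInt(b.size());
        for (int v : b) {
            out.writeInt(v);
        }
    }

    @Override
    public int compareTo(Tuple o) {
        // How two tuples compare is up to you. One possible ordering:
        // compare a first, then b, element by element in ascending order.
        int result = compareSets(a, o.a);
        return result != 0 ? result : compareSets(b, o.b);
    }

    private static int compareSets(Set<Integer> x, Set<Integer> y) {
        Iterator<Integer> i = x.iterator();
        Iterator<Integer> j = y.iterator();
        while (i.hasNext() && j.hasNext()) {
            int result = Integer.compare(i.next(), j.next());
            if (result != 0) {
                return result;
            }
        }
        // A set that is a proper prefix of the other sorts first.
        return Boolean.compare(i.hasNext(), j.hasNext());
    }
}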
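To sanity-check the wire format without a cluster, the size-then-elements scheme used by write() and readFields() can be exercised with plain java.io streams. The sketch below is a stand-alone illustration with no Hadoop dependency; the class name IntSetCodec and its methods are hypothetical helpers, not part of any Hadoop API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.Arrays;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper that mirrors the write()/readFields() logic above.
final class IntSetCodec {
    // Writes the element count, then each element, as 4-byte ints.
    static void writeSet(DataOutput out, Set<Integer> set) throws IOException {
        out.writeInt(set.size());
        for (int v : set) {
            out.writeInt(v);
        }
    }

    // Reads back a set that was written by writeSet().
    static Set<Integer> readSet(DataInput in) throws IOException {
        Set<Integer> set = new TreeSet<>();
        int count = in.readInt();
        while (count-- > 0) {
            set.add(in.readInt());
        }
        return set;
    }

    // Serializes a, then b, into a byte array.
    static byte[] encode(Set<Integer> a, Set<Integer> b) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        writeSet(out, a);
        writeSet(out, b);
        out.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        Set<Integer> a = new TreeSet<>(Arrays.asList(3, 1, 2));
        Set<Integer> b = new TreeSet<>(Arrays.asList(7));
        byte[] data = encode(a, b);

        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        Set<Integer> a2 = readSet(in);
        Set<Integer> b2 = readSet(in);
        // prints "[1, 2, 3] [7] in 24 bytes"
        System.out.println(a2 + " " + b2 + " in " + data.length + " bytes");
    }
}
```

The fields must be read in exactly the order they were written; the leading count is what lets the reader know where set a ends and set b begins.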

score 1 · Accepted Answer

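To implement a custom data type in Hadoop, you must implement the WritableComparable interface and provide your own implementations of the readFields() and write() methods. In addition to readFields() and write(), you must override the Java object's equals() and hashCode() methods.

If the custom data type is used as a key, it must also be comparable, which WritableComparable requires through compareTo().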
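As a rough illustration of the equals() and hashCode() overrides mentioned above, here is a minimal stand-alone sketch for a pair of integer sets. The class name SetPair is hypothetical and the class has no Hadoop dependency; in a real key type these two methods would sit alongside readFields(), write() and compareTo().

```java
import java.util.Collection;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical value class holding two integer sets.
final class SetPair {
    final Set<Integer> a = new TreeSet<>();
    final Set<Integer> b = new TreeSet<>();

    SetPair(Collection<Integer> aValues, Collection<Integer> bValues) {
        a.addAll(aValues);
        b.addAll(bValues);
    }

    // Two pairs are equal when both sets hold the same elements.
    @Override
    public boolean equals(Object other) {
        if (this == other) {
            return true;
        }
        if (!(other instanceof SetPair)) {
            return false;
        }
        SetPair that = (SetPair) other;
        return a.equals(that.a) && b.equals(that.b);
    }

    // Derived only from the set contents, so equal pairs hash alike
    // and the value is the same in every JVM.
    @Override
    public int hashCode() {
        return Objects.hash(a, b);
    }
}
```

A content-based hashCode() matters because Hadoop's default HashPartitioner uses the key's hashCode() to pick a reducer, so equal keys emitted by different map tasks need the same hash. It is also worth keeping compareTo() consistent with equals(), meaning it returns 0 exactly when equals() returns true.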
