
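While writing a MapReduce program, I ran into a key that is a tuple (A, B), where A and B are both sets of integers. How do I define a custom data type for this?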

public static class MapClass extends Mapper<Object,Text,Tuple,Tuple>....

public class Tuple implements WritableComparable<Tuple> {

    @Override
    public void readFields(DataInput arg0) throws IOException {
        // TODO Auto-generated method stub

    }

    @Override
    public void write(DataOutput arg0) throws IOException {
        // TODO Auto-generated method stub

    }

    @Override
    public int compareTo(Tuple o) {
        // TODO Auto-generated method stub
        return 0;
    }
}

2 Answers


You're almost there. Add fields for A and B, then fill in the serialization methods and compareTo():

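import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;

import org.apache.hadoop.io.WritableComparable;

public class Tuple implements WritableComparable<Tuple> {
    public Set<Integer> a = new TreeSet<Integer>();
    public Set<Integer> b = new TreeSet<Integer>();

    @Override
    public void readFields(DataInput in) throws IOException {
        // Hadoop may reuse this instance, so drop any previous contents first
        a.clear();
        b.clear();

        int count = in.readInt();      // number of elements in A
        while (count-- > 0) {
            a.add(in.readInt());
        }

        count = in.readInt();          // number of elements in B
        while (count-- > 0) {
            b.add(in.readInt());
        }
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Each set is written as a size prefix followed by its elements
        out.writeInt(a.size());
        for (int v : a) {
            out.writeInt(v);
        }
        out.writeInt(b.size());
        for (int v : b) {
            out.writeInt(v);
        }
    }

    @Override
    public int compareTo(Tuple o) {
        // One possible ordering (adapt as needed): compare A first, then B
        int cmp = compareSets(a, o.a);
        return cmp != 0 ? cmp : compareSets(b, o.b);
    }

    // Lexicographic comparison of two sorted sets; a proper prefix sorts first
    private static int compareSets(Set<Integer> x, Set<Integer> y) {
        Iterator<Integer> ix = x.iterator();
        Iterator<Integer> iy = y.iterator();
        while (ix.hasNext() && iy.hasNext()) {
            int cmp = Integer.compare(ix.next(), iy.next());
            if (cmp != 0) {
                return cmp;
            }
        }
        return Boolean.compare(ix.hasNext(), iy.hasNext());
    }
}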
answered 2013-03-23T12:19:55.830
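One way to check that write() and readFields() mirror each other is a round trip through an in-memory byte stream. The sketch below is a hypothetical, Hadoop-free stand-in: TupleSketch copies the serialization logic of the Tuple above but omits `implements WritableComparable` so it compiles against the JDK alone, and RoundTrip.copyInto is an illustrative helper, not a Hadoop API.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical stand-in for Tuple: same write/readFields logic, no Hadoop dependency
class TupleSketch {
    Set<Integer> a = new TreeSet<Integer>();
    Set<Integer> b = new TreeSet<Integer>();

    void write(DataOutput out) throws IOException {
        out.writeInt(a.size());
        for (int v : a) out.writeInt(v);
        out.writeInt(b.size());
        for (int v : b) out.writeInt(v);
    }

    void readFields(DataInput in) throws IOException {
        a.clear();
        b.clear();
        int count = in.readInt();
        while (count-- > 0) a.add(in.readInt());
        count = in.readInt();
        while (count-- > 0) b.add(in.readInt());
    }
}

class RoundTrip {
    // Serialize src with write(), then read the bytes back into dst with readFields()
    static void copyInto(TupleSketch src, TupleSketch dst) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            src.write(new DataOutputStream(bytes));
            dst.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        TupleSketch original = new TupleSketch();
        original.a.add(3);
        original.a.add(1);
        original.b.add(7);

        // Pre-fill the target to mimic Hadoop reusing a Writable instance
        TupleSketch reused = new TupleSketch();
        reused.a.add(99);

        copyInto(original, reused);
        System.out.println(reused.a + " " + reused.b); // prints "[1, 3] [7]"
    }
}
```

Because readFields() clears both sets first, the stale 99 is gone after the copy; without the clear() calls it would survive, which is easy to miss when Hadoop reuses key and value objects.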

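To implement a custom data type in Hadoop, implement the WritableComparable interface and provide your own implementations of the readFields() and write() methods. Besides readFields() and write(), you should also override Java's equals() and hashCode() methods; hashCode() matters in particular for keys, because Hadoop's default HashPartitioner uses the key's hashCode() to decide which reducer receives it.

If the custom data type is used as a key, it must implement a comparable interface (WritableComparable), because the framework sorts keys; a type used only as a value only needs to implement Writable.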

answered 2013-09-28T05:36:52.200
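The equals()/hashCode() advice can be sketched for the same pair of integer sets. TupleKey and its partition() helper are hypothetical names, and the class again leaves out the Hadoop interface so it runs on the JDK alone; partition() reproduces the formula used by Hadoop's default HashPartitioner, `(key.hashCode() & Integer.MAX_VALUE) % numReduceTasks`.

```java
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical key class: only the fields plus equals/hashCode, no Hadoop imports
class TupleKey {
    final Set<Integer> a = new TreeSet<Integer>();
    final Set<Integer> b = new TreeSet<Integer>();

    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof TupleKey)) {
            return false;
        }
        TupleKey other = (TupleKey) obj;
        // Set.equals compares contents, not insertion order
        return a.equals(other.a) && b.equals(other.b);
    }

    @Override
    public int hashCode() {
        // Set.hashCode is the sum of element hashes, so equal tuples hash equally
        return Objects.hash(a, b);
    }

    // Mirrors the default HashPartitioner: a reducer index in [0, numReduceTasks)
    int partition(int numReduceTasks) {
        return (hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }

    public static void main(String[] args) {
        TupleKey x = new TupleKey();
        x.a.add(1);
        x.a.add(2);
        x.b.add(5);

        TupleKey y = new TupleKey();
        y.a.add(2);
        y.a.add(1);
        y.b.add(5);

        // Same contents, different insertion order: equal keys, same reducer
        System.out.println(x.equals(y) + " " + (x.partition(4) == y.partition(4))); // prints "true true"
    }
}
```

If compareTo() is also implemented, it should return 0 exactly when equals() returns true, so that sorting, grouping and partitioning all treat the same keys as identical.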