0

I am trying to implement the following logic in a raw comparator, but I am not sure how to write it.

The timeStamp field here is of type LongWritable.

if (this.getNaturalKey().compareTo(o.getNaturalKey()) != 0) {
    return this.getNaturalKey().compareTo(o.getNaturalKey());
} else if (this.timeStamp != o.timeStamp) {
    return timeStamp.compareTo(o.timeStamp);
} else {
    return 0;
}

I found a hint here, but I am not sure how to handle the LongWritable type: http://my.safaribooksonline.com/book/databases/hadoop/9780596521974/serialization/id3548156

Thanks for your help.

4 Answers

1

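Suppose I have a CompositeKey that represents a pair (String stockSymbol, long timestamp). We can group on the stockSymbol field first, so that all data for one symbol ends up together, and then let the "secondary sort" in the shuffle phase use the long timestamp member to order the time-series points, so that they arrive at the reducer partitioned and sorted.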

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class CompositeKey implements WritableComparable<CompositeKey> {
    // natural key is (stockSymbol)
    // composite key is a pair (stockSymbol, timestamp)
    private String stockSymbol;
    private long timestamp;

    // ... getters and setters omitted for clarity

    @Override
    public void readFields(DataInput in) throws IOException {
        this.stockSymbol = in.readUTF();
        this.timestamp = in.readLong();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(this.stockSymbol);
        out.writeLong(this.timestamp);
    }

    @Override
    public int compareTo(CompositeKey other) {
        // Order by the natural key first, then by timestamp.
        int comparison = this.stockSymbol.compareTo(other.stockSymbol);
        if (comparison != 0) {
            return comparison;
        }
        return Long.compare(this.timestamp, other.timestamp);
    }
}

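The comparator for CompositeKey would then be:

import org.apache.hadoop.io.WritableComparable;
import org.apache.hadoop.io.WritableComparator;

public class CompositeKeyComparator extends WritableComparator {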

    protected CompositeKeyComparator() {
        super(CompositeKey.class, true);
    }

    @Override
    public int compare(WritableComparable wc1, WritableComparable wc2) {
        CompositeKey ck1 = (CompositeKey) wc1;
        CompositeKey ck2 = (CompositeKey) wc2;

        int comparison = ck1.getStockSymbol().compareTo(ck2.getStockSymbol());
        if (comparison == 0) {
            // stock symbols are equal here
            if (ck1.getTimestamp() == ck2.getTimestamp()) {
                return 0;
            }
            else if (ck1.getTimestamp() < ck2.getTimestamp()) {
                return -1;
            }
            else {
                return 1;
            }
        }
        else {
            return comparison;
        }
    }
}
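
To make the effect of this ordering concrete, here is a minimal sketch in plain Java (no Hadoop classes) that imitates what the shuffle does with such keys: sort by (stockSymbol, timestamp), then group by the natural key. The names SecondarySortSketch, Point and sortAndGroup are hypothetical and exist only for this illustration; a real job would rely on the framework's sort and grouping comparators instead.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SecondarySortSketch {

    // Hypothetical stand-in for CompositeKey: (stockSymbol, timestamp).
    public static final class Point {
        final String stockSymbol;
        final long timestamp;

        public Point(String stockSymbol, long timestamp) {
            this.stockSymbol = stockSymbol;
            this.timestamp = timestamp;
        }
    }

    // Same ordering as the comparator above: symbol first, then timestamp.
    static final Comparator<Point> SORT_ORDER =
            Comparator.comparing((Point p) -> p.stockSymbol)
                      .thenComparingLong(p -> p.timestamp);

    // Sort by the composite key, then group by the natural key only.
    // Each group's timestamps come out in ascending order.
    public static Map<String, List<Long>> sortAndGroup(List<Point> points) {
        List<Point> sorted = new ArrayList<>(points);
        sorted.sort(SORT_ORDER);

        Map<String, List<Long>> groups = new LinkedHashMap<>();
        for (Point p : sorted) {
            groups.computeIfAbsent(p.stockSymbol, k -> new ArrayList<>())
                  .add(p.timestamp);
        }
        return groups;
    }
}
```

Each group here corresponds to one reduce call: the reducer sees a single stock symbol together with its timestamps already in ascending order.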
answered 2015-09-11T07:36:17.667
0

The best way to implement RawComparator correctly is to extend WritableComparator and override the compare() method. WritableComparator is very well written, so you can easily understand it.

answered 2012-10-11T07:08:38.450
0

From what I can see in the LongWritable class, this is already implemented:

/** A Comparator optimized for LongWritable. */ 
  public static class Comparator extends WritableComparator {
    public Comparator() {
      super(LongWritable.class);
    }

    public int compare(byte[] b1, int s1, int l1,
                       byte[] b2, int s2, int l2) {
      long thisValue = readLong(b1, s1);
      long thatValue = readLong(b2, s2);
      return (thisValue<thatValue ? -1 : (thisValue==thatValue ? 0 : 1));
    }
  }

That byte-level compare() is the RawComparator implementation.

answered 2012-10-11T07:22:26.870
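
The same idea can be extended to the composite key from the question. The sketch below is plain Java with no Hadoop dependency, and it assumes the key is serialized as writeUTF(naturalKey) followed by writeLong(timestamp), which is the layout used by the CompositeKey.write() method in another answer (LongWritable also writes its value as 8 big-endian bytes). The class RawCompositeKeySketch and its helpers are hypothetical. Note that comparing the encoded string bytes gives the same order as String.compareTo for ASCII symbols, but the two can differ for some non-ASCII characters.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

public class RawCompositeKeySketch {

    // Serialize in the assumed layout: writeUTF(symbol), then writeLong(timestamp).
    public static byte[] serialize(String symbol, long timestamp) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF(symbol);
            out.writeLong(timestamp);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // writeUTF prefixes the string bytes with a 2-byte big-endian length.
    static int readUnsignedShort(byte[] b, int off) {
        return ((b[off] & 0xff) << 8) | (b[off + 1] & 0xff);
    }

    // writeLong emits 8 big-endian bytes.
    static long readLong(byte[] b, int off) {
        long v = 0;
        for (int i = 0; i < 8; i++) {
            v = (v << 8) | (b[off + i] & 0xff);
        }
        return v;
    }

    // Lexicographic comparison of two byte ranges, treating bytes as unsigned.
    static int compareBytes(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int x = b1[s1 + i] & 0xff;
            int y = b2[s2 + i] & 0xff;
            if (x != y) {
                return x < y ? -1 : 1;
            }
        }
        return Integer.compare(l1, l2);
    }

    // Raw comparison: natural key bytes first, then the timestamp,
    // without deserializing either key into an object.
    public static int compare(byte[] b1, int s1, int l1, byte[] b2, int s2, int l2) {
        int len1 = readUnsignedShort(b1, s1);
        int len2 = readUnsignedShort(b2, s2);
        int cmp = compareBytes(b1, s1 + 2, len1, b2, s2 + 2, len2);
        if (cmp != 0) {
            return cmp;
        }
        long t1 = readLong(b1, s1 + 2 + len1);
        long t2 = readLong(b2, s2 + 2 + len2);
        return Long.compare(t1, t2);
    }
}
```

In a Hadoop job this logic would go into the compare(byte[], int, int, byte[], int, int) override of a WritableComparator subclass, where the static helpers WritableComparator.readUnsignedShort, readLong and compareBytes can replace the hand-written ones above.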
0

Are you asking how to compare values of the LongWritable type that Hadoop provides? If so, the answer is to use the compare() method. For more details, scroll down here.

answered 2012-10-11T05:31:49.760