
An object should implement the Writable interface so that it can be serialized when it is transferred in Hadoop. Take Lucene's ScoreDoc class as an example:

public class ScoreDoc implements java.io.Serializable {

  /** The score of this document for the query. */
  public float score;

  /** Expert: A hit document's number.
   * @see Searcher#doc(int) */
  public int doc;

  /** Only set by {@link TopDocs#merge} */
  public int shardIndex;

  /** Constructs a ScoreDoc. */
  public ScoreDoc(int doc, float score) {
    this(doc, score, -1);
  }

  /** Constructs a ScoreDoc. */
  public ScoreDoc(int doc, float score, int shardIndex) {
    this.doc = doc;
    this.score = score;
    this.shardIndex = shardIndex;
  }

  // A convenience method for debugging.
  @Override
  public String toString() {
    return "doc=" + doc + " score=" + score + " shardIndex=" + shardIndex;
  }
}

How should I serialize it with the Writable interface? And what is the relationship between Writable and java.io.Serializable?


2 Answers


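I don't think tampering with the built-in Lucene class is a good idea. Instead, write your own class that holds a field of type ScoreDoc and implements Hadoop's Writable interface. It would look something like this: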

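import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;
import org.apache.lucene.search.ScoreDoc;

public class MyScoreDoc implements Writable {

  private ScoreDoc sd;

  // Writable implementations generally need a no-arg constructor so that
  // Hadoop can instantiate them before calling readFields().
  public MyScoreDoc() {}

  public MyScoreDoc(ScoreDoc sd) {
    this.sd = sd;
  }

  @Override
  public void write(DataOutput out) throws IOException {
    // ScoreDoc's fields are public, so read them directly
    // instead of parsing the output of toString().
    out.writeFloat(sd.score);
    out.writeInt(sd.doc);
    out.writeInt(sd.shardIndex);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    // Read in the same order, and with the same types, as write().
    float score = in.readFloat();
    int doc = in.readInt();
    int shardIndex = in.readInt();

    // The constructor takes (doc, score, shardIndex), in that order.
    sd = new ScoreDoc(doc, score, shardIndex);
  }

  // String toString()
}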
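
To check a wrapper like this without a cluster, you can round-trip it through an in-memory byte stream. The sketch below is only an illustration under stated assumptions: `SimpleWritable` and `Hit` are hypothetical stand-ins for Hadoop's `Writable` and Lucene's `ScoreDoc` (so it compiles with the JDK alone), and `HitWritable` and `RoundTripDemo` are made-up names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;

// Hypothetical stand-in with the same two methods as org.apache.hadoop.io.Writable.
interface SimpleWritable {
  void write(DataOutput out) throws IOException;
  void readFields(DataInput in) throws IOException;
}

// Hypothetical stand-in for Lucene's ScoreDoc (same three public fields).
class Hit {
  public float score;
  public int doc;
  public int shardIndex;

  Hit(int doc, float score, int shardIndex) {
    this.doc = doc;
    this.score = score;
    this.shardIndex = shardIndex;
  }
}

// Wrapper that serializes the wrapped object's fields in a fixed order.
class HitWritable implements SimpleWritable {
  Hit hit;

  HitWritable() {}                          // empty instance, filled by readFields()
  HitWritable(Hit hit) { this.hit = hit; }

  @Override
  public void write(DataOutput out) throws IOException {
    out.writeFloat(hit.score);
    out.writeInt(hit.doc);
    out.writeInt(hit.shardIndex);
  }

  @Override
  public void readFields(DataInput in) throws IOException {
    float score = in.readFloat();
    int doc = in.readInt();
    int shardIndex = in.readInt();
    hit = new Hit(doc, score, shardIndex);
  }
}

class RoundTripDemo {
  // Serializes to a byte array, then deserializes into a fresh instance.
  static HitWritable roundTrip(HitWritable original) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      original.write(out);
    }
    HitWritable copy = new HitWritable();
    try (DataInputStream in =
             new DataInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
      copy.readFields(in);
    }
    return copy;
  }

  public static void main(String[] args) throws IOException {
    HitWritable copy = roundTrip(new HitWritable(new Hit(42, 0.75f, 3)));
    System.out.println("doc=" + copy.hit.doc + " score=" + copy.hit.score
        + " shardIndex=" + copy.hit.shardIndex);
  }
}
```

If the copy's fields match the original's, write() and readFields() agree on field order and types.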
answered 2013-05-30T14:48:44.037

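First, see "Hadoop: Easy way to have object as output value without Writable interface": you can use Java serialization, OR

http://developer.yahoo.com/hadoop/tutorial/module5.html you write the read and write functions yourself. It is simple: inside them you call the API methods that read and write int, float, String, and so on.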
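
The two routes differ mainly in who decides the byte layout. A minimal JDK-only sketch of the difference follows; `SerializableHit` and `JavaSerializationDemo` are hypothetical names, the class is a stand-in for a small value type like ScoreDoc, and the sketch does not show how to configure a Hadoop job to use Java serialization.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

// Hypothetical stand-in for a small Serializable value class such as ScoreDoc.
class SerializableHit implements Serializable {
  private static final long serialVersionUID = 1L;
  float score;
  int doc;
  int shardIndex;

  SerializableHit(int doc, float score, int shardIndex) {
    this.doc = doc;
    this.score = score;
    this.shardIndex = shardIndex;
  }
}

class JavaSerializationDemo {
  // Java serialization: the JVM writes class metadata plus the field values.
  static byte[] toBytes(SerializableHit hit) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
      out.writeObject(hit);
    }
    return bytes.toByteArray();
  }

  static SerializableHit fromBytes(byte[] data)
      throws IOException, ClassNotFoundException {
    try (ObjectInputStream in =
             new ObjectInputStream(new ByteArrayInputStream(data))) {
      return (SerializableHit) in.readObject();
    }
  }

  // Hand-written layout: only the three field values, 4 bytes each.
  static int handWrittenSize(SerializableHit hit) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeFloat(hit.score);
      out.writeInt(hit.doc);
      out.writeInt(hit.shardIndex);
    }
    return bytes.size();
  }

  public static void main(String[] args) throws Exception {
    SerializableHit hit = new SerializableHit(42, 0.75f, 3);
    System.out.println("Java serialization bytes: " + toBytes(hit).length);
    System.out.println("Hand-written bytes: " + handWrittenSize(hit));
  }
}
```

The hand-written form carries no class description, which is why the reader has to know the field order and types in advance.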

Your example as a Writable (imports needed):

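import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class ScoreDoc implements java.io.Serializable, Writable {
    /** The score of this document for the query. */
    public float score; // ... remaining fields and constructors as above

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeFloat(score); // score is a float, so writeFloat, not writeInt
        out.writeInt(doc);
        out.writeInt(shardIndex);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Same order and same types as write()
        score = in.readFloat();
        doc = in.readInt();
        shardIndex = in.readInt();
    }

    // rest: toString etc.
}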

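Note: fields must be read in the same order they were written, otherwise one field's value ends up in another. If the fields also have different types, the values you read back will be wrong, or the read can fail outright.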
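
As a rough illustration of the ordering point, the sketch below writes a float followed by an int and then deliberately reads them back in the opposite order. `FieldOrderDemo` and its methods are hypothetical names, and only the JDK is used. Because a float and an int are both four bytes, this particular mismatch raises no exception and silently produces wrong values; a mismatch in sizes would more likely surface as an EOFException or as misaligned later fields.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

class FieldOrderDemo {
  // Writes score (float) then doc (int), the order a write() method might use.
  static byte[] writeScoreThenDoc(float score, int doc) throws IOException {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeFloat(score);
      out.writeInt(doc);
    }
    return bytes.toByteArray();
  }

  // Deliberately wrong: reads doc (int) first, then score (float).
  // Returns {doc, raw bits of score} so the garbled values can be inspected.
  static int[] readDocThenScore(byte[] data) throws IOException {
    try (DataInputStream in =
             new DataInputStream(new ByteArrayInputStream(data))) {
      int doc = in.readInt();
      float score = in.readFloat();
      return new int[] { doc, Float.floatToIntBits(score) };
    }
  }

  public static void main(String[] args) throws IOException {
    int[] garbled = readDocThenScore(writeScoreThenDoc(1.5f, 7));
    // doc now holds the bit pattern of 1.5f; score holds the bits of the int 7.
    System.out.println("doc=" + garbled[0]
        + " score=" + Float.intBitsToFloat(garbled[1]));
  }
}
```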

answered 2013-05-30T14:09:43.110