hadoop - best way to deal with byte array in MR job

Question

I need to compare byte arrays in the comparator for an MR job, but I cannot find a good way to handle them. The object that is serialized/deserialized has the following fields:

public class GeneralKey {
  String name;
  String type;
  ...other String fields ..
}

@Override
public void readFields(DataInput input) throws IOException {
  name = input.readUTF();
  type = input.readUTF();
  ...
}

@Override
public void write(DataOutput output) throws IOException {
  output.writeUTF(name);
  output.writeUTF(type);
  ...
}

The serialized byte array looks like this:

name: [0,0], 2 bytes. These 2 bytes hold the length of name; since the length is 0, name is empty.
type: [0,3,97,98,99], 5 bytes. The first 2 bytes hold the length of type, which is 3, so the following 3 bytes (97, 98, 99) must be read; they decode to the string 'abc'.
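The length-prefixed layout described above can be decoded with the plain JDK, without any extra library. The following is a minimal sketch only: UtfFieldReader and its methods are hypothetical names, not part of Hadoop, and it assumes ASCII-only field contents, where the modified UTF-8 written by writeUTF matches standard UTF-8. The sample bytes 97, 98, 99 stand for 'abc'.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not a Hadoop class): decodes the length-prefixed
// fields that DataOutput.writeUTF produces.
final class UtfFieldReader {
    private UtfFieldReader() {}

    // Reads the 2-byte big-endian length prefix starting at offset.
    static int readUnsignedShort(byte[] buf, int offset) {
        return ((buf[offset] & 0xff) << 8) | (buf[offset + 1] & 0xff);
    }

    // Decodes the field whose length prefix starts at offset.
    // Assumption: ASCII-only content, so standard UTF-8 decoding is safe.
    static String readField(byte[] buf, int offset) {
        int length = readUnsignedShort(buf, offset);
        return new String(buf, offset + 2, length, StandardCharsets.UTF_8);
    }

    // Returns the offset of the field that follows the one at offset.
    static int nextOffset(byte[] buf, int offset) {
        return offset + 2 + readUnsignedShort(buf, offset);
    }
}
```

With the buffer {0, 0, 0, 3, 97, 98, 99}, calling readField at offset 0 gives the empty name, nextOffset moves to offset 2, and readField at offset 2 gives the type.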

I wonder if there is a better way to handle the byte array: read the first two bytes as an integer, then use that value to decide how many bytes to read next and convert to a String. I use Hadoop 1.0.3 and run the job on AWS. I tried HBase's Bytes class, but it threw a class-not-found error: java.lang.ClassNotFoundException: org.apache.hadoop.hbase.util.Bytes

Is there another library that I can use to handle byte arrays easily? Thanks.

score 0 · Accepted Answer


I use byte arrays as keys and values, but through the following built-in type: BytesWritable
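For the comparator itself, the serialized keys can be compared field by field without turning them back into Strings. The sketch below is an illustration under assumptions, not the answerer's code: SerializedKeyComparator is a hypothetical class, every field is assumed to have been written with writeUTF, and byte-wise ordering is assumed acceptable (it agrees with String.compareTo for ASCII content). Hadoop's WritableComparator exposes static readUnsignedShort and compareBytes helpers that could stand in for the two hand-written methods here, which avoids the HBase dependency.

```java
// Hypothetical sketch: orders two serialized keys, each a sequence of
// writeUTF fields, by comparing the raw bytes of each field in turn.
final class SerializedKeyComparator {
    private SerializedKeyComparator() {}

    // Reads the 2-byte big-endian length prefix starting at off.
    static int readUnsignedShort(byte[] b, int off) {
        return ((b[off] & 0xff) << 8) | (b[off + 1] & 0xff);
    }

    // Lexicographic comparison of two byte ranges, bytes treated as unsigned.
    static int compareBytes(byte[] b1, int s1, int l1,
                            byte[] b2, int s2, int l2) {
        int n = Math.min(l1, l2);
        for (int i = 0; i < n; i++) {
            int x = b1[s1 + i] & 0xff;
            int y = b2[s2 + i] & 0xff;
            if (x != y) {
                return x - y;
            }
        }
        // Equal up to the shorter length: the shorter range sorts first.
        return l1 - l2;
    }

    // Compares the first fieldCount fields of two serialized keys in order.
    static int compare(byte[] k1, byte[] k2, int fieldCount) {
        int o1 = 0;
        int o2 = 0;
        for (int f = 0; f < fieldCount; f++) {
            int len1 = readUnsignedShort(k1, o1);
            int len2 = readUnsignedShort(k2, o2);
            int c = compareBytes(k1, o1 + 2, len1, k2, o2 + 2, len2);
            if (c != 0) {
                return c;
            }
            // Skip the length prefix and payload to reach the next field.
            o1 += 2 + len1;
            o2 += 2 + len2;
        }
        return 0;
    }
}
```

Inside a real job this logic would sit in the compare(byte[], int, int, byte[], int, int) method of a raw comparator, with the start offsets passed in by the framework added to o1 and o2.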

Answered 2013-08-30T08:12:10.913
