My usual and preferred solution for enums in Hadoop is to serialize the enum through its ordinal value.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class EnumWritable implements Writable {

    static enum EnumName {
        ENUM_1, ENUM_2, ENUM_3
    }

    private int enumOrdinal;

    // never forget your default constructor in Hadoop Writables
    public EnumWritable() {
    }

    public EnumWritable(Enum<?> arbitraryEnum) {
        this.enumOrdinal = arbitraryEnum.ordinal();
    }

    public int getEnumOrdinal() {
        return enumOrdinal;
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        enumOrdinal = in.readInt();
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(enumOrdinal);
    }

    public static void main(String[] args) {
        // use it like this:
        EnumWritable enumWritable = new EnumWritable(EnumName.ENUM_1);
        // let Hadoop do the write and read stuff
        EnumName yourDeserializedEnum = EnumName.values()[enumWritable.getEnumOrdinal()];
    }
}
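You can check the round trip outside Hadoop with plain `java.io` streams, since `write()`/`readFields()` operate on the same `DataOutput`/`DataInput` interfaces. This is just an illustrative sketch; the class name `EnumOrdinalRoundTrip` is made up for the example:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class EnumOrdinalRoundTrip {

    enum EnumName {
        ENUM_1, ENUM_2, ENUM_3
    }

    // Serialize the ordinal and read it back, exactly as
    // write()/readFields() of the Writable above would.
    static EnumName roundTrip(EnumName value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(value.ordinal());
        out.close();

        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray()));
        return EnumName.values()[in.readInt()];
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip(EnumName.ENUM_2)); // prints ENUM_2
    }
}
```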
This obviously has a drawback: ordinals can change, so if you swap ENUM_2 and ENUM_3 and then read a previously serialized file, you will get back the other, wrong enum.
So if you know the enum class beforehand, you can write the enum's name instead and use it like this:
enumInstance = EnumName.valueOf(in.readUTF());
This uses slightly more space, but it is more robust against reordering of the enum constants. The complete example looks like this:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class EnumWritable implements Writable {

    static enum EnumName {
        ENUM_1, ENUM_2, ENUM_3
    }

    private EnumName enumInstance;

    // never forget your default constructor in Hadoop Writables
    public EnumWritable() {
    }

    public EnumWritable(EnumName e) {
        this.enumInstance = e;
    }

    public EnumName getEnum() {
        return enumInstance;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(enumInstance.name());
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        enumInstance = EnumName.valueOf(in.readUTF());
    }

    public static void main(String[] args) {
        // use it like this:
        EnumWritable enumWritable = new EnumWritable(EnumName.ENUM_1);
        // let Hadoop do the write and read stuff
        EnumName yourDeserializedEnum = enumWritable.getEnum();
    }
}
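The name-based variant can be exercised the same way with plain `java.io` streams, no Hadoop dependency needed. Again a minimal sketch; the class name `EnumNameRoundTrip` is made up for the example:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class EnumNameRoundTrip {

    enum EnumName {
        ENUM_1, ENUM_2, ENUM_3
    }

    // Serialize the enum's name with writeUTF and restore it with
    // valueOf, as write()/readFields() of the Writable above would.
    static EnumName roundTrip(EnumName value) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeUTF(value.name());
        out.close();

        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray()));
        return EnumName.valueOf(in.readUTF());
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip(EnumName.ENUM_1)); // prints ENUM_1
    }
}
```

Note that `valueOf` throws an `IllegalArgumentException` if the stored name no longer exists in the enum, so renaming a constant still breaks old data, but swapping the declaration order does not.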