IMO, you cannot use raw bytes in a Hadoop configuration.
However, note that the property "importtsv.separator", defined by org.apache.hadoop.hbase.mapreduce.ImportTsv.SEPARATOR_CONF_KEY, is Base64-encoded at org.apache.hadoop.hbase.mapreduce.ImportTsv:245:
public static Job createSubmittableJob(Configuration conf, String[] args)
    throws IOException, ClassNotFoundException {
  // Support non-XML supported characters
  // by re-encoding the passed separator as a Base64 string.
  String actualSeparator = conf.get(SEPARATOR_CONF_KEY);
  if (actualSeparator != null) {
    conf.set(SEPARATOR_CONF_KEY,
        Base64.encodeBytes(actualSeparator.getBytes()));
  }
  ...
}
It is then decoded at org.apache.hadoop.hbase.mapreduce.ImportTsv:92:
protected void doSetup(Context context) {
  Configuration conf = context.getConfiguration();
  // If a custom separator has been used,
  // decode it back from Base64 encoding.
  separator = conf.get(ImportTsv.SEPARATOR_CONF_KEY);
  if (separator == null) {
    separator = ImportTsv.DEFAULT_SEPARATOR;
  } else {
    separator = new String(Base64.decode(separator));
  }
  ...
}
Finally, it is checked to be a single byte at org.apache.hadoop.hbase.mapreduce.ImportTsv:97:
public TsvParser(String columnsSpecification, String separatorStr) {
  // Configure separator
  byte[] separator = Bytes.toBytes(separatorStr);
  Preconditions.checkArgument(separator.length == 1,
      "TsvParser only supports single-byte separators");
  separatorByte = separator[0];
  ...
}
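To make the three steps above concrete, here is a minimal standalone sketch of the same encode / decode / single-byte-check round trip. It is not the HBase code: it uses java.util.Base64 from the JDK as a stand-in for HBase's own Base64 class, assumes UTF-8 as the charset, and the class and method names (SeparatorRoundTrip, encode, decode, toSeparatorByte) are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class SeparatorRoundTrip {

    // Mirrors the idea in createSubmittableJob(): Base64-encode the separator
    // so that characters XML cannot represent survive in the configuration.
    static String encode(String separator) {
        return Base64.getEncoder()
                .encodeToString(separator.getBytes(StandardCharsets.UTF_8));
    }

    // Mirrors the idea in doSetup(): decode the Base64 value back to a string.
    static String decode(String encoded) {
        return new String(Base64.getDecoder().decode(encoded),
                StandardCharsets.UTF_8);
    }

    // Mirrors the idea in the TsvParser constructor: the separator must
    // occupy exactly one byte once converted to bytes.
    static byte toSeparatorByte(String separatorStr) {
        byte[] bytes = separatorStr.getBytes(StandardCharsets.UTF_8);
        if (bytes.length != 1) {
            throw new IllegalArgumentException(
                    "TsvParser only supports single-byte separators");
        }
        return bytes[0];
    }

    public static void main(String[] args) {
        String encoded = encode("\u0001");                      // Ctrl-A
        System.out.println(encoded);                            // prints AQ==
        System.out.println(toSeparatorByte(decode(encoded)));   // prints 1
    }
}
```

So a control character such as 0x01 can travel through the configuration as a string; the difficulty is only in getting it into "importtsv.separator" in the first place.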
As a workaround, I suggest you declare your own main method that modifies the configuration property before the job runs.
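import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.mapreduce.ImportTsv;
import org.apache.hadoop.util.GenericOptionsParser;

public class ImportTsvByteSeparator extends ImportTsv {
  /**
   * Main entry point.
   *
   * @param args The command line parameters.
   * @throws Exception When running the job fails.
   */
  public static void main(String[] args) throws Exception {
    // Parse the generic options (-Dkey=value) into a scratch configuration
    // so that importtsv.byte_separator can be given on the command line.
    Configuration conf = HBaseConfiguration.create();
    new GenericOptionsParser(conf, args);
    // Default to byte 0x01 (Ctrl-A), written here as the octal literal 001.
    int byteSeparator = conf.getInt("importtsv.byte_separator", 001);
    String separator = Character.toString((char) byteSeparator);
    // conf is never handed to ImportTsv.main(), so setting the property on it
    // would be lost; pass the separator on as a generic -D option instead.
    String[] newArgs = new String[args.length + 1];
    newArgs[0] = "-Dimporttsv.separator=" + separator;
    System.arraycopy(args, 0, newArgs, 1, args.length);
    // Now we call ImportTsv's main method
    ImportTsv.main(newArgs);
  }
}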
Because of the visibility of the class's fields, I don't think we can override some of the methods in the flow (such as createSubmittableJob()).
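One caveat about the workaround: the separator is turned into a string and later back into bytes, so only values whose string form is a single byte pass the TsvParser check. Assuming the bytes are produced with UTF-8, that means 0 to 127; a value such as 0xFE becomes two bytes and would be rejected. The helper below is a hypothetical sketch (the name ByteSeparator.fromByteValue is not part of HBase) of how the value could be validated up front.

```java
import java.nio.charset.StandardCharsets;

public class ByteSeparator {

    // Hypothetical helper: converts a numeric byte value into a one-character
    // separator string, rejecting values that would not encode to exactly
    // one byte in UTF-8 (assumed charset).
    static String fromByteValue(int value) {
        if (value < 0 || value > 0x7F) {
            throw new IllegalArgumentException(
                    "separator byte must be in the range 0..127, got " + value);
        }
        return Character.toString((char) value);
    }

    public static void main(String[] args) {
        // 0x01 stays a single byte after the string round trip.
        System.out.println(
                fromByteValue(1).getBytes(StandardCharsets.UTF_8).length);   // prints 1
        // 0xFE as a char (U+00FE) needs two bytes in UTF-8.
        System.out.println(
                Character.toString((char) 0xFE)
                        .getBytes(StandardCharsets.UTF_8).length);           // prints 2
    }
}
```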