java - Processing UTF-16LE encoded files in Hadoop/Cascading

Question

I need to process UTF-16LE encoded files with Cascading on top of Hadoop. I tried the following approaches, but none of them worked:

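1. Setting -Xmx1024m -Dfile.encoding=UTF-16LE as the value of the mapreduce.map.java.opts property in mapred-site.xml fails with a NullPointerException at com.google.common.base.Preconditions.checkNotNull(Preconditions.java:187). The same approach works for UTF-8. Is Hadoop unable to handle UTF-16 data?
2. Calling System.setProperty("file.encoding", "UTF-16LE"); in the code also fails to parse the data.
3. Overriding the charset of Cascading's TextDelimited class also fails to process the data.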

However, reading the file with a BufferedReader using UTF-16LE parses the data correctly.
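
For reference, the plain-Java approach that does work looks roughly like the sketch below: the charset is passed explicitly to InputStreamReader instead of relying on file.encoding. The class name, the readLines helper and the tab-delimited sample data are hypothetical, chosen only for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf16LeReadDemo {

    // Hypothetical helper: reads every line from the stream, decoding the bytes
    // explicitly as UTF-16LE instead of using the JVM default charset.
    static List<String> readLines(InputStream in) {
        List<String> lines = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_16LE))) {
            String line;
            while ((line = reader.readLine()) != null) {
                lines.add(line);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Hypothetical sample: two tab-delimited records encoded as UTF-16LE.
        byte[] data = "id\tname\n1\tAlice\n".getBytes(StandardCharsets.UTF_16LE);
        for (String line : readLines(new ByteArrayInputStream(data))) {
            // Each record splits cleanly into its fields once decoded correctly.
            System.out.println(line.split("\t").length + " fields: " + line);
        }
    }
}
```

Note that Java's UTF-16LE decoder does not strip a leading byte-order mark, so if the file starts with FF FE the first line will begin with the character U+FEFF.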

Please help.

Thanks in advance.

score 0 · Accepted Answer


I found this somewhere: Hadoop does not support UTF-16 files.

Answered 2018-05-03T02:59:19.093
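
Roughly why: Hadoop's text input splits records on raw CR/LF bytes, and its Text class stores and decodes UTF-8. As a result, neither file.encoding nor a charset setting on the Java side changes where the bytes are cut. In UTF-16LE every ASCII character carries a 0x00 byte, and '\n' becomes 0x0A 0x00. The JDK-only sketch below imitates that byte-level splitting with a hypothetical splitOnLfAsUtf8 helper, which is a simplification and not Hadoop's actual code. It also shows one common workaround that does not come from the original answer: transcoding to UTF-8 before processing, via a hypothetical utf16LeToUtf8 helper.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class Utf16LineSplitDemo {

    // Hypothetical simplification of a byte-oriented line reader: splits on the
    // raw LF byte (0x0A) and decodes each chunk as UTF-8. Not Hadoop's code.
    static List<String> splitOnLfAsUtf8(byte[] data) {
        List<String> lines = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < data.length; i++) {
            if (data[i] == 0x0A) {
                lines.add(new String(data, start, i - start, StandardCharsets.UTF_8));
                start = i + 1;
            }
        }
        if (start < data.length) {
            lines.add(new String(data, start, data.length - start, StandardCharsets.UTF_8));
        }
        return lines;
    }

    // Hypothetical workaround helper: decode UTF-16LE, re-encode as UTF-8
    // before the data reaches the byte-level line reader.
    static byte[] utf16LeToUtf8(byte[] utf16le) {
        return new String(utf16le, StandardCharsets.UTF_16LE).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] utf16 = "a,b\nc,d".getBytes(StandardCharsets.UTF_16LE);

        // '\n' is 0x0A 0x00 in UTF-16LE, so the 0x00 half of the newline ends up
        // as a stray NUL character at the start of the second "line".
        List<String> broken = splitOnLfAsUtf8(utf16);
        System.out.println(broken.get(1).charAt(0) == '\0'); // prints true

        // After transcoding to UTF-8 the same splitter yields clean records.
        List<String> fixed = splitOnLfAsUtf8(utf16LeToUtf8(utf16));
        System.out.println(fixed); // prints [a,b, c,d]
    }
}
```

The split can also land in the middle of a character. For example, U+010A is encoded as 0x0A 0x01 in UTF-16LE. For real files, the transcoding would more sensibly be done by streaming through an InputStreamReader/OutputStreamWriter pair (or an external tool) before copying the data into HDFS, rather than loading the whole file into memory as this sketch does.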
