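I received a file encoded in Shift_JIS. It contains Japanese text, with shift-in (SI) and shift-out (SO) characters at the start and end of each multibyte string.
My requirement is to convert this file to UTF-8 and remove the SI and SO characters from the UTF-8 output. What is the best way to do this? Should I remove them before or after the UTF-8 conversion, and how do I remove them? Thanks in advance.
My Java code is as follows: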
// Needs the java.io classes used below (e.g. import java.io.*;)
public static void main(String[] args) throws Exception {
    String inFilePath = "src\\encoding\\input\\dfd02.PGP_dec";
    String filePath = "src\\encoding\\output\\";
    String utf8FileNm = "utf8-out.txt";
    String charsetName = "x-SJIS_0213";
    try {
        // Decode the whole input file into memory using the source charset.
        StringBuilder sb = new StringBuilder();
        // try-with-resources closes the reader even if decoding fails.
        try (Reader reader = new InputStreamReader(new FileInputStream(inFilePath), charsetName)) {
            int read;
            while ((read = reader.read()) != -1) {
                sb.append((char) read);
            }
        }
        String string = sb.toString();
        // Re-encode the decoded text as UTF-8.
        try (Writer writer = new OutputStreamWriter(
                new FileOutputStream(filePath + charsetName + "-" + utf8FileNm), "UTF-8")) {
            writer.write(string);
        }
        System.out.println("Finished writing the input file in UTF-8 format");
    } catch (FileNotFoundException e) {
        e.printStackTrace();
    }
}
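
One possible way to drop the SI/SO characters is to filter them out after decoding, just before the text is written as UTF-8. This is a minimal sketch, not a definitive answer. It assumes the markers are the single bytes 0x0E (SO) and 0x0F (SI), and that the decoder passes them through as the control characters U+000E and U+000F; that is worth checking against the real file. In plain Shift_JIS, trail bytes of double-byte characters fall in the range 0x40 to 0xFC, so those two byte values should not occur inside a character, and removing them before or after conversion should give the same result. Filtering after decoding is simply easier to write. The class name `ShiftCharStripper` and its methods are hypothetical names introduced for illustration.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper; the names are illustrative, not from the question.
final class ShiftCharStripper {
    private static final char SO = 0x0E; // shift-out control character
    private static final char SI = 0x0F; // shift-in control character

    // Returns a copy of the decoded text with every SO and SI character removed.
    static String stripShiftChars(String text) {
        StringBuilder sb = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c != SO && c != SI) {
                sb.append(c);
            }
        }
        return sb.toString();
    }

    // Streams the input in chunks: decode with the source charset,
    // strip SO/SI, and write the result as UTF-8.
    static void convert(Path in, Path out, Charset source) throws IOException {
        try (BufferedReader reader = Files.newBufferedReader(in, source);
             BufferedWriter writer = Files.newBufferedWriter(out, StandardCharsets.UTF_8)) {
            char[] buf = new char[8192];
            int n;
            while ((n = reader.read(buf)) != -1) {
                writer.write(stripShiftChars(new String(buf, 0, n)));
            }
        }
    }
}
```

With the paths from the question, the call might look like `ShiftCharStripper.convert(Paths.get(inFilePath), Paths.get(filePath + utf8FileNm), Charset.forName("x-SJIS_0213"))`. Note that `Files.newBufferedReader` throws an exception on malformed input, whereas `InputStreamReader` silently substitutes a replacement character, so invalid bytes in the file would surface as an error here.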