java - DataInputStream and UTF-8

Question

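I'm a new programmer, and I'm having a couple of problems with the code I'm working on.

Basically, the code receives a form from another JSP, reads the bytes, parses the data, and submits the results to SalesForce, using a DataInputStream.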

   //Getting the parameters from request
 String contentType = request.getContentType();
 DataInputStream in = new DataInputStream(request.getInputStream());
 int formDataLength = request.getContentLength();

 //System.out.println(formDataLength);
 byte dataBytes[] = new byte[formDataLength];
 int byteRead = 0;
 int totalBytesRead = 0;
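 while (totalBytesRead < formDataLength) 
 {
  // Ask only for the bytes still missing so the read never overruns the buffer
  byteRead = in.read(dataBytes, totalBytesRead, formDataLength - totalBytesRead);
  if (byteRead == -1) break; // stream ended before the declared length
  totalBytesRead += byteRead;
 }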

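It works fine, but only when the code handles normal characters. Whenever it tries to handle special characters (such as the French characters àâäæçéèêëîïôùûü), I get the following gibberish: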

Ã Ã¢Ã¤Ã¦Ã§Ã©Ã¨ÃªÃ«Ã®Ã¯Ã´Ã¹Ã»Ã¼

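I understand that this may be a problem with DataInputStream and the way it doesn't return UTF-8 encoded text. Do you have any suggestions on how to fix this?

All the .jsp files include <%@page pageEncoding="UTF-8" contentType="text/html; charset=UTF-8"%> and Tomcat's settings are fine (URI = UTF-8, etc.). I tried adding: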

request.setCharacterEncoding("UTF-8");

and

response.setCharacterEncoding("UTF-8");

to no avail.

Here is an example of how it parses the data:

    //Getting the notes for the Case 
 String notes = new String(dataBytes);
 System.out.println(notes);
 String savenotes = casetype.substring(notes.indexOf("notes"));
 //savenotes = savenotes.substring(savenotes.indexOf("\n"), savenotes.indexOf("---"));
 savenotes = savenotes.substring(savenotes.indexOf("\n")+1);
 savenotes = savenotes.substring(savenotes.indexOf("\n")+1);
 savenotes = savenotes.substring(0,savenotes.indexOf("name=\"datafile"));
 savenotes = savenotes.substring(0,savenotes.lastIndexOf("\n------"));
 savenotes = savenotes.trim();

Thanks in advance.

score 7 · Accepted Answer

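The problem is not in the input streams, since they don't deal with characters, only with bytes. Your problem is at the point where you convert those bytes to characters. In this particular case, you need to specify the proper encoding in the String constructor.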

String notes = new String(dataBytes, "UTF-8");
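To see why the charset argument matters, here is a minimal sketch that decodes the same UTF-8 bytes twice. `new String(bytes)` without a charset uses the platform default; since that default differs between machines, the sketch uses ISO-8859-1 as a stand-in for a non-UTF-8 default (that choice is an assumption, and the class and method names are made up for illustration).

```java
import java.nio.charset.StandardCharsets;

public class DecodeDemo {
    // Decode with an explicit charset, as suggested above.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Stand-in for a non-UTF-8 platform default (assumption: ISO-8859-1).
    static String decodeLatin1(byte[] bytes) {
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "\u00e9\u00e8" is "é" followed by "è", written as escapes so the
        // source file's own encoding cannot affect the result.
        byte[] bytes = "\u00e9\u00e8".getBytes(StandardCharsets.UTF_8);

        // Each accented letter is two bytes in UTF-8.
        System.out.println(bytes.length);

        // Correct: the two-byte sequences are turned back into one char each.
        System.out.println(decodeUtf8(bytes));

        // Wrong charset: every byte becomes its own char, which is the kind
        // of garbling shown in the question.
        System.out.println(decodeLatin1(bytes));
    }
}
```

Using the `StandardCharsets.UTF_8` constant instead of the `"UTF-8"` string is equivalent here and avoids the checked `UnsupportedEncodingException`.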

See also:

Unicode - How to get the characters right?

By the way, the DataInputStream adds no value in this particular code snippet. You can just keep it as an InputStream.
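As a rough sketch of that last point, the body can be read from a plain InputStream and decoded afterwards. The `ByteArrayInputStream` below is only a stand-in for `request.getInputStream()`, and `ReadBodyDemo` and `readFully` are hypothetical names, not part of the original code. The loop requests only the bytes still missing and stops with an exception if the stream ends early.

```java
import java.io.ByteArrayInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class ReadBodyDemo {
    // Reads exactly `length` bytes from a plain InputStream.
    static byte[] readFully(InputStream in, int length) throws IOException {
        byte[] data = new byte[length];
        int total = 0;
        while (total < length) {
            // Request only the remaining bytes, starting at the current offset.
            int n = in.read(data, total, length - total);
            if (n == -1) {
                throw new EOFException("stream ended after " + total + " bytes");
            }
            total += n;
        }
        return data;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the request body (assumption: the client sent UTF-8).
        byte[] body = "notes=\u00e0\u00e2\u00e4".getBytes(StandardCharsets.UTF_8);
        InputStream in = new ByteArrayInputStream(body);

        byte[] dataBytes = readFully(in, body.length);

        // Bytes only become characters here, with the charset stated explicitly.
        String notes = new String(dataBytes, StandardCharsets.UTF_8);
        System.out.println(notes.length());
    }
}
```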
