java - Error when encoding a PDF file into a JSON string

Question

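I want to read the contents of a PDF file and send them to a server as a JSON string. I use the Google Guava library to read the contents of the PDF file into a String, and then the Jettison JSON library to escape the characters that would conflict with JSON.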

String content = Files.toString(new File("C:/Users/Sudhagar/Desktop/GAME.pdf"), Charset.defaultCharset());

String escapedContent = org.codehaus.jettison.json.JSONObject.quote(content);

I have set the JVM's default charset to UTF-8.

The resulting JSON string is built as follows:

String respStr = "{\n";
respStr = respStr + "\"mimetype\" : \"" + "text/plain" + "\",\n";
respStr = respStr + "\"value\" : " + escapedContent + "\n";
respStr = respStr + "}\n";
System.out.println(respStr);
StringEntity entity = new StringEntity(respStr);
httpput.setEntity(entity);

When I send this JSON to the server, I get an exception:

org.codehaus.jackson.JsonParseException: Invalid UTF-8 middle byte 0xfc  at [Source: [B@5733c2; line: 3, column: 25]

I would like to know whether there is a mistake in this approach, or whether there is a better way to solve this problem.

score 3 · Accepted Answer

I think a PDF file should be treated as opaque binary data, just like an image or encrypted data.

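Don't read it as if it were a plain text file. Decoding arbitrary bytes into a String with a charset such as UTF-8 can corrupt them, because byte sequences that are not valid in that charset get replaced. Treat it the way you would treat any other binary data, which for JSON probably means Base64-encoding it.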
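As a rough illustration of the Base64 suggestion, here is a minimal sketch that assumes Java 8 or later (for java.util.Base64). The class name PdfJsonPayload, the helper buildJson, the "encoding" field, and the "application/pdf" mimetype are made up for this example and are not part of the original code; the server would have to Base64-decode the "value" field to get the PDF bytes back.

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

class PdfJsonPayload {

    // Builds the JSON body from raw bytes. Standard Base64 output only
    // contains A-Z, a-z, 0-9, '+', '/' and '=', so the value needs no
    // JSON escaping. The "encoding" field is a hypothetical hint for the
    // receiving side, not something the original server is known to expect.
    static String buildJson(byte[] pdfBytes) {
        String encoded = Base64.getEncoder().encodeToString(pdfBytes);
        return "{\n"
                + "\"mimetype\" : \"application/pdf\",\n"
                + "\"encoding\" : \"base64\",\n"
                + "\"value\" : \"" + encoded + "\"\n"
                + "}\n";
    }

    public static void main(String[] args) throws Exception {
        byte[] pdfBytes;
        if (args.length > 0) {
            // Read the file as bytes; no charset is involved at any point.
            pdfBytes = Files.readAllBytes(Paths.get(args[0]));
        } else {
            // Stand-in data: "%PDF" followed by 0xfc, a byte that is
            // not valid on its own in UTF-8.
            pdfBytes = new byte[] {0x25, 0x50, 0x44, 0x46, (byte) 0xfc};
        }
        System.out.println(buildJson(pdfBytes));
    }
}
```

Because the encoded value is pure ASCII, the body is the same in UTF-8 and ISO-8859-1, so it should arrive intact whichever of those the HTTP client uses. If the request still goes through Apache HttpClient's StringEntity, it is probably still worth passing an explicit charset, for example new StringEntity(respStr, "UTF-8"), since in HttpClient 4.x the one-argument constructor falls back to ISO-8859-1 as far as I know.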
