java - Java JAXB UTF-8/ISO conversion

Question

I have an XML file that contains non-standard characters (such as odd "quotes").

I read the XML using UTF-8 / ISO / ASCII and then unmarshalled it:

BufferedReader br = new BufferedReader(new InputStreamReader(
        conn.getInputStream(), "ISO-8859-1"));
String output;
StringBuffer sb = new StringBuffer();
while ((output = br.readLine()) != null) {
    // fetch XML
    sb.append(output);
}

try {
    jc = JAXBContext.newInstance(ServiceResponse.class);
    Unmarshaller unmarshaller = jc.createUnmarshaller();
    ServiceResponse OWrsp = (ServiceResponse) unmarshaller
            .unmarshal(new InputSource(new StringReader(sb.toString())));
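If the service actually sends UTF-8 (an assumption; the question does not show what the server sends), reading it through an ISO-8859-1 InputStreamReader garbles every non-ASCII character before JAXB ever sees it. A minimal JDK-only sketch of that effect; the class and method names are made up for illustration:

```java
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // Encodes text as UTF-8, then decodes those bytes as ISO-8859-1, which is
    // what an InputStreamReader opened with "ISO-8859-1" does to UTF-8 input.
    static String decodeUtf8AsLatin1(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // Undoes the mistake: ISO-8859-1 maps U+0000..U+00FF one-to-one onto bytes,
    // so re-encoding recovers the original UTF-8 bytes exactly.
    static String repair(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "10\u201311"; // "10-11" written with an en dash (U+2013)
        String garbled = decodeUtf8AsLatin1(original);
        // The 3-byte UTF-8 en dash becomes 3 separate chars: 5 chars -> 7.
        System.out.println(original.length() + " -> " + garbled.length()); // 5 -> 7
        System.out.println(repair(garbled).equals(original)); // true
    }
}
```

The simpler fix, under the same assumption, is to open the reader with the charset the server really uses (or hand JAXB the raw InputStream) instead of repairing the text afterwards.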

I have an Oracle function that takes ISO-8859-1 codes and converts/maps them to "literal" symbols, i.e. "’" => "left single quote".

When JAXB unmarshals using ISO, it shows the characters with the ISO conversion, i.e. all the odd single quotes end up encoded as "’".

So suppose my string is: class of 10–11‐year‐olds (note the odd - between 11 and year).

jc = JAXBContext.newInstance(ScienceProductBuilderInfoType.class);
Marshaller m = jc.createMarshaller();
m.setProperty(Marshaller.JAXB_ENCODING, "ISO-8859-1");
// save a temp file
File file2 = new File("tmp.xml");

This saves the following in the file:

class of 10&#8211;11&#8208;year&#8208;olds. (This is what I want, so saving the file works!)

[Side note: I have read the file back with a Java file reader, and it prints the string above just fine.]
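The character references in the saved file are expected: neither the en dash (U+2013, decimal 8211) nor the Unicode hyphen (U+2010, decimal 8208) exists in ISO-8859-1, so an ISO-8859-1 writer cannot store them as bytes and has to escape them. A small sketch (hypothetical class name) that lists which characters of a string fall outside ISO-8859-1, using CharsetEncoder.canEncode:

```java
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class Latin1Check {
    // Returns the characters of text that ISO-8859-1 cannot represent directly;
    // an XML writer using that encoding has to emit them as character references.
    static String unencodable(String text) {
        CharsetEncoder latin1 = StandardCharsets.ISO_8859_1.newEncoder();
        StringBuilder out = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (!latin1.canEncode(c)) {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String s = "class of 10\u201311\u2010year\u2010olds";
        // prints &#8211; then &#8208; twice, matching the saved file
        for (char c : unencodable(s).toCharArray()) {
            System.out.println("&#" + (int) c + ";");
        }
    }
}
```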

The problem I have is that the string produced by the JAXB unmarshaller comes out wrong: for some reason I can't get the string to represent the -.

1: When I check the output of the unmarshalled XML:

class of 10?11?year?olds
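The question marks look like an output-encoding problem rather than an unmarshalling one (a guess from the symptom, not confirmed in the question): when a string is encoded to a charset that lacks a character, as a console PrintStream does, Java substitutes '?'. This sketch (hypothetical names) reproduces the exact output above with String.getBytes, which always replaces unmappable characters with the charset's replacement byte:

```java
import java.nio.charset.StandardCharsets;

public class QuestionMarkDemo {
    // Encodes to ISO-8859-1 and decodes again; every character outside
    // ISO-8859-1 is silently replaced by '?' during the encoding step.
    static String roundTripLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // prints: class of 10?11?year?olds
        System.out.println(roundTripLatin1("class of 10\u201311\u2010year\u2010olds"));
    }
}
```

If that is the cause, the String held by the unmarshalled object is already correct, and checking it with equals or by printing the code points is more reliable than looking at console output.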

2: The file output:

class of 10&#8211;11&#8208;year&#8208;olds

I even tried reading the saved XML back from the file and then unmarshalling it (hoping to get the - in my string):

String sCurrentLine;
BufferedReader br = new BufferedReader(new FileReader("tmp.xml"));
StringBuffer sb = new StringBuffer();
while ((sCurrentLine = br.readLine()) != null) {
    sb.append(sCurrentLine);
}

ScienceProductBuilderInfoType rsp = (ScienceProductBuilderInfoType) unm
        .unmarshal(new InputSource(new StringReader(sb.toString())));
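Copying the file into a String first is not required: JAXB's Unmarshaller also accepts a File or an InputStream directly, which lets the XML parser decode the bytes using the encoding in the XML declaration and expand the character references itself. To stay runnable without JAXB on the classpath, this sketch uses the JDK's DOM parser to show that step; the class name and the `<info>` element are made up, since the real file structure isn't shown:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public class CharRefDemo {
    // Parses raw XML bytes and returns the root element's text. Because the
    // parser gets bytes, not a String, it decodes them using the encoding named
    // in the XML declaration, and it expands references such as &#8211;.
    static String textOf(byte[] xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml));
            return doc.getDocumentElement().getTextContent();
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String saved = "<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?>"
                + "<info>class of 10&#8211;11&#8208;year&#8208;olds</info>";
        String text = textOf(saved.getBytes(StandardCharsets.ISO_8859_1));
        // Compare against the expected characters instead of printing them,
        // so a console that cannot show the dashes doesn't hide the result.
        System.out.println(text.equals("class of 10\u201311\u2010year\u2010olds")); // true
    }
}
```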

No luck.

Any ideas how to get ISO-8859-1 encoded characters in JAXB?

score 0 · Accepted Answer

Solved: I used this tidbit of code found on Stack Overflow:

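final class HtmlEncoder {
  private HtmlEncoder() {}

  // Copies ASCII (BASIC_LATIN) characters unchanged and replaces every other
  // code point with a hexadecimal character reference such as "&#x2013;".
  public static <T extends Appendable> T escapeNonLatin(CharSequence sequence,
      T out) throws java.io.IOException {
    for (int i = 0; i < sequence.length(); i++) {
      char ch = sequence.charAt(i);
      if (Character.UnicodeBlock.of(ch) == Character.UnicodeBlock.BASIC_LATIN) {
        out.append(ch);
      } else {
        int codepoint = Character.codePointAt(sequence, i);
        // handle supplementary range chars
        i += Character.charCount(codepoint) - 1;
        // emit entity
        out.append("&#x");
        out.append(Integer.toHexString(codepoint));
        out.append(";");
      }
    }
    return out;
  }

  // Convenience overload so the method can be called with just the string.
  public static String escapeNonLatin(CharSequence sequence) {
    try {
      return escapeNonLatin(sequence, new StringBuilder()).toString();
    } catch (java.io.IOException e) {
      // StringBuilder.append never throws IOException
      throw new AssertionError(e);
    }
  }
}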

HtmlEncoder.escapeNonLatin(MYSTRING)
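For the string from the question, this escaping yields hexadecimal references (`&#x2013;`) where the marshaller wrote decimal ones (`&#8211;`); both denote the same characters. A self-contained sketch of the same technique, rewritten over code points with a hypothetical class name (a variant for illustration, not the answer's exact code):

```java
public class EscapeDemo {
    // Same idea as HtmlEncoder.escapeNonLatin: code points in BASIC_LATIN
    // (U+0000..U+007F) are copied, everything else becomes "&#x<hex>;".
    static String escapeNonAscii(CharSequence text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);
            } else {
                out.append("&#x").append(Integer.toHexString(cp)).append(';');
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // prints: class of 10&#x2013;11&#x2010;year&#x2010;olds
        System.out.println(escapeNonAscii("class of 10\u201311\u2010year\u2010olds"));
    }
}
```

Note that the result is ordinary text containing `&#x...;` sequences, not decoded characters; if such a string is later marshalled, JAXB escapes the `&` again as `&amp;`.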
