java - Jena results in UTF-8 format

Question

How can I get Jena (Java) results in UTF-8 format? My code:

Query query= QueryFactory.create(queryString);
QueryExecution qexec= QueryExecutionFactory.sparqlService("http://lod.openlinksw.com/sparql", queryString);
ResultSet results = qexec.execSelect();
List<QuerySolution> list = ResultSetFormatter.toList(results);  
System.out.println(list.get(i).get("churchname"));

score 5 · Accepted Answer

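I believe this is related to UTF-8 formatting in SPARQL.

Having looked into it, here is what is happening:

The importer takes the UTF-8 encoded input "Chodovská tvrz".
In UTF-8 that is: '43 68 6f 64 6f 76 73 6b c3 a1 20 74 76 72 7a' (c3 a1 is 'á' in UTF-8).
The importer reads these bytes as if each byte were a separate Unicode character.
So instead of 'á' you get the two characters c3 and a1, which are 'Ã' and '¡'.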
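
The steps above can be reproduced in a few lines. This is only a minimal sketch of the effect, not what the endpoint or Jena does internally: the class name `MojibakeDemo` and the helper `misdecode` are made up for illustration, and it assumes the wrong decoding is equivalent to ISO-8859-1 (one character per byte).

```java
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // Hypothetical helper: encode text as UTF-8, then decode the bytes
    // with ISO-8859-1, which maps every byte to exactly one character.
    static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String original = "Chodovsk\u00E1 tvrz"; // "Chodovská tvrz"
        String garbled = misdecode(original);

        System.out.println(original.length()); // 14
        System.out.println(garbled.length());  // 15, one char per UTF-8 byte
        // The single 'á' (U+00E1) has become 'Ã' (U+00C3) followed by '¡' (U+00A1).
        System.out.println(garbled.equals("Chodovsk\u00C3\u00A1 tvrz")); // true
    }
}
```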

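You can reverse this by converting the string's characters to a byte array and then creating a new string from it. I'm sure there must be a simpler way, but here is an example: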

public class Convert
{
    public static void main(String... args) throws Exception {
        String in = "Chodovsk\u00C3\u00A1 tvrz";
        char[] chars = in.toCharArray();
        // make a new string by treating chars as bytes
        String out = new String(fix(chars), "utf-8");
        System.err.println("Got: " + out); // Chodovská tvrz
    }

    public static byte[] fix(char[] a) {
        byte[] b = new byte[a.length];
        for (int i = 0; i < a.length; i++) b[i] = (byte) a[i];
        return b;
    }
}

Applying this to list.get(i).get("churchname").toString() (which is what you are printing) will fix the names.

Edit:

Or simply use:

String churchname = list.get(i).get("churchname").toString();
String out2 = new String(churchname.getBytes("iso-8859-1"), "utf-8");

That is much simpler.
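
One caveat with the one-liner: if a value was decoded correctly in the first place, re-decoding it will damage it. The following is a cautious sketch of a guarded version, not part of Jena. The class `Utf8Repair` and the method `repair` are hypothetical names, and it assumes the garbling is exactly "UTF-8 bytes read as ISO-8859-1". It returns the input unchanged whenever the round trip does not yield valid UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Repair {
    // Hypothetical helper: undo "UTF-8 read as ISO-8859-1" when that is plausible.
    static String repair(String s) {
        // Characters above U+00FF cannot have come from single bytes,
        // so the string was not garbled in the assumed way.
        if (!StandardCharsets.ISO_8859_1.newEncoder().canEncode(s)) {
            return s;
        }
        byte[] bytes = s.getBytes(StandardCharsets.ISO_8859_1);
        try {
            // Strict decoding: throw instead of substituting U+FFFD.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes))
                    .toString();
        } catch (CharacterCodingException e) {
            // The bytes are not valid UTF-8, so leave the text as it was.
            return s;
        }
    }
}
```

With this, repair(list.get(i).get("churchname").toString()) would fix a garbled name and leave an already correct one untouched. Using StandardCharsets also avoids the checked UnsupportedEncodingException that the string-based charset names require.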
