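I extract the Cyrillic content of an HTML page into a text file. The Cyrillic letters look fine in that file. I then use this file to create an RDF file with Jena. Here is my code: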
    private void createRDFFile(String webContentFilePath) throws IOException {
        // TODO Auto-generated method stub
        Model model = ModelFactory.createDefaultModel();
        RDFWriter writer = model.getWriter("RDF/XML");
        writer.setProperty("showXmlDeclaration", "true");
        writer.setProperty("showDoctypeDeclaration", "true");
        writer.setProperty("tab", "8");
        Writer out = new BufferedWriter(new OutputStreamWriter(
                new FileOutputStream(rdfFilePath), "UTF8"));
        Resource resDest = null;
        Property hasTimeStart = model.createProperty(ns + "#hasTimeStart");
        Property distrName = model.createProperty(ns + "#distrName");
        Property moneyOneDir = model.createProperty(ns + "#moneyOneDir");
        Property moneyTwoDir = model.createProperty(ns + "#moneyTwoDir");
        Property hasTimeStop = model.createProperty(ns + "#hasTimeStop");
        BufferedReader br = new BufferedReader(new FileReader(
                webContentFilePath));
        String line = "";
        while ((line = br.readLine()) != null) {
            String[] arrayLine = line.split("\\|");
            resDest = model.createResource(ns + arrayLine[5]);
            resDest.addProperty(hasTimeStart, arrayLine[0]);
            resDest.addProperty(distrName, arrayLine[1]);
            resDest.addProperty(moneyOneDir, arrayLine[2]);
            resDest.addProperty(moneyTwoDir, arrayLine[3]);
            resDest.addProperty(hasTimeStop, arrayLine[4]);
        }
        br.close();
        model.write(System.out, "RDF/XML");
        writer.write(model, out, null);
    }
When I open the RDF file, the Cyrillic letters look like РўР РђРќРЎРљРћРџ-Р'Р?ТОЛА. Can anyone help me?
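
A possible lead, offered as a guess rather than a confirmed diagnosis: the garbled text has the shape UTF-8 bytes take when they are decoded as windows-1251. `FileReader(String)` decodes with the platform default charset (before Java 18 this was taken from the OS locale), so if the text file is UTF-8 and the default is windows-1251, the strings may already be wrong before they reach Jena. It is also possible that the RDF file is correct and is only being opened with the wrong encoding in the viewer. The sketch below shows one way to read the input with an explicit charset; the class name `Utf8ReadSketch`, the helper `readLinesUtf8`, and the sample record are made up for illustration, and the assumption that the input file is UTF-8 is not confirmed by the post.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, not part of the original code.
class Utf8ReadSketch {

    // Reads all lines of the file, decoding the bytes as UTF-8 instead of
    // relying on the platform default charset (assumes the file is UTF-8).
    static List<String> readLinesUtf8(Path file) throws IOException {
        List<String> lines = new ArrayList<>();
        try (BufferedReader br =
                 Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = br.readLine()) != null) {
                lines.add(line);
            }
        }
        return lines;
    }

    public static void main(String[] args) throws IOException {
        // Made-up sample record in the pipe-separated layout from the post;
        // the Cyrillic word is spelled with Unicode escapes so the source
        // compiles the same way under any javac -encoding setting.
        String record =
            "08:00|\u0422\u0420\u0410\u041D\u0421\u041A\u041E\u041F|10|20|09:00|r1\n";
        Path tmp = Files.createTempFile("web-content", ".txt");
        try {
            Files.write(tmp, record.getBytes(StandardCharsets.UTF_8));
            for (String line : readLinesUtf8(tmp)) {
                String[] fields = line.split("\\|");
                // The second field should round-trip unchanged.
                System.out.println(fields[1].equals(
                    "\u0422\u0420\u0410\u041D\u0421\u041A\u041E\u041F"));
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

In `createRDFFile`, the equivalent change would be to build the reader as `new BufferedReader(new InputStreamReader(new FileInputStream(webContentFilePath), "UTF-8"))`, which mirrors the way the output writer is already constructed. If the characters are still garbled after that, checking which encoding the viewer uses to open the RDF file would be the next thing to try.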