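Does anyone have suggestions for segmenting plain text while converting a text file to XML, so that the segments come out the same as they were in the original XML? I'm using JAXP + SAX to convert a text file like this: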
Hello world. I am happy to see you today.
into this XML:
<trans-unit id="1">
<target> Hello world</target>
</trans-unit>
<trans-unit id="2">
<target> I am happy to see you today</target>
</trans-unit>
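To see why each sentence becomes its own unit and loses its period, here is the splitting step from doit() in isolation. It is only a small sketch: the class SplitDemo and its split helper are hypothetical names, and the pattern is the one from doit() minus the "\n|" branch, which never matches because readLine() already strips line terminators.

```java
import java.util.Arrays;

// Hypothetical demo class, not part of the original converter.
public class SplitDemo {
    // Matches a '.' that is not between two digits, so "1.5" is not split.
    static final String SENTENCE_END = "(?<!\\d)\\.(?!\\d)";

    static String[] split(String line) {
        // String.split consumes the matched periods and drops trailing empty strings.
        return line.split(SENTENCE_END);
    }

    public static void main(String[] args) {
        String[] parts = split("Hello world. I am happy to see you today.");
        System.out.println(Arrays.toString(parts));
        // prints [Hello world,  I am happy to see you today]
    }
}
```

The same split turns "Hello world. Sunny smile. Wake up early." into three pieces, i.e. three trans-units, which is exactly the grouping that gets lost.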
But suppose my source XML has three sentences in the unit with id="1", for example:
<trans-unit id="1">
<source> Hello world. Sunny smile. Wake up early.</source>
</trans-unit>
<trans-unit id="2">
<source> I am happy to see you today</source>
</trans-unit>
I then extract the text from this XML and end up with plain text:
Hello world. Sunny smile. Wake up early.I am happy to see you today.
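The unit boundaries disappear at this step: once every source is concatenated into one string, nothing records that the first three sentences belonged together. A minimal sketch of keeping them, assuming the intermediate text file may hold one unit per line and that the source texts contain no line breaks of their own (SegmentExtractor and extractSources are hypothetical names):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper: collects the text of every <source> element as one segment.
public class SegmentExtractor {
    static List<String> extractSources(String xml) throws Exception {
        List<String> segments = new ArrayList<>();
        DefaultHandler handler = new DefaultHandler() {
            private StringBuilder current; // non-null only while inside <source>

            @Override
            public void startElement(String uri, String localName, String qName, Attributes atts) {
                if ("source".equals(qName)) current = new StringBuilder();
            }

            @Override
            public void characters(char[] ch, int start, int length) {
                if (current != null) current.append(ch, start, length);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                if ("source".equals(qName)) {
                    segments.add(current.toString().trim());
                    current = null;
                }
            }
        };
        // Default factory is not namespace-aware, so qName holds the element name.
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return segments;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<body>"
                + "<trans-unit id=\"1\"><source> Hello world. Sunny smile. Wake up early.</source></trans-unit>"
                + "<trans-unit id=\"2\"><source> I am happy to see you today</source></trans-unit>"
                + "</body>";
        // Writing one segment per line makes the line break the unit boundary.
        System.out.println(String.join("\n", extractSources(xml)));
    }
}
```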
How can I segment this text while converting it back to XML, so that the target XML file again has those three sentences in a single unit? Like this:
<trans-unit id="1">
<target> Hello world. Sunny smile. Wake up early.</target>
</trans-unit>
<trans-unit id="2">
<target> I am happy to see you today</target>
</trans-unit>
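Assuming the text file holds one unit per line, the conversion can emit one trans-unit per line and skip the sentence split entirely. This sketch uses the same JAXP TransformerHandler calls as the question's code but writes to a String for illustration; LinesToXliff and toXliff are hypothetical names.

```java
import java.io.StringWriter;
import java.util.Arrays;
import java.util.List;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical helper: one <trans-unit> per input line, no further splitting.
public class LinesToXliff {
    static String toXliff(List<String> lines) throws Exception {
        SAXTransformerFactory tf = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
        TransformerHandler th = tf.newTransformerHandler();
        th.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
        StringWriter sw = new StringWriter();
        th.setResult(new StreamResult(sw));

        AttributesImpl none = new AttributesImpl();
        AttributesImpl root = new AttributesImpl();
        root.addAttribute("", "", "xmlns", "CDATA", "urn:oasis:names:tc:xliff:document:1.2");

        th.startDocument();
        th.startElement("", "", "xliff", root);
        th.startElement("", "", "file", none);
        th.startElement("", "", "body", none);
        int id = 0;
        for (String line : lines) {
            String text = line.trim();
            if (text.isEmpty()) continue; // skip blank lines instead of emitting empty units
            AttributesImpl atts = new AttributesImpl();
            atts.addAttribute("", "", "id", "CDATA", Integer.toString(++id));
            th.startElement("", "", "trans-unit", atts);
            th.startElement("", "", "target", none);
            th.characters(text.toCharArray(), 0, text.length());
            th.endElement("", "", "target");
            th.endElement("", "", "trans-unit");
        }
        th.endElement("", "", "body");
        th.endElement("", "", "file");
        th.endElement("", "", "xliff");
        th.endDocument();
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(toXliff(Arrays.asList(
                "Hello world. Sunny smile. Wake up early.",
                "I am happy to see you today")));
    }
}
```

The design choice here is simply that the line, not the period, is the segment delimiter, so whatever grouping was written into the text file comes back unchanged.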
This is my txt -> XML conversion:
public void doit() {
    try (BufferedReader in = new BufferedReader(new InputStreamReader(
            new FileInputStream(file), "UTF-8"))) {
        out = new StreamResult(selectedDir);
        initXML();
        String str;
        while ((str = in.readLine()) != null) {
            // readLine() already strips the line terminator, so split only on
            // periods that are not between two digits (keeps "1.5" intact)
            elements = str.split("(?<!\\d)\\.(?!\\d)");
            for (i = 0; i < elements.length; i++) {
                String segment = elements[i].trim();
                if (!segment.isEmpty()) {
                    process(segment); // pass the segment, not the whole line
                }
            }
        }
        closeXML();
    } catch (Exception e) {
        e.printStackTrace();
    }
}

public void initXML() throws SAXException, TransformerException {
    // JAXP + SAX: an identity TransformerHandler serializes the SAX events to out
    SAXTransformerFactory tf = (SAXTransformerFactory) SAXTransformerFactory.newInstance();
    th = tf.newTransformerHandler();
    Transformer serializer = th.getTransformer();
    serializer.setOutputProperty(OutputKeys.ENCODING, "UTF-8");
    // XML output formatting
    serializer.setOutputProperty("{http://xml.apache.org/xslt}indent-amount", "4");
    serializer.setOutputProperty(OutputKeys.INDENT, "yes");
    th.setResult(out);
    th.startDocument();
    atts = new AttributesImpl();
    atts1 = new AttributesImpl();
    atts1.addAttribute("", "", "xmlns", "CDATA", "urn:oasis:names:tc:xliff:document:1.2");
    th.startElement("", "", "xliff", atts1);
    th.startElement("", "", "file", new AttributesImpl());
    th.startElement("", "", "body", new AttributesImpl());
}

public void process(String s) throws SAXException {
    // One <trans-unit> per segment; errors propagate to doit() instead of being hidden
    atts.clear();
    k++;
    atts.addAttribute("", "", "id", "CDATA", Integer.toString(k));
    th.startElement("", "", "trans-unit", atts);
    th.startElement("", "", "target", new AttributesImpl());
    th.characters(s.toCharArray(), 0, s.length());
    th.endElement("", "", "target");
    th.endElement("", "", "trans-unit");
}

public void closeXML() throws SAXException {
    th.endElement("", "", "body");
    th.endElement("", "", "file");
    th.endElement("", "", "xliff");
    th.endDocument();
}
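If a sentence-level split is still wanted for other input, note that the regex in doit() removes the periods it splits on. The JDK's java.text.BreakIterator is one alternative that keeps the terminating punctuation. It does not by itself restore the original grouping, since it still yields one segment per sentence. A sketch with a hypothetical SentenceSplitter class:

```java
import java.text.BreakIterator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper: locale-aware sentence splitting that keeps punctuation.
public class SentenceSplitter {
    static List<String> sentences(String text, Locale locale) {
        BreakIterator it = BreakIterator.getSentenceInstance(locale);
        it.setText(text);
        List<String> result = new ArrayList<>();
        int start = it.first();
        // Walk consecutive boundary pairs [start, end) until DONE.
        for (int end = it.next(); end != BreakIterator.DONE; start = end, end = it.next()) {
            String s = text.substring(start, end).trim();
            if (!s.isEmpty()) result.add(s);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(sentences("Hello world. Sunny smile. Wake up early.", Locale.ENGLISH));
    }
}
```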