I'm using Java to read an RSS feed from a URL, parse it into a DOM tree with javax.xml.parsers.DocumentBuilder.parse(InputStream), make some changes to it, and then serialize it with org.w3c.dom.ls.LSSerializer.write(Node,LSOutput).
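The feed I'm reading is http://www.collaborationblueprint.com.au/blog/rss.xml.
The feed is well-formed XML, but the serialized result is not.
In every attempt so far, the CDATA sections have been mangled: one pair of square brackets is removed from each.
For example, if the source contains the following element: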
<description><![CDATA[<p>Some text</p>]]></description>
the serialized result looks like this, which is not well-formed:
<description><![CDATA<p>Some text</p>]></description>
My code is below. It runs inside a Lotus Domino agent.
How can I fix this?
import java.io.InputStream;
import java.io.PrintWriter;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLDecoder;
import java.util.HashMap;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;
import lotus.domino.*;

public class JavaAgent extends AgentBase {
    public void NotesMain() {
        try {
            org.w3c.dom.Document newDoc;
            DocumentBuilderFactory builderFactory;
            DocumentBuilder builder;
            Element docElem,tmpElem;
            Node tmpNode;
            Session session=getSession();
            AgentContext agentContext=session.getAgentContext();

            // Put URL arguments into a HashMap.
            // (Document here is lotus.domino.Document, not the W3C DOM type.)
            Document doc=agentContext.getDocumentContext();
            String[] query=doc.getItemValueString("Query_String").split("&");
            HashMap<String,String> queryMap=new HashMap<String,String>(query.length);
            for (int i=0; i<query.length; i++) {
                int j=query[i].indexOf('=');
                if (j<0) queryMap.put(query[i],"");
                else queryMap.put(query[i].substring(0,j),URLDecoder.decode(query[i].substring(j+1),"UTF-8"));
            }

            // Get the "src" URL argument - this is the URL we're reading the feed from.
            String urlStr=queryMap.get("src");
            if (urlStr==null || urlStr.length()==0) {
                System.err.println("Error: source URL not specified.");
                return;
            }
            URL url;
            try {
                url=new URL(urlStr);
            } catch (Exception e) {
                System.err.println("Error: invalid source URL.");
                return;
            }
            HttpURLConnection conn=(HttpURLConnection)url.openConnection();
            InputStream is=conn.getInputStream();

            // Create a DocumentBuilder and parse the XML.
            builderFactory=DocumentBuilderFactory.newInstance();
            builder=builderFactory.newDocumentBuilder();
            try {
                newDoc=builder.parse(is);
                is.close();
                conn.disconnect();
            } catch (Exception e) {
                is.close();
                conn.disconnect();
                System.err.println("XML parse exception: "+e.toString());
                return;
            }

            // Modify the document: declare the ibmwcm namespace prefix on the root element.
            docElem=newDoc.getDocumentElement();
            docElem.setAttribute("xmlns:ibmwcm","http://purl.org/net/ibmfeedsvc/wcm/1.0");

            // Serialize the DOM tree to the agent's output.
            PrintWriter pw=getAgentOutput();
            pw.println("Content-type: text/xml");
            DOMImplementationRegistry registry=DOMImplementationRegistry
                .newInstance();
            DOMImplementationLS impl=(DOMImplementationLS)registry
                .getDOMImplementation("LS");
            LSOutput lso=impl.createLSOutput();
            lso.setCharacterStream(pw);
            LSSerializer writer=impl.createLSSerializer();
            writer.write(newDoc,lso);
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
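
To help narrow this down, here is a minimal standalone sketch of the same parse-and-serialize round trip with no Domino classes involved. The class name CDataRoundTrip and the roundTrip helper are hypothetical, made up for this test, and the output goes to an in-memory StringWriter instead of the PrintWriter returned by getAgentOutput(). If the CDATA brackets survive here, that would suggest the brackets are being lost somewhere after serialization rather than in LSSerializer itself, though I have not confirmed where.

```java
import java.io.ByteArrayInputStream;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.bootstrap.DOMImplementationRegistry;
import org.w3c.dom.ls.DOMImplementationLS;
import org.w3c.dom.ls.LSOutput;
import org.w3c.dom.ls.LSSerializer;

// Hypothetical test class, not part of the agent.
public class CDataRoundTrip {

    // Parses the given XML string into a DOM tree and serializes it back
    // with LSSerializer, capturing the result in a String.
    static String roundTrip(String xml) throws Exception {
        DocumentBuilder builder =
            DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));

        // Same serializer setup as in the agent code.
        DOMImplementationRegistry registry = DOMImplementationRegistry.newInstance();
        DOMImplementationLS impl =
            (DOMImplementationLS) registry.getDOMImplementation("LS");
        LSOutput out = impl.createLSOutput();
        StringWriter sw = new StringWriter();
        out.setCharacterStream(sw); // in-memory writer instead of agent output
        LSSerializer writer = impl.createLSSerializer();
        writer.write(doc, out);
        return sw.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<description><![CDATA[<p>Some text</p>]]></description>";
        // Print the serialized document so the CDATA delimiters can be inspected.
        System.out.println(roundTrip(xml));
    }
}
```

Comparing this output with what the agent emits for the same input should show whether the serializer or the output channel is responsible.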