6

我正在尝试使用声明为使用过渡 dtd 的 doctype 解析 HTML 文档,如下所示:

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" " http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd ">

当我对文档执行 Builder.build 时,出现以下异常:

  java.io.IOException: Server returned HTTP response code: 503 for URL: http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd
       at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1305)
       at org.apache.xerces.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
       at org.apache.xerces.impl.XMLEntityManager.startEntity(Unknown Source)
       at org.apache.xerces.impl.XMLEntityManager.startDTDEntity(Unknown Source)
       at org.apache.xerces.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
       at org.apache.xerces.impl.XMLDocumentScannerImpl$DTDDispatcher.dispatch(Unknown Source)
       at org.apache.xerces.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
       at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
       at org.apache.xerces.parsers.DTDConfiguration.parse(Unknown Source)
       at org.apache.xerces.parsers.XMLParser.parse(Unknown Source)
       at org.apache.xerces.parsers.AbstractSAXParser.parse(Unknown Source)
       at nu.xom.Builder.build(Builder.java:1127)
       at nu.xom.Builder.build(Builder.java:1019)

如果我删除文档类型声明,它解析就好了。我可以从浏览器成功下载 dtd,它告诉我 url 是有效的。我不想删除文档类型声明。有没有办法告诉构建者不要下载 dtd 或为其提供备用 dtd?

4

2 回答 2

7

这解决了问题:

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setValidating(false);
            factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
            Document document = factory.newDocumentBuilder().parse(is);
于 2010-01-26T14:03:28.693 回答
4

快速查看Builder的 javadoc,我想您可以通过采用XMLReader的构造函数提供EntityResolver。我会尽可能避免让解析器从 Internet 下载文件。

于 2009-06-15T22:32:51.830 回答