0

我的文件系统上有文件,在 Windows XP 上。我想使用 Java (JRE 1.6) 解析它们。

问题是,当文件路径中有空格时,我不明白 Java 和 Xerces 如何协同工作。

如果文件的路径中没有空格,则一切正常。

如果有空格,我可能会遇到这种麻烦,即使我使用 FileInputStream 实例调用解析器

java.net.UnknownHostException: .
    at java.net.PlainSocketImpl.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at java.net.Socket.connect(Unknown Source)
    at sun.net.NetworkClient.doConnect(Unknown Source)
    at sun.net.NetworkClient.openServer(Unknown Source)
    at sun.net.ftp.FtpClient.openServer(Unknown Source)
    at sun.net.ftp.FtpClient.openServer(Unknown Source)
    at sun.net.www.protocol.ftp.FtpURLConnection.connect(Unknown Source)
    at sun.net.www.protocol.ftp.FtpURLConnection.getInputStream(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
    at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)

sun.net.ftp.FtpClient.openServer??? Wtf ?)

否则这种麻烦:

java.net.MalformedURLException: unknown protocol: d
    at java.net.URL.<init>(Unknown Source)
    at java.net.URL.<init>(Unknown Source)
    at java.net.URL.<init>(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.startDTDEntity(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDTDScannerImpl.setInputSource(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.dispatch(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$DTDDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl$PrologDriver.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLNSDocumentScannerImpl.next(Unknown Source)
    at com.sun.org.apache.xerces.internal.impl.XMLDocumentFragmentScannerImpl.scanDocument(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
    at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)

(它说unknown protocol: d是因为,我猜,该文件在 D 驱动器上。)

有没有人知道为什么会发生这种情况,以及如何规避这个问题?我试图提供我自己的 EntityResolver 但我的日志告诉我它在崩溃之前甚至没有被调用。


编辑:

这是调用解析器的代码。

public Document fileToDom(File file) throws ProcessException {
    Document doc = null;
    try {
        DocumentBuilderFactory db = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = db.newDocumentBuilder();
        if (this.errorHandler!=null){
            builder.setErrorHandler(this.errorHandler);}
        else {
            builder.setErrorHandler(new DefaultHandler());
        }
        FileInputStream test= new FileInputStream(file);
        doc = builder.parse(test);
        ...
    } catch (Exception e) {...}
    ...
}

目前,我发现自己被迫在解析之前删除 DOCTYPE,这消除了所有问题,并且 DTD 验证......不是一个很好的解决方案。

4

4 回答 4

2

你只是在使用DocumentBuilder.parse(filename)吗?

如果是这样,那就失败了,因为它需要一个 URI。打开FileInputStream文件,然后将其传递给DocumentBuilder.parse(InputStream).

于 2009-07-15T15:25:27.687 回答
1

It looks like it's trying to connect to a URL in the doctype header so it can download it in order to validate the document against the downloaded DTD.

于 2009-07-27T14:27:53.933 回答
1

试试这个 URI 样式:

file:///d:/folder/folder%20with%20space/file.xml

于 2009-07-15T15:25:19.073 回答
0

尝试这个。

InputSource is = new InputSource();
is.setCharacterStream(new StringReader(test));
doc = builder.parse(is);

而不仅仅是解析“测试”

于 2016-04-20T12:46:23.813 回答