java - Java - Determining the size of an XML document

Question

I have a simple piece of code that fetches an XML file from a given URL:

DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(link);

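The code returns the XML document (org.w3c.dom.Document). I only need to get the size of the resulting XML document. Is there an elegant way to do that without involving third-party jars?

P.S. The size should be in KB or MB, not the number of nodes.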

score 3 · Accepted Answer

A first, naive version: load the file into a local buffer. Then you know how long your input is. Then parse the XML out of the buffer:

URL url = new URL("...");
InputStream in = new BufferedInputStream(url.openStream());
ByteArrayOutputStream buffer1 = new ByteArrayOutputStream();
int c = 0;
while((c = in.read()) >= 0) {
  buffer1.write(c);
}

System.out.println(String.format("Length in Bytes: %d", 
    buffer1.toByteArray().length));

ByteArrayInputStream buffer2 = new ByteArrayInputStream(buffer1.toByteArray());

Document doc = DocumentBuilderFactory.newInstance()
    .newDocumentBuilder().parse(buffer2);

The drawback is the additional buffer in RAM.
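
On Java 9 or later, the same buffering idea can be written more compactly with InputStream.readAllBytes(). The following is only a minimal sketch, not part of the original answer: the class name BufferedSizeDemo and the helper humanReadable are made up for illustration, and an in-memory XML string stands in for url.openStream() so the example runs without network access. It also formats the byte count in KB or MB, as the question asks.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Locale;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; all names here are illustrative only.
class BufferedSizeDemo {

  // Formats a byte count as B, KB, or MB (assuming 1 KB = 1024 bytes).
  static String humanReadable(long bytes) {
    if (bytes < 1024) {
      return bytes + " B";
    }
    if (bytes < 1024 * 1024) {
      return String.format(Locale.ROOT, "%.1f KB", bytes / 1024.0);
    }
    return String.format(Locale.ROOT, "%.1f MB", bytes / (1024.0 * 1024.0));
  }

  public static void main(String[] args) throws Exception {
    // Stand-in for url.openStream(); swap in the real stream as needed.
    byte[] source = "<root><item>hello</item></root>".getBytes(StandardCharsets.UTF_8);

    byte[] data;
    try (InputStream in = new ByteArrayInputStream(source)) {
      data = in.readAllBytes(); // buffers the whole document in memory (Java 9+)
    }

    // Parse from the buffer, so the input is read only once.
    Document doc = DocumentBuilderFactory.newInstance()
        .newDocumentBuilder().parse(new ByteArrayInputStream(data));

    System.out.println("Root element: " + doc.getDocumentElement().getNodeName());
    System.out.println("Size: " + humanReadable(data.length));
  }
}
```

The buffer is converted to a byte array once and reused for both the length and the parse, which avoids copying it twice.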

A second, more elegant version: wrap the input stream in a custom java.io.FilterInputStream that counts the bytes streaming through it:

URL url = new URL("...");
CountInputStream in = new CountInputStream(url.openStream());
Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
System.out.println(String.format("Bytes: %d", in.getCount()));

Here is CountInputStream. It overrides the read() methods to delegate to the superclass and count the bytes that were read:

public class CountInputStream extends FilterInputStream {

  private long count = 0L;

  public CountInputStream(InputStream in) {
    super(in);
  }

  public int read() throws IOException {
    final int c = super.read();
    if(c >= 0) {
      count++;
    }
    return c;
  }

  public int read(byte[] b, int off, int len) throws IOException {
    final int bytesRead = super.read(b, off, len);
    if(bytesRead > 0) {
      count += bytesRead;
    }
    return bytesRead;
  }

  // read(byte[] b) is deliberately not overridden: FilterInputStream.read(byte[])
  // calls read(b, 0, b.length), which is already counted above. Overriding it
  // as well would count those bytes twice.

  public long getCount() {
    return count;
  }
}
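
To check that a counting wrapper of this kind reports the right number, it can be run against a byte array of known length. The sketch below is an illustration under stated assumptions, not the answerer's code: CountingDemo, ByteCountingStream, and countViaArrayRead are hypothetical names. This variant overrides only read() and read(byte[], int, int), because FilterInputStream.read(byte[]) is documented to call read(b, 0, b.length).

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Hypothetical demo class; all names here are illustrative only.
class CountingDemo {

  // Counts every byte that is read through the wrapped stream.
  static class ByteCountingStream extends FilterInputStream {
    private long count = 0L;

    ByteCountingStream(InputStream in) {
      super(in);
    }

    @Override
    public int read() throws IOException {
      final int c = super.read();
      if (c >= 0) {
        count++;
      }
      return c;
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      final int bytesRead = super.read(b, off, len);
      if (bytesRead > 0) {
        count += bytesRead;
      }
      return bytesRead;
    }

    long getCount() {
      return count;
    }
  }

  // Drains the data through read(byte[]) and returns the counted bytes.
  static long countViaArrayRead(byte[] data) {
    try (ByteCountingStream in = new ByteCountingStream(new ByteArrayInputStream(data))) {
      final byte[] chunk = new byte[8];
      while (in.read(chunk) >= 0) {
        // nothing to do; we only care about the count
      }
      return in.getCount();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("Counted bytes: " + countViaArrayRead(new byte[100]));
  }
}
```

Bytes passed over with skip() are not counted in this sketch; whether they should be depends on what the count is meant to represent.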

score 0 · Accepted Answer

Maybe this:

document.getTextContent().getBytes().length;

answered 2012-07-05T11:56:56.423

score 0 · Accepted Answer

You could do it like this:

long start = Runtime.getRuntime().freeMemory();

Construct your XML document object. Then call the above method again.

Document document = parser.getDocument();

long now = Runtime.getRuntime().freeMemory();

// Free memory shrinks as the document is built, so subtract "now" from "start".
System.out.println(" size of Document " + (start - now));

score 0 · Accepted Answer

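Once you have parsed the XML file into a DOM tree, the source document (as a string) no longer exists. You only have the tree of nodes built from that document, so it is no longer possible to determine the exact size of the source document from the DOM document.

You could use an identity transformation to convert the DOM document back into an XML file; but that is a very roundabout way of getting the size, and it still won't match the source document size exactly.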
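
For illustration, an identity transformation with the JDK's built-in javax.xml.transform API could look roughly like the sketch below. The class and method names (SerializedSizeDemo, parse, serializedSize) are hypothetical, and the sample XML is made up. The result is the size of the re-serialized document, which will usually differ from the source size because of the XML declaration, whitespace handling, and output encoding.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;

// Hypothetical demo class; all names here are illustrative only.
class SerializedSizeDemo {

  // Parses an XML string into a DOM document (stand-in for parsing a URL).
  static Document parse(String xml) {
    try {
      byte[] bytes = xml.getBytes(StandardCharsets.UTF_8);
      return DocumentBuilderFactory.newInstance()
          .newDocumentBuilder().parse(new ByteArrayInputStream(bytes));
    } catch (Exception e) {
      throw new IllegalStateException("Could not parse XML", e);
    }
  }

  // Serializes the DOM with an identity transformer and returns the byte count.
  static int serializedSize(Document doc) {
    try {
      // newTransformer() without a stylesheet yields an identity transformer.
      Transformer identity = TransformerFactory.newInstance().newTransformer();
      ByteArrayOutputStream out = new ByteArrayOutputStream();
      identity.transform(new DOMSource(doc), new StreamResult(out));
      return out.size();
    } catch (TransformerException e) {
      throw new IllegalStateException("Could not serialize document", e);
    }
  }

  public static void main(String[] args) {
    Document doc = parse("<root><item>hello</item></root>");
    System.out.println("Serialized size in bytes: " + serializedSize(doc));
  }
}
```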

For what you are trying to do, the best approach is to download the document yourself, note its size, and then pass it to DocumentBuilder.parse as an InputStream.
