12

I am reading a very large file and extracting some small portions of text from each line. However at the end of the operation, I am left with very little memory to work with. It seems that the garbage collector fails to free memory after reading in the file.

My question is: Is there any way to free this memory? Or is this a JVM bug?

I created an SSCCE to demonstrate this. It reads in a 1 MB file (2 MB in Java, because of the 16-bit char encoding) and extracts one character from each line (~4000 lines, so the result should be about 8 kB). At the end of the test, the full 2 MB is still in use!

The initial memory usage:

Allocated: 93847.55 kb
Free: 93357.23 kb

Immediately after reading in the file (before any manual garbage collection):

Allocated: 93847.55 kb
Free: 77613.45 kb (~16mb used)

This is to be expected since the program is using a lot of resources to read in the file.

However, when I then garbage collect, not all of the memory is freed:

Allocated: 93847.55 kb
Free: 91214.78 kb (~2 mb used! That's the entire file!)

I know that manually calling the garbage collector doesn't give you any guarantees (in some cases it is lazy). However, this was happening in my larger application, where the file eats up almost all available memory and causes the rest of the program to run out of memory, so the memory is not being reclaimed even when it is genuinely needed. This example confirms my suspicion that the excess data read from the file is not freed.

Here is the SSCCE to generate the test:

import java.io.*;
import java.util.*;

public class Test {
    public static void main(String[] args) throws Throwable {
        Runtime rt = Runtime.getRuntime();

        double alloc = rt.totalMemory()/1000.0;
        double free = rt.freeMemory()/1000.0;

        System.out.printf("Allocated: %.2f kb\nFree: %.2f kb\n\n",alloc,free);

        Scanner in = new Scanner(new File("my_file.txt"));
        ArrayList<String> al = new ArrayList<String>();

        while(in.hasNextLine()) {
            String s = in.nextLine();
            al.add(s.substring(0,1)); // extracts first 1 character
        }

        alloc = rt.totalMemory()/1000.0;
        free = rt.freeMemory()/1000.0;
        System.out.printf("Allocated: %.2f kb\nFree: %.2f kb\n\n",alloc,free);

        in.close();
        System.gc();

        alloc = rt.totalMemory()/1000.0;
        free = rt.freeMemory()/1000.0;
        System.out.printf("Allocated: %.2f kb\nFree: %.2f kb\n\n",alloc,free);
    }
}

3 Answers

22

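When you take a substring, the substring keeps a reference to the char array of the original string (this optimization makes handling many substrings of one string very fast). So when you store your substrings in the al list, you are keeping the entire file in memory. To avoid this, create a new String using the constructor that takes a String as its argument.

So basically I'd suggest you do this: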

    while(in.hasNextLine()) {
        String s = in.nextLine();
        al.add(new String(s.substring(0,1))); // extracts first 1 character
    }

The source code of the String(String) constructor explicitly states that its purpose is to trim "the baggage":

    public String(String original) {
        int size = original.count;
        char[] originalValue = original.value;
        char[] v;
        if (originalValue.length > size) {
            // The array representing the String is bigger than the new
            // String itself.  Perhaps this constructor is being called
            // in order to trim the baggage, so make a copy of the array.
            int off = original.offset;
            v = Arrays.copyOfRange(originalValue, off, off+size);
        } else {
            // The array representing the String is the same
            // size as the String, so no point in making a copy.
            v = originalValue;
        }
        this.offset = 0;
        this.count = size;
        this.value = v;
    }

Update: this problem was fixed in OpenJDK 7 update 6. People on newer versions are not affected.
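To make the sharing concrete, here is a toy model of the old layout. It is only a sketch: SharedSlice and its methods are hypothetical names invented for illustration, not the real java.lang.String, and they merely mimic the value/offset/count fields visible in the constructor source above.

```java
import java.util.Arrays;

// Hypothetical toy model of the pre-7u6 String layout (NOT java.lang.String).
class SharedSlice {
    final char[] value; // backing array, possibly shared with other slices
    final int offset;   // index of the first character of this slice
    final int count;    // number of characters in this slice

    SharedSlice(char[] value, int offset, int count) {
        this.value = value;
        this.offset = offset;
        this.count = count;
    }

    static SharedSlice of(String s) {
        char[] v = s.toCharArray();
        return new SharedSlice(v, 0, v.length);
    }

    // Old-style substring: no copy, just a new window onto the same array.
    SharedSlice substring(int begin, int end) {
        return new SharedSlice(value, offset + begin, end - begin);
    }

    // What new String(String) did: copy only the characters actually used.
    SharedSlice trimmed() {
        char[] copy = Arrays.copyOfRange(value, offset, offset + count);
        return new SharedSlice(copy, 0, count);
    }

    // Number of chars kept alive by holding on to this slice.
    int retainedChars() {
        return value.length;
    }

    @Override
    public String toString() {
        return new String(value, offset, count);
    }

    public static void main(String[] args) {
        SharedSlice line = SharedSlice.of("a fairly long line read from the file");
        SharedSlice first = line.substring(0, 1);
        System.out.println(first + " retains " + first.retainedChars() + " chars");
        SharedSlice copy = first.trimmed();
        System.out.println(copy + " retains " + copy.retainedChars() + " chars");
    }
}
```

In this model, the one-character slice keeps the whole line's array reachable until trimmed() makes a private copy, which is the effect described above for each line of the file.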

answered 2012-06-08T15:35:47.887
6

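Make sure you don't keep references you no longer need.

You still have references to al and in.

Try adding al = null; in = null; before calling the garbage collector.

Also, you need to understand how substring is implemented. substring keeps the original string and just uses a different offset and length into the same char[] array.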
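As a rough illustration of why dropping references matters, the sketch below uses a WeakReference to observe whether a list is still reachable after a GC request. The class and method names (DropReferences, reachableAfterGc) are made up for this example, and because System.gc() is only a request, the "dropped" case is not guaranteed to report false.

```java
import java.lang.ref.WeakReference;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class DropReferences {
    // Reports whether the list can still be reached through a weak reference
    // after asking for a collection, with or without a strong reference kept.
    static boolean reachableAfterGc(boolean keepStrongRef) {
        List<String> al = new ArrayList<>();
        al.add("x");
        WeakReference<List<String>> weak = new WeakReference<>(al);
        if (!keepStrongRef) {
            al = null; // drop the only strong reference
        }
        System.gc(); // a request, not a command
        boolean reachable = weak.get() != null;
        // Use al after the check so a kept reference stays live up to here.
        if (al != null && al.isEmpty()) {
            throw new IllegalStateException("list unexpectedly empty");
        }
        return reachable;
    }

    public static void main(String[] args) {
        System.out.println("kept:    " + reachableAfterGc(true));
        System.out.println("dropped: " + reachableAfterGc(false));
    }
}
```

While the strong reference is held, the list cannot be collected, so the first call returns true. The second call usually returns false on common JVMs, but that depends on the collector actually running.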

al.add(new String(s.substring(0,1)));

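Not sure whether there is a more elegant way of copying a substring. Maybe s.getChars() would be more useful to you as well.

As of Java 8, substring now copies the characters. You can verify for yourself that the constructor calls Arrays.copyOfRange.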
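One possible way to use getChars() here, sketched under the assumption that only a short prefix of each line is needed: copy the characters into a fresh array and build the String from that, so nothing refers back to the original line. FirstChars and copyPrefix are hypothetical names; String.getChars(int, int, char[], int) and new String(char[]) are the standard APIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration only.
class FirstChars {
    // Copies at most n leading characters of s into a brand-new String.
    static String copyPrefix(String s, int n) {
        int len = Math.min(n, s.length()); // tolerate lines shorter than n
        char[] buf = new char[len];
        s.getChars(0, len, buf, 0);        // copy chars [0, len) into buf
        return new String(buf);            // new String(char[]) copies buf
    }

    public static void main(String[] args) {
        List<String> al = new ArrayList<>();
        String[] lines = { "first line", "second line", "" };
        for (String s : lines) {
            al.add(copyPrefix(s, 1));
        }
        System.out.println(al);
    }
}
```

Unlike s.substring(0,1), this version does not throw on an empty line, because the prefix length is clamped to the length of the string.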

answered 2012-06-08T15:38:20.123
-1

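System.gc() does not guarantee that the JVM will garbage collect; it only suggests to the JVM that it may try to do so. Since plenty of memory is already available, the JVM may ignore the suggestion and keep running until it feels the need to collect.

Read more in the documentation: http://docs.oracle.com/javase/6/docs/api/java/lang/System.html#gc()

Another question that discusses this: When does System.gc() do anything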
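For measuring the effect, it can help to track used memory (total minus free) rather than free memory alone. The following is a minimal sketch with made-up names (MemoryReport, usedBytes, format); the figures it prints vary by JVM and collector, and the drop after System.gc() is not guaranteed.

```java
import java.util.Locale;

// Hypothetical helper for illustration only.
class MemoryReport {
    // Heap currently in use, as seen by Runtime: total minus free.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    // Formats a byte count in kB (1 kB = 1000 bytes, as in the question).
    static String format(long bytes) {
        return String.format(Locale.ROOT, "%.2f kB", bytes / 1000.0);
    }

    public static void main(String[] args) {
        System.out.println("Used at start:    " + format(usedBytes()));
        byte[] junk = new byte[2_000_000];
        System.out.println("Used with buffer: " + format(usedBytes())
                + " (" + junk.length + " bytes held)");
        junk = null;   // drop the reference so the buffer becomes collectible
        System.gc();   // only a suggestion to the JVM
        System.out.println("Used after gc():  " + format(usedBytes()));
    }
}
```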

answered 2012-06-08T15:38:19.233