I have a computation inside ST which allocates memory through a Data.Vector.Unboxed.Mutable. The vector is never read or written, nor is any reference retained to it outside of runST (to the best of my knowledge). The problem I have is that when I run my ST computation multiple times, I sometimes seem to keep the memory for the vector around.
Allocation statistics:
5,435,386,768 bytes allocated in the heap
5,313,968 bytes copied during GC
134,364,780 bytes maximum residency (14 sample(s))
3,160,340 bytes maximum slop
518 MB total memory in use (0 MB lost due to fragmentation)
Here I call runST 20x with different values for my computation and a 128MB vector (again - unused, not returned or referenced outside of ST). The maximum residency looks good, basically just my vector plus a few MB of other stuff. But the total memory use indicates that I have four copies of the vector active at the same time. This scales perfectly with the size of the vector, for 256MB we get 1030MB as expected.
Using a 1GB vector runs out of memory (4x1GB + overhead > 32bit). I don't understand why the RTS keeps seemingly unused, unreferenced memory around instead of just GC'ing it, at least at the point where an allocation would otherwise fail.
Running with +RTS -S reveals the following:
Alloc Copied Live GC GC TOT TOT Page Flts
bytes bytes bytes user elap user elap
134940616 13056 134353540 0.00 0.00 0.09 0.19 0 0 (Gen: 1)
583416 6756 134347504 0.00 0.00 0.09 0.19 0 0 (Gen: 0)
518020 17396 134349640 0.00 0.00 0.09 0.19 0 0 (Gen: 1)
521104 13032 134359988 0.00 0.00 0.09 0.19 0 0 (Gen: 0)
520972 1344 134360752 0.00 0.00 0.09 0.19 0 0 (Gen: 0)
521100 828 134360684 0.00 0.00 0.10 0.19 0 0 (Gen: 0)
520812 592 134360528 0.00 0.00 0.10 0.19 0 0 (Gen: 0)
520936 1344 134361324 0.00 0.00 0.10 0.19 0 0 (Gen: 0)
520788 1480 134361476 0.00 0.00 0.10 0.20 0 0 (Gen: 0)
134438548 5964 268673908 0.00 0.00 0.19 0.38 0 0 (Gen: 0)
586300 3084 268667168 0.00 0.00 0.19 0.38 0 0 (Gen: 0)
517840 952 268666340 0.00 0.00 0.19 0.38 0 0 (Gen: 0)
520920 544 268666164 0.00 0.00 0.19 0.38 0 0 (Gen: 0)
520780 428 268666048 0.00 0.00 0.19 0.38 0 0 (Gen: 0)
520820 2908 268668524 0.00 0.00 0.19 0.38 0 0 (Gen: 0)
520732 1788 268668636 0.00 0.00 0.19 0.39 0 0 (Gen: 0)
521076 564 268668492 0.00 0.00 0.19 0.39 0 0 (Gen: 0)
520532 712 268668640 0.00 0.00 0.19 0.39 0 0 (Gen: 0)
520764 956 268668884 0.00 0.00 0.19 0.39 0 0 (Gen: 0)
520816 420 268668348 0.00 0.00 0.20 0.39 0 0 (Gen: 0)
520948 1332 268669260 0.00 0.00 0.20 0.39 0 0 (Gen: 0)
520784 616 268668544 0.00 0.00 0.20 0.39 0 0 (Gen: 0)
521416 836 268668764 0.00 0.00 0.20 0.39 0 0 (Gen: 0)
520488 1240 268669168 0.00 0.00 0.20 0.40 0 0 (Gen: 0)
520824 1608 268669536 0.00 0.00 0.20 0.40 0 0 (Gen: 0)
520688 1276 268669204 0.00 0.00 0.20 0.40 0 0 (Gen: 0)
520252 1332 268669260 0.00 0.00 0.20 0.40 0 0 (Gen: 0)
520672 1000 268668928 0.00 0.00 0.20 0.40 0 0 (Gen: 0)
134553500 5640 402973292 0.00 0.00 0.29 0.58 0 0 (Gen: 0)
586776 2644 402966160 0.00 0.00 0.29 0.58 0 0 (Gen: 0)
518064 26784 134342772 0.00 0.00 0.29 0.58 0 0 (Gen: 1)
520828 3120 134343528 0.00 0.00 0.29 0.59 0 0 (Gen: 0)
521108 756 134342668 0.00 0.00 0.30 0.59 0 0 (Gen: 0)
Here it seems we have 'live bytes' exceeding ~128MB.
The +RTS -hy
profile basically just says we allocate 128MB:
http://imageshack.us/a/img69/7765/45q8.png
I tried reproducing this behavior in a simpler program, but even with replicating the exact setup with ST, a Reader containing the Vector, same monad/program structure etc. the simple test program doesn't show this. Simplifying my big program the behavior also stops eventually when removing apparently completely unrelated code.
Qs:
- Am I really keeping this vector around 4 times out of 20?
- If yes, how do I actually tell since
+RTS -Hy
andmaximum residency
claim I'm not, and what can I do to stop this behavior? - If no, why is Haskell not GC'ing it and running out of address space / memory, and what can I do to stop this behavior?
Thanks!