java - Java - Solving problems larger than the memory limit

Question

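I have recently been thinking about the following scenario: suppose you have a huge database and want to perform some computations while loading parts of it. It may be that even a small part of that database does not fit into Java's very limited heap memory. How do people get around these obstacles? How does Google analyze terabytes of data with limited memory space?

Thanks in advance for your replies.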

score 11 · Accepted Answer

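The short answer is that you need to process the data in chunks that fit in memory, and then assemble the results of those per-chunk computations into a final answer (possibly in several stages). A common distributed paradigm for this is MapReduce: see here for details of Google's original implementation, and Hadoop for an open-source implementation.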
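As a minimal single-machine sketch of the chunk-then-combine idea (not Hadoop's API; the class `ChunkedMean` and its methods are hypothetical names chosen for illustration), the code below computes a mean over a stream of values while holding only one chunk plus a list of tiny partial results in memory:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

final class ChunkedMean {
    /** Partial result for one chunk; tiny compared with the chunk itself. */
    static final class Partial {
        final long sum;
        final long count;
        Partial(long sum, long count) { this.sum = sum; this.count = count; }
    }

    /** "Map" step: reduce one in-memory chunk to a small partial result. */
    static Partial summarize(List<Long> chunk) {
        long sum = 0;
        for (long v : chunk) sum += v;
        return new Partial(sum, chunk.size());
    }

    /** "Reduce" step: merge all partial results into the final answer. */
    static double combine(List<Partial> partials) {
        long sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return count == 0 ? Double.NaN : (double) sum / count;
    }

    /** Pulls values from the source, never holding more than chunkSize of them. */
    static double mean(Iterator<Long> source, int chunkSize) {
        List<Partial> partials = new ArrayList<>();
        List<Long> chunk = new ArrayList<>(chunkSize);
        while (source.hasNext()) {
            chunk.add(source.next());
            if (chunk.size() == chunkSize) {
                partials.add(summarize(chunk));
                chunk.clear(); // release the chunk before loading the next one
            }
        }
        if (!chunk.isEmpty()) partials.add(summarize(chunk)); // trailing partial chunk
        return combine(partials);
    }
}
```

In a real setting the iterator would presumably be backed by a database cursor or a file reader, and the partial results could themselves be combined in several stages if there are too many of them.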

score 1 · Accepted Answer

I use a 64-bit JVM with off-heap memory, such as direct ByteBuffers and memory-mapped files. This way the process can address terabytes of virtual memory while the heap stays at 1 GB or less. I have run various applications where the JVM's virtual memory size was 10x larger than physical memory, with a modest loss of performance. A fast SSD helps when your working data set is larger than main memory.
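A rough sketch of the memory-mapped approach using the standard `java.nio` API follows. The class `MappedLongArray` and its methods are hypothetical names for illustration; the mapped data lives outside the Java heap and is paged in by the operating system on demand. Note that a single `FileChannel.map` call is limited to `Integer.MAX_VALUE` bytes, so a larger file would need several mappings.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

final class MappedLongArray {
    /** Stores count longs in a memory-mapped file, then reads them back and sums them. */
    static long writeAndSum(Path file, int count) throws IOException {
        try (FileChannel channel = FileChannel.open(file,
                StandardOpenOption.CREATE,
                StandardOpenOption.READ,
                StandardOpenOption.WRITE)) {
            // Mapping in READ_WRITE mode grows the file to the requested size if needed.
            long bytes = (long) count * Long.BYTES;
            MappedByteBuffer buffer = channel.map(FileChannel.MapMode.READ_WRITE, 0, bytes);
            for (int i = 0; i < count; i++) {
                buffer.putLong(i * Long.BYTES, i); // absolute write, off the Java heap
            }
            long sum = 0;
            for (int i = 0; i < count; i++) {
                sum += buffer.getLong(i * Long.BYTES);
            }
            return sum;
        }
    }

    /** Convenience wrapper that uses a temporary file (an assumption made for the demo). */
    static long demo(int count) {
        try {
            Path file = Files.createTempFile("mapped-longs", ".bin");
            file.toFile().deleteOnExit();
            return writeAndSum(file, count);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling `MappedLongArray.demo(1000)` writes the values 0 to 999 into the mapping and sums them without allocating a `long[]` on the heap.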

score 0 · Accepted Answer

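1) Increase the physical memory and/or virtual memory size.

2) Use multiple machines with sharding or a similar technique.

3) Process the data in small chunks that fit in memory.

4) Choose smarter data structures that use less memory, such as Bloom filters or tries, where appropriate.

5) You can even compress and decompress the data in memory with a compression algorithm.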
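To illustrate point 4, here is a small Bloom filter sketch built on `java.util.BitSet`. The class `SimpleBloomFilter` and its hash mixing are assumptions made for this example, not a tuned or library-grade implementation. It answers "possibly present" or "definitely absent" using a fixed number of bits, regardless of how large the stored strings are.

```java
import java.util.BitSet;

final class SimpleBloomFilter {
    private final BitSet bits;
    private final int size;      // number of bits in the filter
    private final int hashCount; // number of bit positions set per item

    SimpleBloomFilter(int size, int hashCount) {
        this.bits = new BitSet(size);
        this.size = size;
        this.hashCount = hashCount;
    }

    /** Derives the i-th bit position from two hashes (double hashing). */
    private int index(String item, int i) {
        int h1 = item.hashCode();
        // Second hash obtained by mixing the first; forced odd so the step is never zero.
        int h2 = Integer.rotateLeft(h1 * 0x9E3779B9, 15) | 1;
        return Math.floorMod(h1 + i * h2, size);
    }

    void add(String item) {
        for (int i = 0; i < hashCount; i++) {
            bits.set(index(item, i));
        }
    }

    /** False means definitely absent; true means present or a false positive. */
    boolean mightContain(String item) {
        for (int i = 0; i < hashCount; i++) {
            if (!bits.get(index(item, i))) return false;
        }
        return true;
    }
}
```

The trade-off is that false positives are possible, so a filter like this is typically used to skip expensive lookups for items that are definitely absent, not as an exact set.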

score 0 · Accepted Answer

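Either you have to get more memory and increase the heap size or, if that is not possible, write algorithms that load only a subset of the data at a time.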
