python-3.x - Python 3.5.x multiprocessing throws "OSError: [Errno 12] Cannot allocate memory" on a system with 74% of its RAM free

Question

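I wrote a Python 3.5 application that uses the multiprocessing library to spawn several processes. It runs on Ubuntu 16.04.3 LTS on a dedicated server with 2 Intel Xeon E2690 v1 CPUs (8 cores each) and 96 GB of RAM.

The system also runs a PostgreSQL instance configured to use at most about 32 GB of RAM (effective_cache_size is set to 32GB), but that value is only for the purposes of this question (I tried several combinations of effective_cache_size, work_mem, shared_buffers, and so on).

Each process opens one connection to the database and reuses it many times.

Here is a simplified part of the code showing how I spawn a new process: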

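from multiprocessing import Process
import time


def start_algorithm(arg1, arg2):
    # Stand-in for the real algorithm: loop forever, sleeping one second per pass.
    while True:
        time.sleep(1)


# arg1 and arg2 are placeholders for the real arguments.
process = Process(target=start_algorithm, args=(arg1, arg2))
process.start()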

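After spawning more than 200 processes (the exact number is not always the same), the application throws an exception when it tries to spawn a new one: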

OSError: [Errno 12] Cannot allocate memory
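
To make the failure point easier to pin down, here is a minimal sketch of a spawn loop that catches this specific error and reports how many processes were started before the kernel refused. The names worker, try_spawn and spawn_until_refused are hypothetical, and the sketch assumes the "fork" start method, which is the default on Linux under Python 3.5.

```python
import errno
import multiprocessing
import time


def worker(delay):
    # Stand-in for the real algorithm: sleep briefly, then exit.
    time.sleep(delay)


def try_spawn(ctx, target, args):
    """Start a process; return it, or None if the kernel refused memory."""
    process = ctx.Process(target=target, args=args)
    try:
        process.start()
    except OSError as exc:
        if exc.errno == errno.ENOMEM:
            return None  # the "[Errno 12] Cannot allocate memory" case
        raise
    return process


def spawn_until_refused(count, delay=0.1):
    # Assumption: "fork" start method, as on Linux with Python 3.5.
    ctx = multiprocessing.get_context("fork")
    started = []
    for _ in range(count):
        process = try_spawn(ctx, worker, (delay,))
        if process is None:
            break
        started.append(process)
    for process in started:
        process.join()
    return len(started)


if __name__ == "__main__":
    print("processes started:", spawn_until_refused(4))
```

Logging the returned count next to a memory snapshot taken at the same moment would show what the system looked like when the allocation was refused.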

The output of ulimit -a is:

$ ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 384500
max locked memory       (kbytes, -l) unlimited
max memory size         (kbytes, -m) unlimited
open files                      (-n) 524288
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) 384500
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited
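
The shell's limits are not necessarily the ones the Python application inherits (for example when it is started by a service manager), so it may be worth reading them from inside the process. This is a small sketch using the standard resource module; the LIMITS table and helper names are my own, and note that getrlimit reports the memory-related limits in bytes where ulimit prints kbytes.

```python
import resource

# Limits that most directly bound process creation and memory.
LIMITS = {
    "max user processes (-u)": resource.RLIMIT_NPROC,
    "virtual memory (-v)": resource.RLIMIT_AS,
    "stack size (-s)": resource.RLIMIT_STACK,
    "data seg size (-d)": resource.RLIMIT_DATA,
}


def describe(value):
    # RLIM_INFINITY is what ulimit prints as "unlimited".
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


def read_limits():
    """Return {label: (soft, hard)} as seen by the current process."""
    return {label: resource.getrlimit(res) for label, res in LIMITS.items()}


if __name__ == "__main__":
    for label, (soft, hard) in read_limits().items():
        print("{:<26} soft={} hard={}".format(label, describe(soft), describe(hard)))
```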

In /etc/sysctl.conf I set the following parameters for mdadm and PostgreSQL:

# Allows for 84GB shared_buffers in PostgreSQL
kernel.shmmax = 90914313216
kernel.shmall = 22020096

# Various PostgreSQL optimizations
vm.overcommit_memory = 2
vm.overcommit_ratio = 90
vm.swappiness = 4
vm.zone_reclaim_mode = 0
vm.dirty_ratio = 15
vm.dirty_background_ratio = 3

# mdadm optimizations
vm.min_free_kbytes=262144
kernel.sched_migration_cost_ns = 5000000
kernel.sched_autogroup_enabled = 0
dev.raid.speed_limit_max=1000000
dev.raid.speed_limit_min=1000000

I also tried setting and unsetting vm.nr_hugepages, but that did not solve the problem.
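
One setting above that relates directly to allocation failures is vm.overcommit_memory = 2. The kernel documentation describes this as a strict accounting mode in which the total committed address space may not exceed swap plus overcommit_ratio percent of physical RAM; the kernel reports both figures as CommitLimit and Committed_AS in /proc/meminfo. Whether this is what stops new processes here is not established, so the following is only a sketch for checking it. The helper names are hypothetical, and the formula is the documented one applied to this machine's totals.

```python
import os


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {field: leading numeric value}."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def commit_limit_kib(mem_total_kib, swap_total_kib, overcommit_ratio, hugetlb_kib=0):
    """Commit limit as documented for vm.overcommit_memory = 2."""
    return (mem_total_kib - hugetlb_kib) * overcommit_ratio // 100 + swap_total_kib


if __name__ == "__main__":
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as handle:
            info = parse_meminfo(handle.read())
        print("CommitLimit  (KiB):", info.get("CommitLimit"))
        print("Committed_AS (KiB):", info.get("Committed_AS"))
    # RAM and swap totals of this server in KiB, ratio from sysctl.conf.
    print("expected limit (KiB):", commit_limit_kib(98847552, 15825916, 90))
```

If Committed_AS is close to CommitLimit at the moment the exception is raised, the limit being hit is committed address space rather than resident memory.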

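Before I start my Python application, RAM usage is about 500 MB out of 96 GB, so the RAM is almost entirely empty. After those 200+ processes have been spawned, RAM usage grows and peaks at about 20 GB (with 74 GB still free), and then the Cannot allocate memory exception is thrown.

The question is: why?

I tried to measure the footprint of all the processes together and found memory_profiler, a Python library/tool. I was able to get this graph:

If I am not mistaken, that is about 47500 MiB of memory, so about 50 GB of "occupied" RAM. The footprint of each process should be about 170 MB. The problem is that I cannot see that amount of occupied RAM anywhere. Here is some output: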
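
As a quick check of those figures, the conversion and the process count they imply can be worked out as follows. This is only arithmetic on the numbers quoted above, and it treats MB and MiB as the same unit for the rough division.

```python
MIB = 2 ** 20  # bytes in one mebibyte
GB = 10 ** 9   # bytes in one decimal gigabyte

total_mib = 47500      # total read off the memory_profiler graph
per_process_mb = 170   # per-process footprint quoted above

total_gb = total_mib * MIB / GB
implied_processes = total_mib / per_process_mb

print("total: {:.1f} GB".format(total_gb))                        # total: 49.8 GB
print("implied process count: {:.0f}".format(implied_processes))  # implied process count: 279
```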

$ free -h
              total        used        free      shared  buff/cache   available
Mem:            94G         18G         74G        570M        1,6G         73G
Swap:           15G          0B         15G

$ vmstat -S M
procs -----------memory---------- ---swap-- -----io---- -system-- ------cpu-----
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  0      0  75879     81   1518    0    0     3    40  229  157 12  1 86  0  0

$ top
Tasks: 1457 total,   2 running, 1455 sleeping,   0 stopped,   0 zombie
%Cpu(s):  3,1 us,  0,9 sy,  0,0 ni, 95,8 id,  0,0 wa,  0,0 hi,  0,2 si,  0,0 st
KiB Mem : 98847552 total, 77698752 free, 19509612 used,  1639188 buff/cache
KiB Swap: 15825916 total, 15825916 free,        0 used. 77580104 avail Mem

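By lowering the amount of memory PostgreSQL needs, I was able to start 287 processes, but this always leaves a large amount of RAM free (74 GB). This is the configuration file (postgresql.conf) of my PostgreSQL 9.6: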

max_connections=2000
listen_addresses = '127.0.0.1,192.168.2.90'
shared_buffers = 1GB
work_mem = 42MB
port=5433
maintenance_work_mem = 256MB
checkpoint_completion_target = 0.9
effective_cache_size = 32GB
default_statistics_target = 1000
random_page_cost=1.2
seq_page_cost=1.0
max_files_per_process = 500 # default 1000
huge_pages = off
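
For a rough sense of scale, the settings above can be combined into a naive upper bound. This sketch assumes every one of the max_connections backends uses exactly one work_mem allocation at the same time, which is an assumption of mine and not something measured here; work_mem is a per-operation limit, so a real workload can use less or, with several sorts per query, more.

```python
max_connections = 2000
work_mem_mb = 42
shared_buffers_mb = 1024  # shared_buffers = 1GB

# Assumption: one work_mem-sized allocation per connection, all at once.
naive_peak_mb = shared_buffers_mb + max_connections * work_mem_mb

print("naive peak: {} MB (~{:.0f} GiB)".format(naive_peak_mb, naive_peak_mb / 1024))
```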

Edit

I found this answer on SO, and with it a way to measure the overall memory usage directly.

Python (288 processes spawned):

$ ps aux | grep python3 | awk '{sum=sum+$6}; END {print sum/1024 " MB"}'
53488.1 MB

PostgreSQL:

$ ps aux | grep postgres | awk '{sum=sum+$6}; END {print sum/1024 " MB"}'
20653.4 MB
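
The same summation can be done in Python, which makes it easier to log alongside the application. This is a sketch with a hypothetical helper name; unlike the grep pipeline it matches only the COMMAND column, so it does not count the grep process itself or lines that match only on the user name.

```python
import shutil
import subprocess


def sum_rss_mb(ps_output, needle):
    """Sum the RSS column (6th field, KiB) of `ps aux` rows whose command contains needle."""
    total_kib = 0
    for line in ps_output.splitlines()[1:]:  # skip the header row
        fields = line.split(None, 10)
        if len(fields) == 11 and needle in fields[10]:
            total_kib += int(fields[5])
    return total_kib / 1024


if __name__ == "__main__":
    if shutil.which("ps"):
        output = subprocess.check_output(["ps", "aux"], universal_newlines=True)
        print("{:.1f} MB".format(sum_rss_mb(output, "python3")))
```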

I still don't understand why the common tools (vmstat, free, top, glances) show a different amount of used RAM.
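
One thing that can be checked here: RSS counts every shared page in full for each process that maps it, so adding up RSS across forked processes counts pages they share more than once. The kernel also exposes PSS (proportional set size) in /proc/<pid>/smaps, which divides each shared page among the processes that map it. Whether this accounts for the whole gap is an assumption, not something shown by the output above. A minimal sketch, with hypothetical helper names:

```python
import os


def sum_pss_kib(smaps_text):
    """Add up every 'Pss:' line (KiB) in /proc/<pid>/smaps-style text."""
    total = 0
    for line in smaps_text.splitlines():
        if line.startswith("Pss:"):
            total += int(line.split()[1])
    return total


def pss_of_pid(pid):
    # Reading another user's smaps file usually requires elevated privileges.
    with open("/proc/{}/smaps".format(pid)) as handle:
        return sum_pss_kib(handle.read())


if __name__ == "__main__":
    if os.path.exists("/proc/self/smaps"):
        print("PSS of this process: {} KiB".format(pss_of_pid(os.getpid())))
```

Summing pss_of_pid over the PIDs of the spawned processes gives a total that can be compared directly with the "used" column of free.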
