
While my program is running I see a lot of "Too many open files" exceptions. They typically look like this:

org.jboss.netty.channel.ChannelException: Failed to create a selector.

...
Caused by: java.io.IOException: Too many open files

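However, these are not the only exceptions. I have seen similar ones (also caused by "Too many open files"), but much less frequently.

Strangely, I have set the open file limit of the screen session (from which I launch my program) to 1M: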

root@s11:~/fabiim-cbench# ulimit -a
core file size          (blocks, -c) 0
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 20
file size               (blocks, -f) unlimited
pending signals                 (-i) 16382
max locked memory       (kbytes, -l) 64
max memory size         (kbytes, -m) unlimited
open files                      (-n) 1000000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

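Furthermore, judging by the output of lsof -p, I never see more than 1111 open files (sockets, pipes, files) before the exceptions are thrown.

Question: what is going wrong, and/or how can I dig deeper into this problem?

Extra: I am currently integrating Floodlight with bft-smart. In a nutshell, the Floodlight process is the one that crashes with too-many-open-files exceptions while running a stress test launched by a benchmark program. The benchmark program maintains 64 tcp connections to the Floodlight process, which in turn should maintain at least 64 * 3 tcp connections to the bft-smart replicas. Both programs use netty to manage these connections.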

1 Answer

First thing to check: can you run ulimit from inside your Java process to make sure the file limit is the same on the inside? Code like this should work:

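// Run `ulimit -a` in a child shell so it reports the limits inherited from this JVM.
InputStream is = Runtime.getRuntime().exec(new String[] {"bash", "-c", "ulimit -a"}).getInputStream();
int c;
// Copy the child's output to stdout one byte at a time.
while ((c = is.read()) != -1) {
    System.out.write(c);
}
System.out.flush();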
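
As a complement to shelling out, the JVM can report its own descriptor counts directly. The following is only a minimal sketch, assuming an OpenJDK/HotSpot-style JVM on a Unix-like OS where the platform bean implements com.sun.management.UnixOperatingSystemMXBean; the class name FdUsage and the output format are made up for illustration:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;
import com.sun.management.UnixOperatingSystemMXBean;

public class FdUsage {
    /**
     * Returns "open=<n> max=<n>" for the current JVM, or a short note
     * when the Unix-specific bean is not available (assumption: this
     * depends on the JVM vendor and the operating system).
     */
    static String describe() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return "open=" + unix.getOpenFileDescriptorCount()
                 + " max=" + unix.getMaxFileDescriptorCount();
        }
        return "file descriptor counts are not available on this JVM/platform";
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Logging this value periodically during the stress test would show whether the open count is actually approaching the maximum when the exception is thrown.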

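If the limit still shows 1 million, then you are in for some hard debugging.

Here are a few things I would look into if I had to debug this:

  1. Are you running out of tcp port numbers? What does netstat -an show when you hit this error?

  2. Use strace to find out exactly which system call, with which parameters, is causing this error to be raised. EMFILE is errno value 24.

  3. The "Too many open files" EMFILE error can actually be raised by many different system calls, for many different reasons: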

    $ cd /usr/share/man/man2
    $ zgrep -A 2 EMFILE *
    accept.2.gz:.B EMFILE
    accept.2.gz:The per-process limit of open file descriptors has been reached.
    accept.2.gz:.TP
    accept.2.gz:--
    accept.2.gz:.\" EAGAIN, EBADF, ECONNABORTED, EINTR, EINVAL, EMFILE,
    accept.2.gz:.\" ENFILE, ENOBUFS, ENOMEM, ENOTSOCK, EOPNOTSUPP, EPROTO, EWOULDBLOCK.
    accept.2.gz:.\" In addition, SUSv2 documents EFAULT and ENOSR.
    dup.2.gz:.B EMFILE
    dup.2.gz:The process already has the maximum number of file
    dup.2.gz:descriptors open and tried to open a new one.
    epoll_create.2.gz:.B EMFILE
    epoll_create.2.gz:The per-user limit on the number of epoll instances imposed by
    epoll_create.2.gz:.I /proc/sys/fs/epoll/max_user_instances
    eventfd.2.gz:.B EMFILE
    eventfd.2.gz:The per-process limit on open file descriptors has been reached.
    eventfd.2.gz:.TP
    execve.2.gz:.B EMFILE
    execve.2.gz:The process has the maximum number of files open.
    execve.2.gz:.TP
    execve.2.gz:--
    execve.2.gz:.\" document ETXTBSY, EPERM, EFAULT, ELOOP, EIO, ENFILE, EMFILE, EINVAL,
    execve.2.gz:.\" EISDIR or ELIBBAD error conditions.
    execve.2.gz:.SH NOTES
    fcntl.2.gz:.B EMFILE
    fcntl.2.gz:For
    fcntl.2.gz:.BR F_DUPFD ,
    getrlimit.2.gz:.BR EMFILE .
    getrlimit.2.gz:(Historically, this limit was named
    getrlimit.2.gz:.B RLIMIT_OFILE
    inotify_init.2.gz:.B EMFILE
    inotify_init.2.gz:The user limit on the total number of inotify instances has been reached.
    inotify_init.2.gz:.TP
    mmap.2.gz:.\" SUSv2 documents additional error codes EMFILE and EOVERFLOW.
    mmap.2.gz:.SH AVAILABILITY
    mmap.2.gz:On POSIX systems on which
    mount.2.gz:.B EMFILE
    mount.2.gz:(In case no block device is required:)
    mount.2.gz:Table of dummy devices is full.
    open.2.gz:.B EMFILE
    open.2.gz:The process already has the maximum number of files open.
    open.2.gz:.TP
    pipe.2.gz:.B EMFILE
    pipe.2.gz:Too many file descriptors are in use by the process.
    pipe.2.gz:.TP
    shmop.2.gz:.\" SVr4 documents an additional error condition EMFILE.
    shmop.2.gz:
    shmop.2.gz:In SVID 3 (or perhaps earlier)
    signalfd.2.gz:.B EMFILE
    signalfd.2.gz:The per-process limit of open file descriptors has been reached.
    signalfd.2.gz:.TP
    socket.2.gz:.B EMFILE
    socket.2.gz:Process file table overflow.
    socket.2.gz:.TP
    socketpair.2.gz:.B EMFILE
    socketpair.2.gz:Too many descriptors are in use by this process.
    socketpair.2.gz:.TP
    spu_create.2.gz:.B EMFILE
    spu_create.2.gz:The process has reached its maximum open files limit.
    spu_create.2.gz:.TP
    timerfd_create.2.gz:.B EMFILE
    timerfd_create.2.gz:The per-process limit of open file descriptors has been reached.
    timerfd_create.2.gz:.TP
    truncate.2.gz:.\" error conditions EMFILE, EMULTIHP, ENFILE, ENOLINK.  SVr4 documents for
    truncate.2.gz:.\" .BR ftruncate ()
    truncate.2.gz:.\" an additional EAGAIN error condition.
    

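    If you go through all of these man pages by hand, you may find something interesting. For example, I find it interesting that epoll_create, the underlying system call used by NIO channels, will return EMFILE "Too many open files" if

    The per-user limit on the number of epoll instances imposed by /proc/sys/fs/epoll/max_user_instances was encountered. See epoll(7) for further details.

    Now that filename doesn't actually exist on my system, but there are some limits defined in the files under /proc/sys/fs/epoll and /proc/sys/fs/inotify that you may be hitting, especially if you are running multiple instances of the same test on the same machine. Figuring out whether that is the case is a chore in itself; you could start by checking syslog for any messages...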
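
To see which of the limits under /proc/sys/fs/epoll and /proc/sys/fs/inotify exist on a given machine, and what they are set to, the directories can simply be dumped. This is a rough sketch rather than a definitive tool: it assumes a Linux /proc filesystem, the class name KernelLimits and the helper readLimits are hypothetical, and it lists whatever files are present instead of assuming any particular file name:

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;

public class KernelLimits {
    /** Reads every readable regular file directly under dir into a name -> value map. */
    static Map<String, String> readLimits(Path dir) {
        Map<String, String> limits = new TreeMap<>();
        if (!Files.isDirectory(dir)) {
            return limits; // directory is absent on this kernel or OS
        }
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(dir)) {
            for (Path entry : entries) {
                if (Files.isRegularFile(entry) && Files.isReadable(entry)) {
                    String value = new String(Files.readAllBytes(entry)).trim();
                    limits.put(entry.getFileName().toString(), value);
                }
            }
        } catch (IOException e) {
            // Record the failure instead of aborting, so the other directory is still reported.
            limits.put("(error)", e.toString());
        }
        return limits;
    }

    public static void main(String[] args) {
        for (String dir : new String[] {"/proc/sys/fs/epoll", "/proc/sys/fs/inotify"}) {
            System.out.println(dir + " -> " + readLimits(Paths.get(dir)));
        }
    }
}
```

Because these limits are per user rather than per process, the values are worth comparing against the combined usage of every test instance running under the same account.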

Good luck!

answered 2014-03-12T16:17:48.580