
I'm reading Jesse Storimer's excellent book Working with Unix Processes. In the section on trapping signals from child processes that have exited, he gives a code example.

I have modified that code slightly (see below) to get more insight into what is happening:

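  • the parent resumes its own execution in between signals (which I can see from its puts), and
  • the wait is executed for multiple children within one trap block (sometimes I get "Received a CHLD signal" followed by several "child pid ... exited" lines).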

Expected output

Usually the output of the code below looks something like this:

parent is working hard
Received a CHLD signal
child pid 73408 exited
parent is working hard
parent is working hard
parent is working hard
Received a CHLD signal
child pid 73410 exited
child pid 73409 exited
All children exited - parent exiting too.

Occasional error

But once in a while I get an error like this:

trapping_signals.rb:17:in `write': deadlock; recursive locking (ThreadError)
    from trapping_signals.rb:17:in `puts'
    from trapping_signals.rb:17:in `puts'
    from trapping_signals.rb:17:in `block in <main>'
    from trapping_signals.rb:17:in `call'
    from trapping_signals.rb:17:in `write'
    from trapping_signals.rb:17:in `puts'
    from trapping_signals.rb:17:in `puts'
    from trapping_signals.rb:17:in `block in <main>'
    from trapping_signals.rb:40:in `call'
    from trapping_signals.rb:40:in `sleep'
    from trapping_signals.rb:40:in `block in <main>'
    from trapping_signals.rb:38:in `loop'
    from trapping_signals.rb:38:in `<main>'

Can anyone explain to me what is going wrong here?

The code

child_processes = 3
dead_processes = 0

# We fork 3 child processes.
child_processes.times do
  fork do
    # Each sleeps between 0 and 5 seconds
    sleep rand(5)
  end
end

# Our parent process will be busy doing some work.
# But still wants to know when one of its children exits.

# By trapping the :CHLD signal our process will be notified by the kernel
# when one of its children exits.
trap(:CHLD) do
  puts "Received a CHLD signal"
  # Since Process.wait queues up any data that it has for us we can ask for it
  # here, since we know that one of our child processes has exited.

  # We loop over a non-blocking Process.wait to ensure that any dead child
  # processes are accounted for.
  # Here we wait without blocking.
  while pid = Process.wait(-1, Process::WNOHANG)
    puts "child pid #{pid} exited"
    dead_processes += 1

    # We exit ourselves once all the child processes are accounted for.
    if dead_processes == child_processes
      puts "All children exited - parent exiting too."
      exit
    end
  end
end

# Work it.
loop do
  puts "parent is working hard"
  sleep 1
end

1 Answer


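I looked through the Ruby source to see where that particular error is raised. It is only ever raised when the current thread tries to acquire a lock that it already holds. This implies that the locking is not re-entrant: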

m = Mutex.new
m.lock
m.lock #=> same error as yours
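To make that contrast concrete, here is a small sketch that rescues the error instead of crashing, and compares it with Monitor from the standard library, which is re-entrant. The variable names (message, result) are mine and only for illustration.

```ruby
require 'monitor'

# A Mutex cannot be locked twice by the same thread.
m = Mutex.new
m.lock
message = nil
begin
  m.lock                      # second acquisition by the same thread
rescue ThreadError => e
  message = e.message         # the same ThreadError as in the question
end
m.unlock

# A Monitor, by contrast, may be re-entered by the thread that holds it.
mon = Monitor.new
result = mon.synchronize do
  mon.synchronize { :reentered }
end

puts message
puts result
```

This only illustrates the locking behaviour; it does not by itself say which kind of lock Ruby uses around stdout.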

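Now at least we know what happens, but not yet why and where. The error message indicates that it occurs during a call to puts. When puts is called, it finally ends up in io_binwrite. stdout is not synchronized, but it is buffered, so the if condition there is fulfilled on the first call, and a buffer plus a write lock for that buffer are set up. The write lock is important for guaranteeing the atomicity of writes to stdout: two threads writing to stdout at the same time should not be able to mix up each other's output. To demonstrate what I mean: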

t1 = Thread.new { 100.times { print "aaaaa" } }
t2 = Thread.new { 100.times { print "bbbbb" } }
t1.join
t2.join

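Although both threads take turns writing to stdout, a single write is never broken up: you will always get the full 5 a's or b's in sequence. That is what the write lock is there for.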

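Now, what goes wrong in your case is a race condition on that write lock. The parent process loops and writes to stdout once every second ("parent is working hard"). But the same thread also eventually executes the trap block and tries to write to stdout again ("Received a CHLD signal"). You can verify that it really is the same thread by adding #{Thread.current} to your puts statements. If those two events happen close enough together, you end up in the same situation as in the first example: the same thread tries to obtain the same lock twice, and this ultimately triggers the error.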
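One possible way around this, sketched on the assumption that the parent does not need to report a child's exit the very instant the signal arrives: keep the trap block free of any IO and let it only set a flag, then do the reaping and all the puts calls from the main loop. The names chld_pending and reaped are my own additions, and the sleep times are shortened compared to the question.

```ruby
child_processes = 3
dead_processes  = 0
chld_pending    = false   # hypothetical flag, set only by the handler
reaped          = []      # hypothetical list of reaped pids

child_processes.times do
  fork { sleep rand(3) }
end

# The handler performs no IO, so it can never try to take the stdout
# write lock while the main loop is in the middle of a puts.
trap(:CHLD) { chld_pending = true }

until dead_processes == child_processes
  puts "parent is working hard"
  sleep 1

  next unless chld_pending
  # Clear the flag before reaping; a signal arriving during the loop
  # below sets it again and is handled on the next iteration.
  chld_pending = false
  begin
    # Non-blocking wait: returns nil when no further child has exited yet.
    while (pid = Process.wait(-1, Process::WNOHANG))
      puts "child pid #{pid} exited"
      dead_processes += 1
      reaped << pid
    end
  rescue Errno::ECHILD
    # Raised once there are no children left to wait for.
  end
end

puts "All children exited - parent exiting too."
```

The trade-off is that an exit is reported up to one loop iteration late, but every write to stdout now happens outside the signal handler.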

answered 2012-05-18T22:30:30.833