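When we upgraded from Python 2.7.3 to Python 2.7.5, the automated tests of an internal library that makes heavy use of subprocess.Popen() started to fail. The library is used in a threaded environment. After debugging the issue, I was able to create a short Python script that demonstrates the error seen in the failing tests.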

This is the script (called "threadedsubprocess.py"):

import time
import threading
import subprocess

def subprocesscall():
    p = subprocess.Popen(
        ['ls', '-l'],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        )
    time.sleep(2) # simulate the Popen call takes some time to complete.
    out, err = p.communicate()
    print 'succeeding command in thread:', threading.current_thread().ident

def failingsubprocesscall():
    try:
        p = subprocess.Popen(
            ['thiscommandsurelydoesnotexist'],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            )
    except Exception as e:
        print 'failing command:', e, 'in thread:', threading.current_thread().ident

print 'main thread is:', threading.current_thread().ident

subprocesscall_thread = threading.Thread(target=subprocesscall)
subprocesscall_thread.start()
failingsubprocesscall()
subprocesscall_thread.join()

Note: when run with Python 2.7.3, this script does not exit with an IOError. When run with Python 2.7.5 (both on the same Ubuntu 12.04 64-bit VM), it fails at least 50% of the time.

The error raised on Python 2.7.5 is this:

/opt/python/2.7.5/bin/python ./threadedsubprocess.py 
main thread is: 139899583563520
failing command: [Errno 2] No such file or directory 139899583563520
Exception in thread Thread-1:
Traceback (most recent call last):
  File "/opt/python/2.7.5/lib/python2.7/threading.py", line 808, in __bootstrap_inner
    self.run()
  File "/opt/python/2.7.5/lib/python2.7/threading.py", line 761, in run
    self.__target(*self.__args, **self.__kwargs)
  File "./threadedsubprocess.py", line 13, in subprocesscall
    out, err = p.communicate()
  File "/opt/python/2.7.5/lib/python2.7/subprocess.py", line 806, in communicate
    return self._communicate(input)
  File "/opt/python/2.7.5/lib/python2.7/subprocess.py", line 1379, in _communicate
    self.stdin.close()
IOError: [Errno 9] Bad file descriptor

close failed in file object destructor:
IOError: [Errno 9] Bad file descriptor

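When comparing the subprocess module between Python 2.7.3 and Python 2.7.5, I see that Popen()'s __init__() now explicitly closes the stdin, stdout and stderr file descriptors in case executing the command fails for some reason. This appears to be an intended fix, applied in Python 2.7.4, to prevent leaking file descriptors (http://hg.python.org/cpython/file/ab05e7dd2788/Misc/NEWS#l629).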

The difference between Python 2.7.3 and Python 2.7.5 that appears to be related to this issue is in Popen's __init__():

@@ -671,12 +702,33 @@
          c2pread, c2pwrite,
          errread, errwrite) = self._get_handles(stdin, stdout, stderr)

-        self._execute_child(args, executable, preexec_fn, close_fds,
-                            cwd, env, universal_newlines,
-                            startupinfo, creationflags, shell,
-                            p2cread, p2cwrite,
-                            c2pread, c2pwrite,
-                            errread, errwrite)
+        try:
+            self._execute_child(args, executable, preexec_fn, close_fds,
+                                cwd, env, universal_newlines,
+                                startupinfo, creationflags, shell,
+                                p2cread, p2cwrite,
+                                c2pread, c2pwrite,
+                                errread, errwrite)
+        except Exception:
+            # Preserve original exception in case os.close raises.
+            exc_type, exc_value, exc_trace = sys.exc_info()
+
+            to_close = []
+            # Only close the pipes we created.
+            if stdin == PIPE:
+                to_close.extend((p2cread, p2cwrite))
+            if stdout == PIPE:
+                to_close.extend((c2pread, c2pwrite))
+            if stderr == PIPE:
+                to_close.extend((errread, errwrite))
+
+            for fd in to_close:
+                try:
+                    os.close(fd)
+                except EnvironmentError:
+                    pass
+
+            raise exc_type, exc_value, exc_trace
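
One way cleanup like this could affect another thread is descriptor-number reuse. This is a hypothesis, not something established above: file descriptors are small integers shared by the whole process, and a newly created pipe normally receives the lowest free numbers. If a number is closed twice, the second close may hit a pipe that another thread created in between. The sketch below shows only that effect, in a single thread and without subprocess; the function name stale_close_demo is made up for illustration.

```python
import errno
import os


def stale_close_demo():
    """Show how closing a stale fd number can hit an unrelated, newer pipe."""
    # First pipe: created and then closed, as error-handling cleanup would do.
    old_r, old_w = os.pipe()
    os.close(old_r)
    os.close(old_w)

    # Second pipe, standing in for one created by another thread. The kernel
    # hands out the lowest free numbers, so the old ones are typically reused.
    new_r, new_w = os.pipe()
    reused = {new_r, new_w} == {old_r, old_w}

    error = None
    if reused:
        # A second close of the old number now closes one end of the new pipe.
        os.close(old_r)
        try:
            os.fstat(old_r)  # the new pipe's owner touches its descriptor
        except OSError as exc:
            error = exc.errno

    # Close whatever is still open so the demo does not leak descriptors.
    for fd in (new_r, new_w):
        try:
            os.close(fd)
        except OSError:
            pass
    return reused, error


if __name__ == '__main__':
    reused, error = stale_close_demo()
    print('numbers reused:', reused)
    print('error is EBADF:', error == errno.EBADF)
```

Whether this is what actually happens inside Popen in 2.7.5 would need to be checked against the library source; the sketch only shows that a stray close of a reused number yields the same kind of "Bad file descriptor" error.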

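I guess I have three questions:

1) Should it, in principle, be possible to use subprocess.Popen with PIPE for stdin, stdout and stderr in a threaded environment?

2) How do I prevent the file descriptors for stdin, stdout and stderr from being closed when Popen() fails in one of the threads?

3) Am I doing something wrong here?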

2 Answers

I would like to answer your questions with:

  1. Yes.
  2. You shouldn't have to.
  3. No.

The error does indeed occur in Python 2.7.4 as well.

I think this is a bug in the library code. If you add a lock in your program and make sure that the two calls to subprocess.Popen are executed atomically, the error does not occur.

@@ -1,32 +1,40 @@
 import time
 import threading
 import subprocess

+lock = threading.Lock()
+
 def subprocesscall():
+    lock.acquire()
     p = subprocess.Popen(
         ['ls', '-l'],
         stdin=subprocess.PIPE,
         stdout=subprocess.PIPE,
         stderr=subprocess.PIPE,
         )
+    lock.release()
     time.sleep(2) # simulate the Popen call takes some time to complete.
     out, err = p.communicate()
     print 'succeeding command in thread:', threading.current_thread().ident

 def failingsubprocesscall():
     try:
+        lock.acquire()
         p = subprocess.Popen(
             ['thiscommandsurelydoesnotexist'],
             stdin=subprocess.PIPE,
             stdout=subprocess.PIPE,
             stderr=subprocess.PIPE,
             )
     except Exception as e:
         print 'failing command:', e, 'in thread:', threading.current_thread().ident
+    finally:
+        lock.release()
+

 print 'main thread is:', threading.current_thread().ident

 subprocesscall_thread = threading.Thread(target=subprocesscall)
 subprocesscall_thread.start()
 failingsubprocesscall()
 subprocesscall_thread.join()

This means that it is most probably due to some data race in the implementation of Popen. I will risk a guess: the bug may be in the implementation of pipe_cloexec, called by _get_handles, which (in 2.7.4) is:

def pipe_cloexec(self):
    """Create a pipe with FDs set CLOEXEC."""
    # Pipes' FDs are set CLOEXEC by default because we don't want them
    # to be inherited by other subprocesses: the CLOEXEC flag is removed
    # from the child's FDs by _dup2(), between fork() and exec().
    # This is not atomic: we would need the pipe2() syscall for that.
    r, w = os.pipe()
    self._set_cloexec_flag(r)
    self._set_cloexec_flag(w)
    return r, w

and the comment warns explicitly about it not being atomic... This definitely causes a data race but, without experimentation, I don't know if it's what causes the problem.

answered 2013-08-26T23:16:35.173

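Another solution, for when you are not the one handling the opened files (e.g., when building an API):

I found a workaround for the problem by making a windll API call that marks all already-opened file descriptors as "non-inheritable". It is a bit of a hack; the Q&A can be found here:

Howto: workaround for close_fds=True and redirecting stdout/stderr on Windows

It will bypass the Python 2.7 bug.

The other solution is to use Python 3.4+ :) where this is fixed.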
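
To try the Python 3.4+ suggestion, the reproducer from the question can be ported roughly as follows. This is a sketch for Python 3 only, not the original script: run_reproducer and the results dictionary are made up for illustration, sys.executable replaces ls so the example does not depend on a particular command, and the delay is shortened. It also checks one documented Python 3.4+ behaviour that relates to the "non-inheritable" idea above: descriptors returned by os.pipe() are non-inheritable by default.

```python
import errno
import os
import subprocess
import sys
import threading
import time


def pipe_is_inheritable():
    """Return whether a freshly created pipe's read end is inheritable."""
    r, w = os.pipe()
    try:
        return os.get_inheritable(r)
    finally:
        os.close(r)
        os.close(w)


def run_reproducer(delay=0.2):
    """Run one succeeding Popen in a thread and one failing Popen in main."""
    results = {}

    def succeeding():
        p = subprocess.Popen(
            [sys.executable, '-c', 'print("hello")'],
            stdin=subprocess.PIPE,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        time.sleep(delay)  # leave time for the failing call to happen
        try:
            out, _err = p.communicate()
            results['ok'] = (p.returncode, out.strip())
        except (OSError, ValueError) as exc:
            # This is where the 2.7.5 traceback in the question originated.
            results['ok'] = exc

    def failing():
        try:
            subprocess.Popen(
                ['thiscommandsurelydoesnotexist'],
                stdin=subprocess.PIPE,
                stdout=subprocess.PIPE,
                stderr=subprocess.PIPE,
            )
        except OSError as exc:
            results['failed_with_enoent'] = (exc.errno == errno.ENOENT)

    thread = threading.Thread(target=succeeding)
    thread.start()
    failing()
    thread.join()
    return results


if __name__ == '__main__':
    print('pipe inheritable:', pipe_is_inheritable())
    print(run_reproducer())
```

Because the original failure was intermittent, a single clean run proves little; running it in a loop gives a better indication.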

answered 2018-02-07T18:58:03.320