We already had a custom importer (disclaimer: I did not write that code, I'm just the current maintainer) whose load_module looked like this:
def load_module(self,fullname):
    if fullname in sys.modules:
        return sys.modules[fullname]
    else: # set to avoid reimporting recursively
        sys.modules[fullname] = imp.new_module(fullname)
    if isinstance(fullname,unicode):
        filename = fullname.replace(u'.',u'\\')
        ext = u'.py'
        initfile = u'__init__'
    else:
        filename = fullname.replace('.','\\')
        ext = '.py'
        initfile = '__init__'
    try:
        if os.path.exists(filename+ext):
            with open(filename+ext,'U') as fp:
                mod = imp.load_source(fullname,filename+ext,fp)
                sys.modules[fullname] = mod
                mod.__loader__ = self
        else:
            mod = sys.modules[fullname]
            mod.__loader__ = self
            mod.__file__ = os.path.join(os.getcwd(),filename)
            mod.__path__ = [filename]
            #init file
            initfile = os.path.join(filename,initfile+ext)
            if os.path.exists(initfile):
                with open(initfile,'U') as fp:
                    code = fp.read()
                exec compile(code, initfile, 'exec') in mod.__dict__
        return mod
    except Exception as e: # wrap in ImportError a la python2 - will keep
                           # the original traceback even if import errors nest
        print 'fail', filename+ext
        raise ImportError, u'caused by ' + repr(e), sys.exc_info()[2]
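For completeness: a PEP 302 meta path importer is only asked to load a module if its find_module first claims the name. The real find_module is not shown above; a minimal sketch, assuming the same dotted-name-to-backslash-path convention as load_module (so an illustration, not the actual Wrye Bash code), would look like:

import os

class UnicodeImporter(object):

    def find_module(self, fullname, path=None):
        # mirror load_module: dots in the module name become path separators
        filename = fullname.replace('.', '\\')
        if os.path.exists(filename + '.py') or os.path.isdir(filename):
            return self  # we will handle this import
        return None      # otherwise let the default machinery deal with it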
So I thought I could replace the parts that access the sys.modules cache with overridable methods which, in my override, would leave that cache alone:
@@ -48,2 +55,2 @@ class UnicodeImporter(object):
-        if fullname in sys.modules:
-            return sys.modules[fullname]
+        if self._check_imported(fullname):
+            return self._get_imported(fullname)
@@ -51 +58 @@ class UnicodeImporter(object):
-            sys.modules[fullname] = imp.new_module(fullname)
+            self._add_to_imported(fullname, imp.new_module(fullname))
@@ -64 +71 @@ class UnicodeImporter(object):
-                    sys.modules[fullname] = mod
+                    self._add_to_imported(fullname, mod)
@@ -67 +74 @@ class UnicodeImporter(object):
-            mod = sys.modules[fullname]
+            mod = self._get_imported(fullname)
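For that diff to work, the base UnicodeImporter needs default implementations of those hooks that preserve the original sys.modules behaviour - presumably along these lines (a sketch of the obvious defaults, not the actual code):

import sys

class UnicodeImporter(object):

    # default hooks: behave exactly like the old inline sys.modules access
    def _check_imported(self, fullname):
        return fullname in sys.modules

    def _get_imported(self, fullname):
        return sys.modules[fullname]

    def _add_to_imported(self, fullname, mod):
        sys.modules[fullname] = mod

The fake importer then only has to override these three methods.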
and define:
class FakeUnicodeImporter(UnicodeImporter):

    _modules_to_discard = {}

    def _check_imported(self, fullname):
        return fullname in sys.modules or fullname in self._modules_to_discard

    def _get_imported(self, fullname):
        try:
            return sys.modules[fullname]
        except KeyError:
            return self._modules_to_discard[fullname]

    def _add_to_imported(self, fullname, mod):
        self._modules_to_discard[fullname] = mod

    @classmethod
    def cleanup(cls):
        cls._modules_to_discard.clear()
Then I added the importer to sys.meta_path and was good to go:
importer = sys.meta_path[0]
try:
    if not hasattr(sys,'frozen'):
        sys.meta_path = [fake_importer()]
    perform_the_imports() # see question
finally:
    fake_importer.cleanup()
    sys.meta_path = [importer]
Right? Wrong!
Traceback (most recent call last):
  File "bash\bush.py", line 74, in __supportedGames
    module = __import__('game',globals(),locals(),[modname],-1)
  File "Wrye Bash Launcher.pyw", line 83, in load_module
    exec compile(code, initfile, 'exec') in mod.__dict__
  File "bash\game\game1\__init__.py", line 29, in <module>
    from .constants import *
ImportError: caused by SystemError("Parent module 'bash.game.game1' not loaded, cannot perform relative import",)
Huh? But I am currently importing that very same module. Well, the answer is probably in import's docs:
If the module is not found in the cache, then sys.meta_path is searched (the specification for sys.meta_path can be found in PEP 302).
That's not completely to the point, but my guess is that the statement from .constants import * looks up sys.modules to check whether the parent module is there, and I see no way of bypassing that (note that our custom loader uses the builtin import mechanism for modules; mod.__loader__ = self is set after the fact).
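The failure is easy to reproduce in isolation: in Python 2 an explicit relative import resolves its parent package through sys.modules, so if the parent is missing from that cache you get exactly the SystemError that ends up wrapped in the ImportError above. A minimal sketch (pkg is a made-up name, assumed not to be importable):

# Python 2.7: the relative import machinery looks the parent package
# ('pkg' here) up in sys.modules and fails when it is not cached there
namespace = {'__name__': 'pkg.child', '__package__': 'pkg'}
exec "from . import constants" in namespace
# SystemError: Parent module 'pkg' not loaded, cannot perform relative import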
So I updated my FakeImporter to use the sys.modules cache and then clean it up afterwards:
class FakeUnicodeImporter(UnicodeImporter):

    _modules_to_discard = set()

    def _check_imported(self, fullname):
        return fullname in sys.modules or fullname in self._modules_to_discard

    def _add_to_imported(self, fullname, mod):
        super(FakeUnicodeImporter, self)._add_to_imported(fullname, mod)
        self._modules_to_discard.add(fullname)

    @classmethod
    def cleanup(cls):
        for m in cls._modules_to_discard: del sys.modules[m]
This, however, blew up in a new way - or rather in two ways:

- a reference to the game/ package was held in the bash top package instance in sys.modules:

  bash\
      __init__.py
      the_code_in_question_is_here.py
      game\
          ...

  because game is imported as bash.game. That reference held references to all the game1, game2, ... subpackages, so those were never garbage collected.
- a reference to another module (brec) was held as bash.brec by the same bash module instance. That reference had been bound via from .. import brec in game\game1 without triggering an import, in order to update SomeClass. However, in yet another module an import of the form from ...brec import SomeClass did trigger an import, and a second instance of the brec module ended up in sys.modules. That instance had a non-updated SomeClass and blew up with an AttributeError (a quick simulation of the no-import-triggered behaviour follows this list).
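The no-import-triggered part can be simulated with throwaway module objects; bash and brec here are stand-ins, and the __name__/__package__ values are picked so the two dots resolve to the cached bash package:

import sys
import types

# simulate the state after cleanup: 'bash' is still cached and still holds
# a stale 'brec' attribute, but 'bash.brec' itself is gone from sys.modules
bash = types.ModuleType('bash')
bash.__path__ = []                       # make it look like a package
stale_brec = types.ModuleType('bash.brec')
bash.brec = stale_brec
sys.modules['bash'] = bash

ns = {'__name__': 'bash.game.fake_module', '__package__': 'bash.game'}
exec "from .. import brec" in ns         # satisfied by the existing attribute...
assert ns['brec'] is stale_brec
assert 'bash.brec' not in sys.modules    # ...so no import was triggered
del sys.modules['bash']                  # tidy up the simulation

A from ..brec import SomeClass in the same namespace, by contrast, does go through the import machinery for bash.brec itself, which is how a second, fresh instance of the module can end up in sys.modules.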
Both were fixed by manually deleting those references - so gc collected all the modules (5 MB of RAM out of 75) and the from .. import brec did trigger an import (this from ... import foo vs from ...foo import bar difference warrants a question of its own).
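In code, the manual cleanup amounted to something along these lines (a sketch - the attribute names follow the description above, not the actual commit):

import gc
import sys

bash = sys.modules['bash']              # the top level package
# drop the attributes that kept the throwaway module instances alive
for attr in ('game', 'brec'):
    if hasattr(bash, attr):
        delattr(bash, attr)
gc.collect()                            # with no referrers left they are freed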
The moral of the story is that it is possible, but:

- the package and its subpackages should only reference each other
- all references to external modules/packages should be deleted from the top-level package's attributes
- the reference to the package itself should be deleted from the top-level package's attributes

If this sounds complicated and error-prone, it is - at least now I have a much cleaner view of the interdependencies and their perils. Time to address that.
This post was sponsored by PyDev's debugger - I found the gc module very useful in grokking what was going on (tips from here). Of course there were a lot of variables that belonged to the debugger itself, and that complicated things.
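For reference, the kind of gc incantation that helps here is gc.get_referrers - a generic sketch, not the exact debugging session ('bash.brec' stands in for whichever module refuses to die):

import gc
import sys

mod = sys.modules.get('bash.brec')
if mod is not None:
    # list every object that still holds a reference to the module
    for referrer in gc.get_referrers(mod):
        print type(referrer), repr(referrer)[:100]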