python - How does/should global data in modules across packages be managed in Python/other languages?

Question

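I am trying to design the package and module system for a programming language (Heron) that can be both compiled and interpreted, and from what I have seen I really like the Python approach. Python has a rich choice of modules, which seems to contribute largely to its success.

What I don't know is what happens in Python if a module is included in two different compiled packages: are separate copies of the data made, or is the data shared?

Related to this are a bunch of side questions:

Am I right in assuming that packages can be compiled in Python?
What are the pros and cons of the two approaches (copying or sharing module data)?
Are there widely known problems with the Python module system, from the point of view of the Python community? For example, is there a PEP under consideration for enhancing modules/packages?
Are there aspects of the Python module/package system that wouldn't work well for a compiled language?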

score 3 · Accepted Answer

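Modules are the only truly global objects in Python; all other global data is built on the module system (which uses sys.modules as a registry). Packages are simply modules with special semantics for importing submodules. "Compiling" a .py file into a .pyc or .pyo is not compilation as most languages understand it: it only checks the syntax and creates a code object which, when executed in the interpreter, creates the module object.

example.py: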

print "Creating %s module." % __name__

def show_def(f):
  print "Creating function %s.%s." % (__name__, f.__name__)
  return f

@show_def
def a():
  print "called: %s.a" % __name__

Interactive session:

>>> import example
# first sys.modules['example'] is checked
# since it doesn't exist, example.py is found and "compiled" to example.pyc
# (since example.pyc doesn't exist, same would happen if it was outdated, etc.)
Creating example module. # module code is executed
Creating function example.a. # def statement executed
>>> example.a()
called: example.a
>>> import example
# sys.modules['example'] found, local variable example assigned to that object
# no 'Creating ..' output
>>> d = {"__name__": "fake"}
>>> exec open("example.py") in d
# the first import in this session is very similar to this
# in that it creates a module object (which has a __dict__), initializes a few
# variables in it (__builtins__, __name__, and others---packages' __init__
# modules have their own as well---look at some_module.__dict__.keys() or
# dir(some_module))
# and executes the code from example.py in this dict (or the code object stored
# in example.pyc, etc.)
Creating fake module. # module code is executed
Creating function fake.a. # def statement executed
>>> d.keys()
['__builtins__', '__name__', 'a', 'show_def']
>>> d['a']()
called: fake.a
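
The session above is Python 2. As a rough Python 3 sketch of the same idea (check the sys.modules registry, otherwise create a module object and execute the code in its __dict__), something like the following could be used. The names fake_import and demo_mod are hypothetical and exist only for illustration; the real import machinery does considerably more (finders, loaders, bytecode caching).

```python
import sys
import types

# Hypothetical module source, used only for this illustration.
SOURCE = '''
counter = []          # module-level ("global") data

def bump():
    counter.append(1)
    return len(counter)
'''

def fake_import(name, source):
    """Return the cached module if present; otherwise create, register, execute."""
    if name in sys.modules:
        return sys.modules[name]        # already imported: no code is run again
    module = types.ModuleType(name)     # new module object with its own __dict__
    sys.modules[name] = module          # register it in the global registry
    exec(compile(source, name + ".py", "exec"), module.__dict__)
    return module

first = fake_import("demo_mod", SOURCE)
second = fake_import("demo_mod", SOURCE)   # served from sys.modules
first.bump()
print(first is second, second.counter)     # True [1]
```

Because both names refer to the one object held in sys.modules, the module-level data is shared rather than copied.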

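Your questions:

They are compiled, in a sense, but not in the way you would expect if you are familiar with how C compilers work.
If the data is immutable, copying is feasible, and apart from object identity (the is operator and id() in Python) it should be indistinguishable from sharing.
Imports may or may not execute code (they always assign a local variable to an object, but that poses no problem) and may or may not modify sys.modules. You must be careful not to import in threads, and it is generally best to do all imports at the top of every module: this leads to a cascading graph, so all the imports are done at once and then __main__ continues and does the Real Work™.
- I don't know of any current PEPs, but there is already a lot of complex machinery in place. For example, packages can have a __path__ attribute (really a list of paths), so submodules don't have to be in the same directory, and these paths can even be computed at runtime! (See the mungepath package example below.) You can have your own import hooks, use import statements inside functions, call __import__ directly, and I wouldn't be surprised to find two or three other unique ways of working with packages and modules.
A subset of the import system would work in a traditionally compiled language, as long as it resembles something like C's #include. You could run the "first level" of execution (creating the module objects) in the compiler and then compile those results. This has significant drawbacks, however, and amounts to separate execution contexts for module-level code and for functions executed at runtime (and some functions would have to run in both contexts!). (Remember that in Python every statement is executed at runtime, even def and class statements.)
- I believe this is the main reason traditionally compiled languages restrict "top-level" code to class, function, and object declarations, which eliminates this second context. Even then, global objects in C/C++ (and others) have initialization problems unless managed carefully.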
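
To make the copy-versus-share point concrete, here is a small Python 3 sketch. The settings dictionary and its keys are made up for illustration. It suggests that for the immutable part a copy can only be told apart by identity, while for the mutable part the two approaches behave differently.

```python
import copy

# Hypothetical module-level data: one immutable value, one mutable value.
settings = {"version": (1, 0), "handlers": []}

shared = settings                  # sharing: a second name for the same object
copied = copy.deepcopy(settings)   # copying: an independent object

shared["handlers"].append("a")     # mutate through the shared reference

print(settings["handlers"])                      # ['a']  (the change is visible)
print(copied["handlers"])                        # []     (the copy is unaffected)
print(copied["version"] == settings["version"])  # True   (equal immutable data)
print(shared is settings, copied is settings)    # True False
```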

mungepath/__init__.py:

print __path__
__path__.append(".") # CWD, would be different in non-example code
print __path__
from . import example # this is example.py from above, and is NOT in mungepath/
# note that this is a degenerate case, in that we now have two names for the
# 'same' module: example and mungepath.example, but they're really different
# modules with different functions (use 'is' or 'id()' to verify)

Interactive session:

>>> import example
Creating example module.
Creating function example.a.
>>> example.__dict__.keys()
['a', '__builtins__', '__file__', 'show_def', '__package__',
 '__name__', '__doc__']
>>> import mungepath
['mungepath']
['mungepath', '.']
Creating mungepath.example module.
Creating function mungepath.example.a.
>>> mungepath.example.a()
called: mungepath.example.a
>>> example is mungepath.example
False
>>> example.a is mungepath.example.a
False
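
The same effect can be reproduced in Python 3 without touching __path__. This is only a sketch: the file name example_data.py, the module names, and the helper load_as are assumptions made for illustration. It loads one source file under two different module names with importlib.util and checks that each resulting module carries its own copy of the module-level data.

```python
import importlib.util
import os
import tempfile

SOURCE = "registry = []\n"   # hypothetical module with one piece of global data

def load_as(name, path):
    """Execute the file at `path` as a fresh module called `name`.

    The module is deliberately not registered in sys.modules here.
    """
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example_data.py")
    with open(path, "w") as fh:
        fh.write(SOURCE)
    plain = load_as("example_data", path)
    nested = load_as("pkg.example_data", path)

plain.registry.append("only in plain")
print(plain is nested)    # False
print(nested.registry)    # []
```

So one source file reached under two names yields two module objects with separate data, whereas one name in sys.modules yields a single shared object.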

score 3 · Accepted Answer

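Well, you asked a lot of questions. Here are some hints to get you a bit further:

a. Python code is lexed and compiled into Python-specific instructions (bytecode), but not into machine-executable code. A ".pyc" file is created automatically whenever you import Python source whose timestamp does not match the existing .pyc. This feature can be turned off. You can use the dis module to look at these instructions.
b. When a module is imported, it is executed (top to bottom) in its own namespace, and that namespace is cached globally. When you import it again from another module, it is not executed again. Remember that def is just a statement. You may want to put a print('compiling this module') statement in your code to trace it.
It depends.
There have been some recent enhancements, mostly around specifying which module needs to be loaded. Modules can have relative paths, so a huge project might have multiple modules with the same name.
Python itself won't work as a compiled language. Google "unladen swallow blog" to see the tribulations of trying to speed up a language in which "a = sum(b)" can change meaning between executions. Outside of corner cases, the module system forms a nice bridge between source code and a compiled library system. The approach works well, and Python's easy wrapping of C code (swig, etc.) helps too.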
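
As a small illustration of two of the points above (inspecting the instructions with dis, and "a = sum(b)" changing meaning), here is a Python 3 sketch. The function total is hypothetical. The name sum is looked up at call time, so rebinding it in the function's global namespace changes what the same bytecode does.

```python
import dis

def total(b):
    a = sum(b)       # `sum` is resolved by name every time this line runs
    return a

# The compiled instructions contain a global-name lookup for `sum`.
opnames = [ins.opname for ins in dis.get_instructions(total)]
print("LOAD_GLOBAL" in opnames)

before = total([1, 2, 3])

# Rebind `sum` in the namespace that `total` uses for global lookups.
globals()["sum"] = lambda values: "not a number"
after = total([1, 2, 3])
del globals()["sum"]             # the builtin sum becomes visible again

print(before, after)             # 6 not a number
```

An ahead-of-time compiler cannot assume that sum means the builtin, which is part of why such a language is hard to speed up.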

score 1 · Accepted Answer

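The scope of global data is at the interpreter level.

"Packages" can be compiled, since a package is just a collection of modules, which can themselves be compiled.
Given the established scope of the data, I'm not sure I understand the question.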
