38

The following doesn't work:

one.py

import shared
shared.value = 'Hello'
raw_input('A cheap way to keep process alive..')

two.py

import shared
print shared.value

Run them on two command lines:

>>python one.py
>>python two.py

(The second one gets an AttributeError, which is to be expected.)

Is there any way to do this, i.e. share a variable between two scripts?


13 Answers

57

Hope it's OK to jot down my notes about this issue here.

First of all, I appreciate the example in the OP a lot, because that is where I started as well - although it made me think shared is some built-in Python module, until I found a complete example at [Tutor] Global Variables between Modules ??.

However, when I looked for "sharing variables between scripts" (or processes) - besides the case when a Python script needs to use variables defined in other Python source files (but not necessarily running processes) - I mostly stumbled upon two other use cases:

  • A script forks itself into multiple child processes, which then run in parallel (possibly on multiple processors) on the same PC
  • A script spawns multiple other child processes, which then run in parallel (possibly on multiple processors) on the same PC

As such, most hits regarding "shared variables" and "interprocess communication" (IPC) discuss cases like these two; however, in both of these cases one can observe a "parent", to which the "children" usually have a reference.

What I am interested in, however, is running multiple invocations of the same script, run independently, and sharing data between those (as in Python: how to share an object instance across multiple invocations of a script), in a singleton/single-instance mode. That kind of problem is not really addressed by the above two cases - instead, it essentially reduces to the example in the OP (sharing variables across two scripts).

Now, when dealing with this problem in Perl, there is IPC::Shareable, which "allows you to tie a variable to shared memory", using "an integer number or 4 character string[1] that serves as a common identifier for data across process space". Thus, there are no temporary files, nor networking setups - which I find great for my use case; so I was looking for the same in Python.

However, as the accepted answer by @Drewfer notes: "You're not going to be able to do what you want without storing the information somewhere external to the two instances of the interpreter"; or in other words: either you have to use a networking/socket setup - or you have to use temporary files (ergo, no shared RAM for "totally separate python sessions").

Now, even with these considerations, it is kinda difficult to find working examples (except for pickle) - also in the docs for mmap and multiprocessing. I have managed to find some other examples, which also describe some pitfalls that the docs do not mention.

Thanks to these examples, I came up with the example below, which essentially does the same as the mmap example, using approaches from the "synchronize a python dict" example: a BaseManager (via manager.start(), with a file-path address) with a shared list; both server and client read and write (pasted below). Note that:

  • multiprocessing managers can be started either via manager.start() or server.serve_forever() (see the short sketch after this list)
    • serve_forever() blocks - start() doesn't
    • There is an auto-logging facility in multiprocessing: it seems to work fine with start()ed processes - but seems to ignore the ones that serve_forever()
  • The address specification in multiprocessing can be an IP (socket) address or a temporary-file (possibly a pipe?) path; in the multiprocessing docs:
    • Most examples use multiprocessing.Manager() - this is just a function (not a class instantiation) which returns a SyncManager, a special subclass of BaseManager; it uses start() - but not for IPC between independently run scripts; here a file path is used
    • A few other examples use the serve_forever() approach for IPC between independently run scripts; here an IP/socket address is used
    • If an address is not specified, then a temp file path is used automatically (see 16.6.2.12. Logging for an example of how to see this)
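
As a quick illustration of the start() vs. serve_forever() distinction, here is a minimal hedged sketch (Python 2; the MyManager class, the get_data typeid and the /tmp/demosock path are all made up for this example):

import multiprocessing.managers

class MyManager(multiprocessing.managers.BaseManager):
    pass

MyManager.register('get_data', callable=lambda: {'x': 1})

if __name__ == '__main__':
    # Variant 1: start() forks a server process and returns immediately
    m = MyManager(address='/tmp/demosock', authkey='')
    m.start()                     # non-blocking; this script keeps running
    print m.get_data().items()    # talk to the server through an AutoProxy
    m.shutdown()

    # Variant 2: serve_forever() runs the server in *this* process and blocks
    # m2 = MyManager(address='/tmp/demosock2', authkey='')
    # m2.get_server().serve_forever()   # blocks until the process is killed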

In addition to all the pitfalls described in the "synchronize a python dict" post, there are additional ones in the case of a list. That post notes:

All manipulations of the dict must be done with methods and not dict assignments (syncdict["blast"] = 2 will fail miserably because of the way multiprocessing shares custom objects)

The workaround for dict['key'] getting and setting is the use of the dict public methods get and update. The problem is that there are no such public-method alternatives for list[index]; thus, for a shared list, we additionally have to register the __getitem__ and __setitem__ methods (which are special "dunder" methods of list) as exposed, which means we also have to re-register all the public methods of list as well :/
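
To make the quoted dict workaround concrete, here is a hedged mini-demo (my addition, Python 2; DictManager and /tmp/mydictpipe are illustrative), showing that bracket assignment fails through a plain BaseManager proxy while the proxied public methods work:

import multiprocessing.managers

class DictManager(multiprocessing.managers.BaseManager):
    pass

syncdict_src = {}
DictManager.register('syncdict', callable=lambda: syncdict_src)

if __name__ == '__main__':
    manager = DictManager(address='/tmp/mydictpipe', authkey='')
    manager.start()
    syncdict = manager.syncdict()      # <AutoProxy[syncdict] object>
    # syncdict['blast'] = 2            # fails: __setitem__ is not exposed
    syncdict.update({'blast': 2})      # works: public methods are auto-exposed
    print syncdict.get('blast')        # -> 2
    manager.shutdown()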

Well, I think those were the most critical things; these are the two scripts - they can just be run in separate terminals (server first); note: developed on Linux with Python 2.7:

a.py (server):

import multiprocessing
import multiprocessing.managers

import logging
logger = multiprocessing.log_to_stderr()
logger.setLevel(logging.INFO)


class MyListManager(multiprocessing.managers.BaseManager):
    pass


syncarr = []
def get_arr():
    return syncarr

def main():

    # print dir([]) # cannot do `exposed = dir([])`!! manually:
    MyListManager.register("syncarr", get_arr, exposed=['__getitem__', '__setitem__', '__str__', 'append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort'])

    manager = MyListManager(address='/tmp/mypipe', authkey='')  # file path -> Unix domain socket
    manager.start()

    # we don't use the same name as `syncarr` here (although we could);
    # just to see that `syncarr_tmp` is actually <AutoProxy[syncarr] object>
    # so we also have to expose `__str__` method in order to print its list values!
    syncarr_tmp = manager.syncarr()
    print("syncarr (master):", syncarr, "syncarr_tmp:", syncarr_tmp)
    print("syncarr initial:", syncarr_tmp.__str__())

    syncarr_tmp.append(140)
    syncarr_tmp.append("hello")

    print("syncarr set:", str(syncarr_tmp))

    raw_input('Now run b.py and press ENTER')

    print
    print 'Changing [0]'
    syncarr_tmp.__setitem__(0, 250)

    print 'Changing [1]'
    syncarr_tmp.__setitem__(1, "foo")

    new_i = raw_input('Enter a new int value for [0]: ')
    syncarr_tmp.__setitem__(0, int(new_i))

    raw_input("Press any key (NOT Ctrl-C!) to kill server (but kill client first)".center(50, "-"))
    manager.shutdown()

if __name__ == '__main__':
  main()

b.py (client):

import time

import multiprocessing
import multiprocessing.managers

import logging
logger = multiprocessing.log_to_stderr()
logger.setLevel(logging.INFO)


class MyListManager(multiprocessing.managers.BaseManager):
    pass

MyListManager.register("syncarr")

def main():
  manager = MyListManager(address='/tmp/mypipe', authkey='')
  manager.connect()
  syncarr = manager.syncarr()

  print "arr = %s" % (dir(syncarr))

  # note here we need not bother with __str__ 
  # syncarr can be printed as a list without a problem:
  print "List at start:", syncarr
  print "Changing from client"
  syncarr.append(30)
  print "List now:", syncarr

  o0 = None
  o1 = None

  while 1:
    new_0 = syncarr.__getitem__(0) # syncarr[0]
    new_1 = syncarr.__getitem__(1) # syncarr[1]

    if o0 != new_0 or o1 != new_1:
      print 'o0: %s => %s' % (str(o0), str(new_0))
      print 'o1: %s => %s' % (str(o1), str(new_1))
      print "List is:", syncarr

      print 'Press Ctrl-C to exit'
      o0 = new_0
      o1 = new_1

    time.sleep(1)


if __name__ == '__main__':
    main()

As a final remark: on Linux, /tmp/mypipe is created - but it is 0 bytes, and has permissions srwxr-xr-x (for a Unix domain socket); I guess this makes me happy, as I neither have to worry about network ports, nor about temporary files as such :)


answered 2013-02-05T04:45:07.653
25

You're not going to be able to do what you want without storing the information somewhere external to the two instances of the interpreter.
If it's just simple variables you want, you can easily dump a python dict to a file with the pickle module in script one and then re-load it in script two. Example:

one.py

import pickle

shared = {"Foo":"Bar", "Parrot":"Dead"}
fp = open("shared.pkl","w")
pickle.dump(shared, fp)

two.py

import pickle

fp = open("shared.pkl")
shared = pickle.load(fp)
print shared["Foo"]
answered 2009-12-01T22:05:37.730
17
sudo apt-get install memcached python-memcache

one.py

import memcache
shared = memcache.Client(['127.0.0.1:11211'], debug=0)
shared.set('Value', 'Hello')

two.py

import memcache
shared = memcache.Client(['127.0.0.1:11211'], debug=0)    
print shared.get('Value')
answered 2010-10-09T22:54:13.790
6

What you're trying to do here (storing shared state in a Python module across separate Python interpreters) won't work.

A value in a module can be updated by one module and then read by another, but only within the same Python interpreter. What you appear to be doing here is really a form of interprocess communication; that could be accomplished via socket communication between the two processes, but it is significantly less trivial than you seem to expect.
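
For illustration only, a bare-bones hedged sketch of that socket approach (my addition; Python 2 syntax to match the question, port 50007 arbitrary):

# one.py - serves the value (run this first)
import socket

srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
srv.bind(('127.0.0.1', 50007))
srv.listen(1)
conn, addr = srv.accept()       # blocks until two.py connects
conn.sendall('Hello')
conn.close()
srv.close()

# two.py - fetches the value
import socket

cli = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
cli.connect(('127.0.0.1', 50007))
print cli.recv(1024)            # Hello
cli.close()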

answered 2009-12-01T21:48:48.880
6

You can use a relatively simple mmap file. You can use a shared.py to store the common constants. The following code will work across different python interpreters / scripts / processes.

shared.py:

MMAP_SIZE = 16*1024 
MMAP_NAME = 'Global\\SHARED_MMAP_NAME'

* "Global" is the Windows syntax for a global name

one.py:

import sys
import mmap

from shared import MMAP_SIZE, MMAP_NAME

def write_to_mmap():
    map_file = mmap.mmap(-1, MMAP_SIZE, tagname=MMAP_NAME, access=mmap.ACCESS_WRITE)
    map_file.seek(0)
    map_file.write('hello\n')
    ret = map_file.flush() != 0
    # flush() returns non-zero on success on Windows, 0 on Unix
    if sys.platform.startswith('win'):
        assert(ret != 0)
    else:
        assert(ret == 0)

two.py:

import mmap

from shared import MMAP_SIZE, MMAP_NAME

def read_from_mmap():
    map_file = mmap.mmap(-1, MMAP_SIZE, tagname=MMAP_NAME, access=mmap.ACCESS_READ)
    map_file.seek(0)
    data = map_file.readline().rstrip('\n')
    map_file.close()
    print data

*This code was written for Windows; Linux might need small adjustments

More information - https://docs.python.org/2/library/mmap.html

answered 2015-01-27T10:08:00.007
5

Share a dynamic variable via Redis:

script_one.py

from redis import Redis
from time import sleep

cli = Redis('localhost')
shared_var = 1

while True:
   cli.set('share_place', shared_var)
   shared_var += 1
   sleep(1)

Run script_one in a terminal (one process):

$ python script_one.py

script_two.py

from redis import Redis
from time import sleep

cli = Redis('localhost')

while True:
    print(int(cli.get('share_place')))
    sleep(1)

Run script_two in another terminal (another process):

$ python script_two.py

Output:

1
2
3
4
5
...

Dependencies:

$ pip install redis
$ apt-get install redis-server
answered 2019-04-10T12:48:44.247
4

I'd advise that you use the multiprocessing module. You can't run two scripts from the command line like that, but you can set up two separate processes that easily speak to each other.

From the examples in the docs:

from multiprocessing import Process, Queue

def f(q):
    q.put([42, None, 'hello'])

if __name__ == '__main__':
    q = Queue()
    p = Process(target=f, args=(q,))
    p.start()
    print q.get()    # prints "[42, None, 'hello']"
    p.join()
answered 2009-12-02T04:01:06.383
3

You need to store the variable in some sort of persistent file. There are several modules to do this, depending on your exact need.

The pickle and cPickle modules can save and load most python objects to a file.

The shelve module can store python objects in a dictionary-like structure (using pickle behind the scenes).

The dbm/bsddb/dbhash/gdbm modules can store string variables in a dictionary-like structure.

The sqlite3 module can store data in a lightweight SQL database.

The biggest problem with most of these is that they are not synchronized across different processes - if one process reads a value while another is writing to the datastore, then you may get incorrect data or data corruption. To get around this, you will need to write your own file-locking mechanism or use a full-blown database.
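
As a hedged illustration of the shelve option (my addition; 'shared_shelf' is an arbitrary filename), mirroring one.py/two.py from the question:

# one.py
import shelve

db = shelve.open('shared_shelf')   # dict-like store, pickles values internally
db['value'] = 'Hello'
db.close()

# two.py
import shelve

db = shelve.open('shared_shelf')
print db['value']                  # Hello
db.close()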

answered 2009-12-01T21:50:11.840
2

If you want to read and modify shared data between 2 scripts which run separately, a good solution is to take advantage of the python multiprocessing module and use a Pipe() or a Queue() (see the differences here). This way you get to synchronize the scripts and avoid problems regarding concurrency and global variables (like what happens if both scripts want to modify a variable at the same time).

The best part about using pipes/queues is that you can pass python objects through them.

Also, there are methods to avoid waiting for data if it hasn't been passed yet (queue.empty() and pipeConn.poll()).

See an example using Queue() below:

# main.py
from multiprocessing import Process, Queue
from stage1 import Stage1
from stage2 import Stage2


s1 = Stage1()
s2 = Stage2()

# S1 to S2 communication
queueS1 = Queue()  # s1.stage1() writes to queueS1

# S2 to S1 communication
queueS2 = Queue()  # s2.stage2() writes to queueS2

# start s2 as another process
s2 = Process(target=s2.stage2, args=(queueS1, queueS2))
s2.daemon = True
s2.start()     # launch the stage2 process

s1.stage1(queueS1, queueS2)  # start sending stuff from s1 to s2
s2.join()  # wait till the s2 daemon finishes

# stage1.py
import time
import random

class Stage1:

    def stage1(self, queueS1, queueS2):
        print("stage1")
        lala = []
        lis = [1, 2, 3, 4, 5]
        for i in range(len(lis)):
            # to avoid unnecessary waiting
            if not queueS2.empty():
                msg = queueS2.get()    # get msg from s2
                print("! ! ! stage1 RECEIVED from s2:", msg)
                lala = [6, 7, 8]  # now that a msg was received, further msgs will be different
            time.sleep(1)  # work
            random.shuffle(lis)
            queueS1.put(lis + lala)
        queueS1.put('s1 is DONE')

# stage2.py
import time

class Stage2:

    def stage2(self, queueS1, queueS2):
        print("stage2")
        while True:
            msg = queueS1.get()    # wait till there is a msg from s1
            print("- - - stage2 RECEIVED from s1:", msg)
            if msg == 's1 is DONE':  # (no stray trailing space, so the loop actually ends)
                break  # end loop
            time.sleep(1)  # work
            queueS2.put("update lists")

EDIT: Just found that you can use queue.get(False) to avoid blocking when receiving data. This way there's no need to check first whether the queue is empty. This is not possible if you use pipes.
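
A hedged sketch of that non-blocking get (my addition; the queue here just stands in for queueS2 from the example above):

from multiprocessing import Queue
from queue import Empty   # multiprocessing's Queue raises queue.Empty

q = Queue()                # stands in for queueS2 from the example above
try:
    msg = q.get(False)     # non-blocking: raises Empty instead of waiting
except Empty:
    msg = None             # nothing has been sent yet; carry on
print(msg)                 # None here, since nothing was put on the queue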

answered 2019-09-19T16:37:36.770
1

Use a text file or environment variables. Since the two scripts run separately, you can't really do what you are trying to do.
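
For the text-file variant, a minimal hedged sketch (my addition; 'shared.txt' is arbitrary):

# one.py
with open('shared.txt', 'w') as f:
    f.write('Hello')

# two.py
with open('shared.txt') as f:
    print f.read()   # Hello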

answered 2009-12-01T21:51:02.683
1

In your example, the first script runs to completion, and then the second script runs. That means you need some sort of persistent state. Other answers have suggested using text files or Python's pickle module. Personally I am lazy, and I wouldn't use a text file when I could use pickle; why should I write a parser to parse my own text file format?

Instead of pickle you could also use the json module to store it as JSON. This might be preferable if you want to share the data with non-Python programs, as JSON is a simple and common standard. If your Python doesn't have json, get simplejson.
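
A hedged json version of the pickle example above (my addition; 'shared.json' is arbitrary):

# one.py
import json

shared = {"Foo": "Bar", "Parrot": "Dead"}
with open("shared.json", "w") as fp:
    json.dump(shared, fp)

# two.py
import json

with open("shared.json") as fp:
    shared = json.load(fp)
print shared["Foo"]   # Bar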

If your needs go beyond pickle or json - say you actually want to have two Python programs executing at the same time and updating the persistent-state variables in real time - I suggest you use the SQLite database. Use an ORM to abstract the database away; it's super easy. For SQLite and Python, I recommend Autumn ORM.
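
To illustrate the SQLite route with just the standard library (a hedged sketch, my addition; no ORM, and the 'kv' table and 'shared.sqlite' filename are made up):

# one.py
import sqlite3

con = sqlite3.connect("shared.sqlite")
con.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")
con.execute("INSERT OR REPLACE INTO kv VALUES ('Foo', 'Bar')")
con.commit()   # sqlite3 handles cross-process locking on the db file itself
con.close()

# two.py
import sqlite3

con = sqlite3.connect("shared.sqlite")
print con.execute("SELECT value FROM kv WHERE key = 'Foo'").fetchone()[0]   # Bar
con.close()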

answered 2009-12-01T23:59:10.913
-1

You can use python's basic from/import functionality to import the variable into two.py. For example:

from filename import variable

That should import the variable from the file. (Of course you should replace filename with one.py, and replace variable with the variable you want to share with two.py.)

answered 2021-11-02T18:09:13.310
-4

You can also solve this by making the variable global:

first.py

class Temp:
    def __init__(self):
        self.first = None

global var1
var1 = Temp()
var1.first = 1
print(var1.first)

second.py

import first as One
print(One.var1.first)
answered 2020-03-03T09:40:20.073