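This is a complicated question, so I will do my best to explain it clearly without piling on unnecessary detail.

Last year I developed a Python script for work. It gathers basic system data and sends it to an HTTP/S server, which can send back commands if the user chooses. It was a big experiment to see what works and what doesn't, testing different needs within the company, and so on. I now have a very solid understanding of what we need, so I have started on version 2.

The goal of the new version is to keep the functionality while reducing system/CPU load and bandwidth. Once the Python script is done, the rest of the work will happen on the HTTP/S server. My question is specifically about the client side, the Python script. I am using Python 2.7.x, most often on Debian-based systems.

The v1 script grabs system data, reads a config file listing the servers to send the data to, and uses threads to send to each server. Still inside those threads, each server can return one or more commands, which are then also handled in their own threads. The script runs once a minute via crontab. You can have 5 or more servers each sending 10 commands, and the script still executes everything smoothly and efficiently, without taking long to complete the commands the servers issue.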

In the v2 script, I want to make the following changes:

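  • It will run as a system service. So instead of being run by cron every minute, the code loops every few seconds.

  • The loop needs to gather data once per pass and then send it to each web server (as defined in the config file).

  • I want persistent HTTP/S connections to optimize performance and bandwidth.

  • I do not want to gather data separately for every HTTP/S server on each pass. I only want to gather data once per iteration of the main loop that drives the service, and then hand that data to the threads that manage the established persistent HTTP/S connections.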

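This is where my problem lies. How do I get persistent connections in their own threads, and get data to those threads while gathering it only once?

Does httplib reuse TCP connections? I have seen that persistent connections can be done this way (thanks, Corey Goldberg):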

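import httplib

con = httplib.HTTPConnection("myweb.com")
while True:
    # Reuse the same connection object for every request
    con.request("GET", "/x.css", headers={"Connection": "keep-alive"})
    result = con.getresponse()
    result.read()  # the body must be read fully before the next request
    print result.reason, result.getheaders()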
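
One way to check whether the connection really is reused is to run the same request loop against a throwaway local server and count how many distinct client ports the server sees. The sketch below is only an illustration: the handler, the `count_connections` helper and the local server are assumptions made for this test, not part of the snippet above, and the imports are written so it runs on both Python 2.7 and Python 3.

```python
import threading

try:  # Python 2.7
    import httplib
    from BaseHTTPServer import BaseHTTPRequestHandler, HTTPServer
except ImportError:  # Python 3
    import http.client as httplib
    from http.server import BaseHTTPRequestHandler, HTTPServer

# Client-side source port of every request the test server receives
seen_client_ports = []


class EchoHandler(BaseHTTPRequestHandler):
    # HTTP/1.1 plus a Content-Length header lets the server keep the socket open
    protocol_version = "HTTP/1.1"

    def do_GET(self):
        seen_client_ports.append(self.client_address[1])
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # keep the test output quiet


def count_connections(n_requests):
    """Send n_requests over one HTTPConnection; return the number of TCP connections used."""
    seen_client_ports[:] = []
    server = HTTPServer(("127.0.0.1", 0), EchoHandler)  # port 0: pick any free port
    t = threading.Thread(target=server.serve_forever)
    t.daemon = True
    t.start()
    con = httplib.HTTPConnection("127.0.0.1", server.server_address[1], timeout=5)
    try:
        for _ in range(n_requests):
            con.request("GET", "/x.css", headers={"Connection": "keep-alive"})
            con.getresponse().read()  # read the body before sending the next request
    finally:
        con.close()
        server.shutdown()
        server.server_close()
    return len(set(seen_client_ports))
```

If every request arrives from the same client port, all of them travelled over a single TCP connection.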

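The data gathering needs to happen inside this loop. But I need this to happen in multiple threads talking to different servers at the same time, and I don't want to waste resources gathering the data more than once. Given my relatively limited knowledge of Python, I just don't see how that is possible.

Basically, as I see it now, there needs to be a loop driving the HTTP/S connection inside each thread. Then I need some kind of loop that gathers my data and gets it ready to go into the HTTP/S connections. But how do I get the first loop inside the second in this way? It is as if I need the persistent HTTP/S connection loop inside the data-gathering loop, but I also need the data-gathering loop inside the HTTP/S loop.

I would like to explore any pure Python 2.7.x, pythonic way of doing this. Depending on external utilities could be problematic for various reasons. Once finished, this script will be deployed to 150+ Linux systems, and the less that can go wrong, the better.

Thank you for your help and consideration!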

1 Answer

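I am leaving this here for others who, like me, are looking to expand their understanding of Python. It took me a while to figure out how to approach this problem, but the solution became clear after talking to a colleague who understands this kind of issue.

In short, the answer that worked for me was to use Python 2.7.x's native threading and Queue modules.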

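My main program manages the various threads and queues I set up. The NetworkWorker class extends threading.Thread, and on init each instance also spins up its own new queue. The queue references/handles are stored in a global list variable. In my main thread (main.py) I simply loop through the list of queues and send the data to each thread's queue. Each thread then gets its data and does what it is supposed to do. The data received from each HTTP connection is loaded into another queue, which is processed by a single command-execution thread in main.py.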
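
Stripped of all networking, the pattern described above can be sketched roughly as follows. This is a simplified illustration rather than the code from the project: the `Worker` class, the `fan_out` helper and the use of `task_done()`/`join()` to wait for the queues to drain are assumptions made so the example is short and finishes on its own. The imports are written so it runs on both Python 2.7 and Python 3.

```python
import threading

try:
    import Queue as queue  # Python 2.7
except ImportError:
    import queue  # Python 3


class Worker(threading.Thread):
    """One thread per server, each with its own inbox queue."""

    def __init__(self, server_name, results):
        threading.Thread.__init__(self)
        self.daemon = True
        self.server_name = server_name
        self.inbox = queue.Queue()  # data destined for this server only
        self.results = results      # single queue shared by all workers

    def run(self):
        while True:
            item = self.inbox.get()  # blocks until the main loop sends data
            # Stand-in for the HTTP POST: pass the item on as the "response"
            self.results.put((self.server_name, item))
            self.inbox.task_done()


def fan_out(server_names, samples):
    """Put every sample on every worker's inbox; return what reached the result queue."""
    results = queue.Queue()
    workers = [Worker(name, results) for name in server_names]
    for w in workers:
        w.start()
    for sample in samples:       # each sample is gathered once...
        for w in workers:
            w.inbox.put(sample)  # ...and handed to every worker thread
    for w in workers:
        w.inbox.join()           # wait until each inbox has been processed
    collected = []
    while not results.empty():
        collected.append(results.get())
    return sorted(collected)
```

For example, `fan_out(["a", "b"], [1, 2])` delivers both samples to both workers, so four items end up on the shared result queue.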

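The following code has been modified/extracted from its original context. I have tested it, and it works as long as you correctly configure the servers in the self.conf dict (in main.py > my_service > __init__) and the servers respond with valid JSON. Honestly, it could use some cleanup. To make sure the code stays public and accessible, I have added a Creative Commons license. Anyone who feels this code resembles their own can contact me for proper attribution.

Apart from main.py, the names of the other 2 files matter. The file names shared_globals.py and workerThread.py are case sensitive, and both files must be in the same folder as main.py.

Main executable: main.py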

#!/usr/bin/python
# encoding=utf8

from time import sleep, time
import subprocess, sys, os # used to get IP, system calls, etc
import json

# For web support
import httplib
import urllib
import zlib
import base64

# workerThread dependency
import shared_globals
from workerThread import NetworkWorker

import Queue
import threading

'''
This work, Python NetworkWorker Queue / Threading, is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Written by John Minton @ http://pythonjohn.com/
To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/.
'''

class my_service:

    # * * * * 
    def __init__(self):

        # Manually list off the servers I want to talk to

        self.conf = {}
        self.conf['servers'] = {}

        self.conf['servers']['ServerName1'] = {}
        self.conf['servers']['ServerName1']['protocol'] = "http"
        self.conf['servers']['ServerName1']['url'] = "server.com"
        self.conf['servers']['ServerName1']['port'] = "80"
        self.conf['servers']['ServerName1']['path'] = "/somefile.php"
        self.conf['servers']['ServerName1']['timeout'] = "10" # Seconds. Make sure this is long enough for your largest OR mission critical HTTP/S transactions to finish + the time it takes to wait for your data to come into your persistent HTTP/S thread. Data comes in every 2 seconds, so 5-10 seconds should be fine. Anything that takes too long will cause the queue to back up too much.

        self.conf['servers']['ServerName2'] = {}
        self.conf['servers']['ServerName2']['protocol'] = "http"
        self.conf['servers']['ServerName2']['url'] = "otherserver.net"
        self.conf['servers']['ServerName2']['port'] = "80"
        self.conf['servers']['ServerName2']['path'] = "/dataio.php"
        self.conf['servers']['ServerName2']['timeout'] = "5"

        # Start the Threading Manager, which will manage the various threads and their components
        # All cross thread communication needs to be managed with Queues
        self.threadManager()


    def threadManager(self):

        # A place to reference all threads
        self.threads = []

        print "Loading Shared Globals"
        # This is the 3rd file in this project. I would not need this if
        # the NetworkWorker Thread was inside of this same file. But since it
        # is in another file, we use this shared_globals file to make the Queue's 
        # list and other shared resources available between the main thread and the NetworkWorker Threads
        shared_globals.init()

        # Keep track of all the threads / classes we are initializing
        self.workers = {} # Keep track of all the worker threads

        print "Initializing Network Worker Threads from Config"
        # For each server we want to talk to, we start a worker thread
        # Read servers from self.conf and init threads / workers
        for t in self.conf['servers']: # Loop through servers in config
            # T = server name
            #print "T: ", self.conf['servers'][t]
            self.workers[t] = NetworkWorker()      # Save worker handlers to workers dict

            # Set the server data for each NetworkWorker Thread
            self.workers[t].set_server(self.conf['servers'][t]['url'], self.conf['servers'][t]['port'], self.conf['servers'][t]['timeout'], self.conf['servers'][t]['path'])

        print "Initializing Command Processing Queue"
        cmd_q = Queue.Queue() # A Queue has no daemon flag; only threads do
        shared_globals.cmd_active_queue = cmd_q

        print "Starting Command Processing thread"
        # Start the command processing thread
        t_cmd = threading.Thread(target=self.command_que_thread_manager)
        t_cmd.daemon = True
        self.threads.append(t_cmd)
        t_cmd.start()

        print "Start Data Gathering thread"
        # Start the data gathering thread
        t = threading.Thread(target=self.data_collector_thread)
        t.daemon = True
        self.threads.append(t)
        t.start()

        print "Starting Worker threads"
        for w in self.workers:      # Loop through all worker handlers
            self.workers[w].start() # Start the jobs

        # We have our NetworkWorker Threads running, and they init their own queues which we 
        # send data to using the def below titled self.send_data_to_networkWorkers

        print "Service Started\n\n\n"

        # This keeps the main thread listening so you can perform actions like killing the application with CTRL+C
        while threading.active_count() > 0:
            try:
                sleep(0.1)
            except (KeyboardInterrupt, SystemExit): # Exits the main thread without complaint!
                print "\n"
                os._exit(0)
        os._exit(0)

    def data_collector_thread(self):
        '''
        Gather all the data we want to send to each server
        Send data to the queues for each NetworkWorker thread we init'd above
        '''
        # Loop indefinitely
        while True:

            # Gather your data and load into data Dict
            data = {"data":"values"}
            print "\n\nData to be sent to all NetworkWorker threads: ", data, "\n\n"

            # Prep the data for HTTP/S
            # If you need to do something else with the data besides sending it to the threads, do it here
            data = self.prep_data_for_HTTP(data) # Do any pre-HTTP/S processing here
            self.send_data_to_networkWorkers(data) # Send the data out to all the Threads Queue's
            sleep(2) # wait for a little bit and then iterate through the loop again. This is your main loop timer.

    def prep_data_for_HTTP(self, data):
        '''
        I am converting my data from a python dict to a JSON string
        I compress the JSON string
        I load the compressed string into another dict, as the HTTP/S object (in the NetworkWorker thread) expects a DICT
        URL encode the data for HTTP/S POST transit
        Return the manipulated data object, now ready for HTTP/S
        '''
        data = json.dumps(data, encoding='utf8') # Now continue preparing for HTTP/S
        data = zlib.compress(data, 8)
        # In PHP, get the data from the $_POST['data'] key
        data = {"data":data}
        data = urllib.urlencode(data)
        return data
    # END DEF

    def command_que_thread_manager(self):
        '''
        Run as a thread
        Send data to this thread via its queue, init'd above in threadManager
        Grabs data, and then does something to process it
        '''
        while True:
            data = shared_globals.cmd_active_queue.get()
            print "Processing Command: ", data
    # END DEF

    def send_data_to_networkWorkers(self,data):
        '''
        Send data to all the NetworkWorker threads
        '''
        for q in shared_globals.network_active_queues:
            q.put(data)

    def clean_exit(self):
        '''
        Run when exiting the program for a clean exit
        I don't think I actually call this in my example, 
        but upon main thread exit it would be a good idea to do so
        '''
        for w in self.workers:      # Loop through all worker handlers
            self.workers[w].stop()  # Stop the jobs

    # END DEF   

# END CLASS

if __name__ == "__main__":
    my_service = my_service()
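
To make the data preparation step easier to follow, here is a small round-trip sketch of the same idea: dict to JSON, compress, URL-encode under a `data` key, and back again. It is a variation rather than the exact wire format used above: it adds a base64 step (an assumption on my part) so the compressed bytes stay plain ASCII, and the `encode_payload` / `decode_payload` names are hypothetical. The imports are written so it runs on both Python 2.7 and Python 3.

```python
import base64
import json
import zlib

try:  # Python 2.7
    from urllib import urlencode
    from urlparse import parse_qs
except ImportError:  # Python 3
    from urllib.parse import urlencode, parse_qs


def encode_payload(data):
    """dict -> JSON -> zlib -> base64 -> urlencoded 'data=...' string."""
    raw = json.dumps(data).encode("utf-8")
    # base64 is an added step (not in main.py) so the POST body is ASCII-only
    packed = base64.b64encode(zlib.compress(raw, 8))
    return urlencode({"data": packed})


def decode_payload(body):
    """Inverse of encode_payload: roughly what the receiving side has to do."""
    packed = parse_qs(body)["data"][0]
    raw = zlib.decompress(base64.b64decode(packed))
    return json.loads(raw.decode("utf-8"))
```

Running `decode_payload(encode_payload(d))` should give back a dict equal to `d`, which is a quick way to confirm that both ends agree on the order of the steps.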

Shared globals file: shared_globals.py

#!/usr/bin/python
# encoding=utf8

'''
This work, Python NetworkWorker Queue / Threading, is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Written by John Minton @ http://pythonjohn.com/
To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/.
'''

def init():

    global network_active_queues
    global cmd_active_queue

    # Keep track of the data going to the Network Worker Threads
    print "Initializing Network Active Queues"
    network_active_queues = []

    # Placeholder for the command queue; main.py replaces it with a Queue.Queue()
    print "Initializing Command Active Queue"
    cmd_active_queue = None

NetworkWorker class: workerThread.py

#!/usr/bin/python
# encoding=utf8
'''
This work, Python NetworkWorker Queue / Threading, is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Written by John Minton @ http://pythonjohn.com/
To view a copy of this license, visit http://creativecommons.org/licenses/by-nc-sa/4.0/.
'''
import Queue
import threading

import httplib
import urllib
import json 

# workerThread dependency
# Add another queue list for HTTP/S Responses
import shared_globals

class NetworkWorker(threading.Thread):

    def __init__(self):
        '''
        Extend the Threading module
        Start a new Queue for this instance of this class
        Run the thread as a daemon
        shared_globals is an external file for my globals between main script and this class.
        Append this Queue to the list of Queue's in shared_globals.network_active_queues
        Loop through shared_globals.network_active_queues to send data to all Queues that were started with this class
        '''
        threading.Thread.__init__(self)
        self.daemon = True # Daemonize the thread itself; a Queue has no daemon flag
        self.q = Queue.Queue()
        shared_globals.network_active_queues.append(self.q)
        # Init the queue for processing commands

    def run(self):
        '''
        Establish a persistent HTTP Connection
        Pull data from the Queue
        When data comes in, send it to the server
        I send the response from the HTTP server to another queue / thread
        You can do what you want to do with responses from the HTTP Server
        '''
        # Set your headers ("Connection": "keep-alive" asks for a persistent connection)
        headers = {"Content-type": "application/x-www-form-urlencoded", "Accept": "text/plain", "Connection": "keep-alive"}
        # Init the persistent HTTP connection
        http_request = httplib.HTTPConnection( self.url, int(self.port), timeout=int(self.timeout) )
        # Start the loop
        while True:
            # The code waits here for the queue to have data. If no data, it just sleeps until you send it data via its Queue.
            data = self.q.get()
            # Reset on every pass so a failed request never reprocesses the previous response
            response_data = str()
            # .... When it gets data, we proceed with the data variable.
            try:
                http_request.request( "POST", self.path, data, headers )
                response = http_request.getresponse()
                response_data = response.read()
                # This is the response from the HTTP/S Server
                print "Response: ", response_data
            except Exception, e:
                # In the event something goes wrong, drop the old connection and re-establish it
                print e, "Re-establishing HTTP/S Connection"
                http_request.close()
                http_request = httplib.HTTPConnection( self.url, int(self.port), timeout=int(self.timeout) )

            # If the HTTP transaction was successful, we will have our HTTP response data in response_data variable
            if response_data:
                # Try Except will fail on bad JSON object           
                try:
                    # Validate JSON & Convert from JSON to native Python Dict   
                    json_data = json.loads(response_data)

                    # Send response from server to the command thread manager
                    shared_globals.cmd_active_queue.put(json_data)

                except ValueError, e:
                    print "Bad Server Response: Discarding Invalid JSON"
                    # Repackage the invalid JSON, or some identifier thereof, and send to command processing thread
                    # Load into THIS NetworkWorker's thread queue a new data object to tell the server that there was malformed JSON and to resend the data.
                    #http_request.request( "POST", self.path, data, headers )
                    #response = http_request.getresponse()
                    #response_data = response.read()


        # Place this here for good measure, if we ever exit the while loop we will close the HTTP/S connection
        http_request.close()

    # END DEF


    def set_server(self, url, port, timeout, path):
        '''
        Use this to set the server for this class / thread instance
        Variables that are passed in are translated to class instance variables (self)
        '''
        self.url = url
        self.port = port
        self.timeout = timeout
        self.path = path
    # END DEF


    def stop(self):
        '''
        Stop this queue
        Stop this thread
        Clean up anything else as needed - tell other threads / queues to shutdown
        '''
        shared_globals.network_active_queues.remove(self.q)
        #self.q.put("shutdown") # Do we need to tell the threads to shutdown? Perhaps if reloading the config
        # Note: run() loops forever, so join() blocks until run() is given a way to exit
        self.join()

    # END DEF

# END CLASS
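
As written, stop() calls join() on a thread whose run() never returns. One common way to let a worker exit is to send a sentinel through its own queue, along the lines of the commented-out "shutdown" message. The sketch below is a minimal illustration under that assumption: `StoppableWorker`, `_SHUTDOWN` and the `handled` list are hypothetical names, and appending to the list stands in for the HTTP POST. The imports are written so it runs on both Python 2.7 and Python 3.

```python
import threading

try:
    import Queue as queue  # Python 2.7
except ImportError:
    import queue  # Python 3

# Unique sentinel object; it can never collide with real payload data
_SHUTDOWN = object()


class StoppableWorker(threading.Thread):
    def __init__(self):
        threading.Thread.__init__(self)
        self.daemon = True
        self.q = queue.Queue()
        self.handled = []  # stand-in for "data sent to the server"

    def run(self):
        while True:
            data = self.q.get()
            if data is _SHUTDOWN:
                break  # leave the loop so join() can return
            self.handled.append(data)

    def stop(self, timeout=None):
        """Ask the thread to exit after it has drained the items queued before this call."""
        self.q.put(_SHUTDOWN)
        self.join(timeout)
```

Because the sentinel travels through the same queue as the data, everything queued before stop() is still processed before the thread exits.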
Answered 2016-06-13T21:00:31.470