python - Multiprocessing when fetching URLs (Python 3.2)

Question

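I wrote a script to fetch inventory data from the Steam API, but I'm a bit unhappy with its speed. So I read a little about multiprocessing in Python, but I simply can't wrap my head around it. The program works like this: it takes a SteamID from a list, fetches the inventory, and then adds the SteamID and the inventory to a dictionary, with the ID as the key and the inventory contents as the value.

I've also read that there are some issues with using counters when multiprocessing. That is a small problem, because I'd like to be able to resume the program from the last fetched inventory rather than starting over from the beginning.

Anyway, what I'm really asking for is a concrete example of how to do multiprocessing when opening the URL that contains the inventory data, so that the program can fetch several inventories at a time rather than just one.

On to the code: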

import json
import socket
import urllib.error
import urllib.request

with open("index_to_name.json", "r", encoding=("utf-8")) as fp:
    index_to_name=json.load(fp)

with open("index_to_quality.json", "r", encoding=("utf-8")) as fp:
    index_to_quality=json.load(fp)

with open("index_to_name_no_the.json", "r", encoding=("utf-8")) as fp:
    index_to_name_no_the=json.load(fp)

with open("steamprofiler.json", "r", encoding=("utf-8")) as fp:
    steamprofiler=json.load(fp)

with open("itemdb.json", "r", encoding=("utf-8")) as fp:
    players=json.load(fp)

error=list()
playerinventories=dict()
c=127480

while c<len(steamprofiler):
    inventory=dict()
    items=list()
    try:
        url=urllib.request.urlopen("http://api.steampowered.com/IEconItems_440/GetPlayerItems/v0001/?key=DD5180808208B830FCA60D0BDFD27E27&steamid="+steamprofiler[c]+"&format=json")
        inv=json.loads(url.read().decode("utf-8"))
        url.close()
    except (urllib.error.HTTPError, urllib.error.URLError, socket.error, UnicodeDecodeError) as e:
        c+=1
        print("HTTP-error, continuing")
        error.append(c)
        continue
    try:
        for r in inv["result"]["items"]:
            inventory[r["id"]]=r["quality"], r["defindex"]
    except KeyError:
        c+=1
        error.append(c)
        continue
    for key in inventory:
        try:
            if index_to_quality[str(inventory[key][0])]=="":
                items.append(
                    index_to_quality[str(inventory[key][0])]
                    +""+
                    index_to_name[str(inventory[key][1])]
                    )
            else:
                items.append(
                    index_to_quality[str(inventory[key][0])]
                    +" "+
                    index_to_name_no_the[str(inventory[key][1])]
                    )
        except KeyError:
            print("keyerror, uppdate def_to_index")
            c+=1
            error.append(c)
            continue
    playerinventories[int(steamprofiler[c])]=items
    c+=1
    if c % 10==0:
        print(c, "inventories downloaded")

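I hope my question is clear; if not, please say so. I'd prefer to avoid third-party libraries, but if that's not possible, then so be it. Thanks in advance.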

score 1 · Accepted Answer

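So you're assuming that fetching the URLs is what slows your program down? You'd do well to check that assumption first, but if it is indeed the case, using the multiprocessing module is huge overkill: for I/O-bound bottlenecks, threading is quite a bit simpler and might even be a bit faster (it takes a lot more time to spawn another Python interpreter than to spawn a thread).

Looking at your code, you might get away with sticking most of the body of your while loop in a function that takes c as a parameter, and starting a thread for each call from another function, something like: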
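One way to check that assumption is to time the fetch and the processing separately. The following is only a minimal sketch: `timed`, `fake_fetch`, and `fake_process` are hypothetical names, and the sleep stands in for the real `urllib.request.urlopen(...)` call so the example runs without network access.

```python
import time

def timed(func, *args):
    """Call func(*args) and return (result, elapsed_seconds)."""
    start = time.time()
    result = func(*args)
    return result, time.time() - start

def fake_fetch(steam_id):
    # Hypothetical stand-in for the urlopen/read/json.loads step.
    time.sleep(0.05)
    return {"result": {"items": []}}

def fake_process(inv):
    # Hypothetical stand-in for the dictionary lookups on the items.
    return [item for item in inv["result"]["items"]]

inv, fetch_seconds = timed(fake_fetch, "123")
items, process_seconds = timed(fake_process, inv)
print("fetch: {0:.3f}s, process: {1:.3f}s".format(fetch_seconds, process_seconds))
```

If the fetch time dominates in the real script, the bottleneck is I/O and threads are a reasonable fit.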

import threading

def process_item(c):
    # The work goes here
    # Replace all those 'continue' statements with 'return'
    pass

for c in range(127480, len(steamprofiler)):
    thread = threading.Thread(name="inventory {0}".format(c), target=process_item, args=[c])
    thread.start()

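A real problem might be that there is no limit on the number of threads spawned, which may break the program. Also, the folks at Steam might not be amused at being hammered by your script, and they might decide to un-friend you.

A better approach would be to fill a collections.deque object with your list of c values and then start a limited set of threads to do the work: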

import collections
import threading

def process_item(c):
    # The work goes here
    # Replace all those 'continue' statements with 'return'
    pass

def process():
    # Runs until work.popleft() raises IndexError on an empty deque
    while True:
        process_item(work.popleft())

work = collections.deque(range(127480, len(steamprofiler)))

threads = [threading.Thread(name="worker {0}".format(n), target=process)
           for n in range(6)]
for worker in threads:
    worker.start()

Note that I'm counting on work.popleft() to throw an IndexError when we run out of work, which will kill the thread. That's a bit sneaky, so consider using a try...except instead.
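A rough sketch of the explicit try...except variant might look like this. It is untested against the Steam API: `process_item` here is a hypothetical stand-in that fabricates a placeholder value instead of fetching anything, and the `results` dict, the `lock`, and the `run` helper are assumptions added for illustration.

```python
import collections
import threading

def process_item(c, results, lock):
    # Hypothetical stand-in for fetching and decoding one inventory.
    items = ["item for {0}".format(c)]
    with lock:  # guard the shared dict explicitly
        results[c] = items

def worker(work, results, lock):
    while True:
        try:
            c = work.popleft()  # deque.popleft() is thread-safe
        except IndexError:
            return  # the deque is empty, so this worker is done
        process_item(c, results, lock)

def run(indices, num_threads=6):
    work = collections.deque(indices)
    results = {}
    lock = threading.Lock()
    threads = [threading.Thread(name="worker {0}".format(n),
                                target=worker, args=(work, results, lock))
               for n in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait until every worker has drained the deque
    return results

results = run(range(20))
print(len(results), "inventories processed")
```

Joining the threads before reading `results` ensures the dictionary is complete, which also makes it a convenient point to save progress for resuming later.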

Two more things:

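Consider using the excellent Requests library instead of urllib (which, API-wise, is by far the worst module in the entire Python standard library that I've worked with).
For Requests, there is an add-on called grequests that lets you do fully asynchronous HTTP requests. That would make for even simpler code.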
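If third-party libraries are off the table, as the question prefers, the standard library's `concurrent.futures` module (added in Python 3.2) provides a bounded thread pool. This is a different technique from the deque approach above, shown only as a hedged sketch: `fetch_inventory` and `fetch_all` are hypothetical names, and the fetch is faked so the example runs offline.

```python
import concurrent.futures

def fetch_inventory(steam_id):
    # Hypothetical stand-in for the urlopen + json.loads + lookup steps.
    return steam_id, ["item for {0}".format(steam_id)]

def fetch_all(steam_ids, max_workers=6):
    inventories = {}
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order; an exception raised in a
        # worker is re-raised here when its result is reached.
        for steam_id, items in pool.map(fetch_inventory, steam_ids):
            inventories[int(steam_id)] = items
    return inventories

inventories = fetch_all(["101", "102", "103"])
print(inventories)
```

Because the dictionary is filled only in the main thread, no lock is needed in this variant.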

I hope this helps, but please keep in mind that this is all untested code.

score 0 · Accepted Answer

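The outermost while loop looks like it can be distributed over several processes (or tasks).

When you break the loop into tasks, note that you are sharing the playerinventories and error objects between processes. You will need to use multiprocessing.Manager to handle the sharing.

I suggest you start modifying your code from this code snippet.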
