
I'm writing a client that loads and parses a lot of pages at once and sends the data from them to a server. If I run just one page-processor at a time, things go reasonably well:

********** Round-trip (with 0 sends/0 loads) for (+0/.0/-0) was total 1.98s (1.60s load html, 0.24s parse, 0.00s on queue, 0.14s to process) **********
********** Round-trip (with 0 sends/0 loads) for (+0/.0/-0) was total 1.87s (1.59s load html, 0.25s parse, 0.00s on queue, 0.03s to process) **********
********** Round-trip (with 0 sends/0 loads) for (+0/.0/-0) was total 2.79s (1.78s load html, 0.28s parse, 0.00s on queue, 0.72s to process) **********
********** Round-trip (with 0 sends/1 loads) for (+0/.0/-0) was total 2.18s (1.70s load html, 0.34s parse, 0.00s on queue, 0.15s to process) **********
********** Round-trip (with 0 sends/1 loads) for (+0/.0/-0) was total 1.91s (1.47s load html, 0.21s parse, 0.00s on queue, 0.23s to process) **********
********** Round-trip (with 0 sends/1 loads) for (+0/.0/-0) was total 1.84s (1.59s load html, 0.22s parse, 0.00s on queue, 0.03s to process) **********
********** Round-trip (with 0 sends/0 loads) for (+0/.0/-0) was total 1.90s (1.67s load html, 0.21s parse, 0.00s on queue, 0.02s to process) **********

However, with about 20 running simultaneously (each in its own thread), the HTTP traffic becomes incredibly slow:

********** Round-trip (with 2 sends/7 loads) for (+0/.0/-0) was total 23.37s (16.39s load html, 0.30s parse, 0.00s on queue, 6.67s to process) **********
********** Round-trip (with 2 sends/5 loads) for (+0/.0/-0) was total 20.99s (14.00s load html, 1.99s parse, 0.00s on queue, 5.00s to process) **********
********** Round-trip (with 4 sends/4 loads) for (+0/.0/-0) was total 17.89s (9.17s load html, 0.30s parse, 0.12s on queue, 8.31s to process) **********
********** Round-trip (with 3 sends/5 loads) for (+0/.0/-0) was total 26.22s (15.34s load html, 1.63s parse, 0.01s on queue, 9.24s to process) **********

The load html bit is the time it takes to read the HTML of the web page I'm processing (resp = self.mech.open(url); resp.read(); resp.close()). The to process bit is the time it takes for the round trip from this client to the server that processes it (fp = urllib2.urlopen(...); fp.read(); fp.close()). The X sends/Y loads bit is how many simultaneous sends to the server and loads from the web pages I'm processing were running at the time the request to the server was made.
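
For context, here is a minimal sketch of what those two timed operations might look like, based on the snippets above (the timing wrappers and the url, server_url, and payload names are mine, not the original code):

    import time
    import urllib2
    import mechanize

    def timed_load(mech, url):
        # Roughly the "load html" portion: fetch the page's HTML via mechanize
        start = time.time()
        resp = mech.open(url)
        html = resp.read()
        resp.close()
        return html, time.time() - start

    def timed_send(server_url, payload):
        # Roughly the "to process" portion: round-trip the ~400-byte payload to the server
        start = time.time()
        fp = urllib2.urlopen(server_url, payload)
        body = fp.read()
        fp.close()
        return body, time.time() - start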

I'm most concerned with the to process bit. The actual processing on the server only takes about 0.2s. Only about 400 bytes are sent, so it's not a matter of hogging bandwidth. Interestingly, if I run a program that opens 5 threads and repeatedly does just this to process bit (while all of that simultaneous sending/loading/parsing is going on), it runs extremely fast:

1 took 0.04s
1 took 1.41s in total
0 took 0.03s
0 took 1.43s in total
4 took 0.33s
2 took 0.49s
2 took 0.08s
2 took 0.01s
2 took 1.74s in total
3 took 0.62s
4 took 0.40s
3 took 0.31s
4 took 0.33s
3 took 0.05s
3 took 2.18s in total
4 took 0.07s
4 took 2.22s in total

Each to process in this standalone program takes only 0.01s to 0.50s, far less than the 6-10 seconds in the full-fledged version, and it isn't using fewer sending threads (it uses 5, and the full version is capped at 5).
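
For reference, the standalone sender described above could look roughly like this (a sketch; the server URL and payload are placeholders, not the real program):

    import threading
    import time
    import urllib2

    def send_loop(worker_id, server_url, payload, n_requests=4):
        # Repeatedly do just the "to process" round trip and print per-request times
        total_start = time.time()
        for _ in range(n_requests):
            start = time.time()
            fp = urllib2.urlopen(server_url, payload)
            fp.read()
            fp.close()
            print "%d took %.2fs" % (worker_id, time.time() - start)
        print "%d took %.2fs in total" % (worker_id, time.time() - total_start)

    threads = [threading.Thread(target=send_loop,
                                args=(i, "http://example.com/process", "x" * 400))
               for i in range(5)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()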

That is, while the full version is running, a separate program sending those same (+0/.0/-0) requests of 400 bytes each takes only 0.31s per request. So it's not that the machine I'm running on is bogged down... it seems that the multiple simultaneous loads in other threads are slowing down the sends in other threads, which should be fast (and in fact are, in another program running on the same machine).

The sends are done with urllib2.urlopen, while the reads are done with mechanize (which ultimately uses a fork of urllib2.urlopen).

Is there a way to get the full program to run as fast as this mini standalone version, at least when they're sending the same things? I'm considering writing another program that just takes whatever needs to be sent over a named pipe or something, so that the sends happen in another process, but that seems silly somehow. Any thoughts are welcome.

Any suggestions on how to load many pages simultaneously more quickly (so the times look more like 1-3s rather than 10-20s) would also be welcome.


Edit: An additional note: I rely on mechanize's cookie-handling features, so ideally any answer would also provide a way to deal with that...


Edit: I ran the same setup with a different configuration, where only one page is open and about 10-20 things are added to the queue at a time. Those get processed like a knife through butter; for example, here's the tail end of adding a big batch:

********** Round-trip (with 4 sends/0 loads) for (+0/.0/-0) was total 1.17s (1.14s wait, 0.04s to process) **********
********** Round-trip (with 4 sends/0 loads) for (+0/.0/-0) was total 1.19s (1.16s wait, 0.03s to process) **********
********** Round-trip (with 4 sends/0 loads) for (+0/.0/-0) was total 1.26s (0.80s wait, 0.46s to process) **********
********** Round-trip (with 4 sends/0 loads) for (+0/.0/-0) was total 1.35s (0.77s wait, 0.58s to process) **********
********** Round-trip (with 4 sends/0 loads) for (+2/.4/-0) was total 1.44s (0.24s wait, 1.20s to process) **********

(I added the wait time, which is how long the information sat on the queue before being sent.) Notice that to process is just as fast as in the standalone program. The problem only shows up in the version that is constantly reading and parsing web pages. (Note that the parsing itself takes a lot of CPU.)


Edit: Some preliminary testing indicates that I should use a separate process for each web-page load... will post an update once that's up and running.
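
A rough sketch of what "a separate process per page load" might look like using the multiprocessing module (the parse() function, worker count, and URLs are assumptions; note that each worker gets its own mechanize Browser and therefore its own cookie jar, which may or may not fit the cookie-handling requirement above):

    import multiprocessing
    import urllib2
    import mechanize

    def load_and_parse(url):
        # Runs in a worker process: fetch and parse one page, return the extracted data
        mech = mechanize.Browser()   # each process has its own Browser (and cookie jar)
        resp = mech.open(url)
        html = resp.read()
        resp.close()
        return parse(html)           # parse() stands in for the existing CPU-heavy parser

    def run(urls, server_url):
        pool = multiprocessing.Pool(processes=8)   # worker count is a guess
        for data in pool.imap_unordered(load_and_parse, urls):
            # The sends stay in the main process, like the fast standalone sender
            fp = urllib2.urlopen(server_url, data)
            fp.read()
            fp.close()
        pool.close()
        pool.join()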


1 Answer


It could be the Global Interpreter Lock (GIL). Have you tried the multiprocessing module (mostly a drop-in replacement for threading, IIRC)?
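
To illustrate the GIL point: CPU-bound work (like the HTML parsing mentioned in the question) makes threads contend for the single interpreter lock, while processes run in parallel. A minimal, self-contained comparison, with an artificial workload standing in for parsing:

    import time
    from multiprocessing import Pool
    from multiprocessing.dummy import Pool as ThreadPool   # same API, backed by threads

    def cpu_bound(n):
        # Artificial CPU-heavy work standing in for HTML parsing
        total = 0
        for i in xrange(n):
            total += i * i
        return total

    def bench(pool_cls, label):
        pool = pool_cls(4)
        start = time.time()
        pool.map(cpu_bound, [5000000] * 8)
        pool.close()
        pool.join()
        print "%s: %.2fs" % (label, time.time() - start)

    if __name__ == "__main__":
        bench(ThreadPool, "threads (GIL-bound)")
        bench(Pool, "processes")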

See also Python code performance decreases with threading

answered 2012-06-15T02:25:02.393