concurrency - How many threads does Clojure's pmap function spawn for URL-fetching operations?

Question

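The documentation for the pmap function leaves me wondering how efficient it would be for something like fetching a collection of XML feeds over the web. I have no idea how many concurrent fetch operations pmap would spawn or what the maximum would be.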

score 25 · Accepted Answer

If you check the source, you see:

> (use 'clojure.repl)
> (source pmap)
(defn pmap
  "Like map, except f is applied in parallel. Semi-lazy in that the
  parallel computation stays ahead of the consumption, but doesn't
  realize the entire result unless required. Only useful for
  computationally intensive functions where the time of f dominates
  the coordination overhead."
  {:added "1.0"}
  ([f coll]
   (let [n (+ 2 (.. Runtime getRuntime availableProcessors))
         rets (map #(future (f %)) coll)
         step (fn step [[x & xs :as vs] fs]
                (lazy-seq
                 (if-let [s (seq fs)]
                   (cons (deref x) (step xs (rest s)))
                   (map deref vs))))]
     (step rets (drop n rets))))
  ([f coll & colls]
   (let [step (fn step [cs]
                (lazy-seq
                 (let [ss (map seq cs)]
                   (when (every? identity ss)
                     (cons (map first ss) (step (map rest ss)))))))]
     (pmap #(apply f %) (step (cons coll colls))))))

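The (+ 2 (.. Runtime getRuntime availableProcessors)) is a big clue. pmap will grab the first (+ 2 processors) pieces of work and run them asynchronously via future. So if you have 2 cores, it will launch 4 pieces of work at a time, trying to stay a bit ahead of you, but the maximum should be 2 + n, where n is the number of processors.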

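future ultimately uses the agent I/O thread pool, which supports an unbounded number of threads. It grows as work is thrown at it and shrinks when threads go unused.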
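To see what that look-ahead comes to on a particular machine, the same expression from the pmap source can be evaluated directly. This is only a sketch; the names `processors` and `pmap-window` are made up for illustration and are not part of Clojure.

```clojure
;; Number of processors the JVM reports (same call pmap makes internally).
(def processors (.. Runtime getRuntime availableProcessors))

;; The value pmap binds to n: how far it tries to stay ahead of the consumer.
(def pmap-window (+ 2 processors))

(println "processors:" processors "pmap look-ahead:" pmap-window)
```

The printed numbers depend on the machine, so no fixed output is shown here.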

score 12 · Accepted Answer

Building on Alex's excellent answer explaining how pmap works, here is my suggestion for your situation:

(doall
  (map
    #(future (my-web-fetch-function %))
    list-of-xml-feeds-to-fetch))

Rationale:

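You want as many in-flight pieces of work as possible, since most of them will block on network IO.
future fires off an asynchronous piece of work for each request, to be handled in a thread pool. You can let Clojure take care of that intelligently.
The doall on the map forces evaluation of the full sequence (i.e. the launching of all the requests).
Your main thread can start dereferencing the futures right away, and can therefore keep making progress as individual results come back.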
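The pattern above can be tried end to end with a stand-in fetcher. This is a minimal sketch: `fake-fetch`, the `feeds` URLs, and the 200 ms sleep are assumptions made up for illustration, standing in for `my-web-fetch-function` and `list-of-xml-feeds-to-fetch`.

```clojure
;; Hypothetical stand-in for my-web-fetch-function:
;; sleeps to mimic network latency, then returns a dummy document.
(defn fake-fetch [url]
  (Thread/sleep 200)
  (str "<feed url='" url "'/>"))

;; Hypothetical stand-in for list-of-xml-feeds-to-fetch.
(def feeds ["http://example.com/a.xml"
            "http://example.com/b.xml"
            "http://example.com/c.xml"])

;; doall forces the lazy map, so every future is started up front.
(def pending (doall (map #(future (fake-fetch %)) feeds)))

;; deref blocks only until that particular future has finished;
;; results are collected in the same order as the input.
(def results (mapv deref pending))

(println results)
```

If this is run as a script rather than at a REPL, calling `(shutdown-agents)` at the very end lets the JVM exit promptly instead of waiting for the idle pool threads to time out.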

score 3 · Accepted Answer

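No time to write a long response, but there is a clojure.contrib http-agent that creates each get/post request as its own agent. So you can fire off a thousand requests and they will each run in parallel and complete as the results come in.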

score 2 · Accepted Answer

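Looking at how pmap behaves, it seems to run 32 threads at a time no matter how many processors you have. The issue is that map goes 32 items ahead of the computation, and the futures start on their own. Sample:

(defn samplef [n]
  (println "starting " n)
  (Thread/sleep 10000)
  n)

(def result (pmap samplef (range 0 100)))

; you will wait for 10 seconds and see 32 prints, then when you take the 33rd, another 32
; prints. This means that you are running 32 concurrent threads at a time.
; To me this is not perfect.
; Saludos, Felipe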
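The 32 observed here is most likely a consequence of chunked sequences rather than of pmap itself: `range` produces a chunked seq, and `map` realizes such a seq one chunk (32 elements) at a time, so 32 futures get created together. The sketch below shows the effect with plain `map` and no threads. The helpers `counting-identity` and `unchunk` are hypothetical names written for this example, not core functions.

```clojure
;; Counts how many times the mapped function has actually been called.
(def calls (atom 0))

(defn counting-identity [x]
  (swap! calls inc)
  x)

;; Rebuilds a seq one element at a time so that map cannot process it in chunks.
(defn unchunk [s]
  (lazy-seq
    (when-let [s (seq s)]
      (cons (first s) (unchunk (rest s))))))

;; Chunked input: asking for one element realizes a whole chunk.
(first (map counting-identity (range 0 100)))
(def chunked-calls @calls)

(reset! calls 0)

;; Unchunked input: asking for one element realizes just that element.
(first (map counting-identity (unchunk (range 0 100))))
(def unchunked-calls @calls)

(println "chunked:" chunked-calls "unchunked:" unchunked-calls)
;; prints: chunked: 32 unchunked: 1
```

Under this assumption, passing an unchunked collection to pmap should bring the number of futures started at once back toward the processor-based look-ahead in its source, instead of a full chunk of 32.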
