clojure - Why does this Clojure code run out of memory?

Question

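I have a sorted text file with 20 million lines. It contains many duplicate lines. I have some Clojure code that counts how many instances there are of each unique line, i.e. the output looks something like: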

alpha 20
beta 17
gamma 3
delta 4
...

The code works on smaller files, but on this larger file it runs out of memory. What am I doing wrong? I assume I am holding onto the head somewhere.

(require '[clojure.java.io :as io])

(def bi-grams (line-seq (io/reader "the-big-input-file.txt")))

(defn quick-process [input-list filename]
    (with-open [out (io/writer filename)] ;; e.g. "train/2gram-freq.txt"
        (binding [*out* out]
           (dorun (map (fn [[w v]] (println w "\t" (count v)))
                       (partition-by identity input-list))))))

(quick-process bi-grams "output.txt")

score 7 · Accepted Answer

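Your bi-grams var is holding on to the head of the line-seq. Because the top-level def keeps a reference to the first element of the lazy sequence, every line realized during processing stays reachable and cannot be garbage collected.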

Try (quick-process (line-seq (io/reader "the-big-input-file.txt")) "output.txt").
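A minimal sketch of the same idea, assuming it is acceptable to open the input inside the function so that nothing outside ever refers to the head of the sequence. The name count-runs is hypothetical and not part of the original code. It also differs from the question's code in two ways: it counts the whole group instead of destructuring [[w v]] (there v is the second line of the group, so (count v) is that line's character length, not the number of lines), and it writes a bare tab between the line and its count.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical helper: writes one "line<TAB>count" row for each run of
;; identical consecutive lines in in-file (the input is assumed to be sorted).
(defn count-runs [in-file out-file]
  (with-open [rdr (io/reader in-file)
              out (io/writer out-file)]
    ;; The lazy line-seq is never bound to a var or a long-lived local,
    ;; and doseq does not retain the head, so consumed lines can be collected.
    (doseq [group (partition-by identity (line-seq rdr))]
      (.write out (str (first group) "\t" (count group) "\n")))))
```

It would be called as (count-runs "the-big-input-file.txt" "output.txt"). Each group is still realized in full when it is counted, so memory use grows with the longest run of identical lines rather than with the size of the file. Using with-open for the reader also closes it when processing finishes.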
