
Here is some input data. Each element is a sequence of values in ascending (or otherwise grouped) order.

(def input-vals [[[1 :a] [1 :b] [2 :c] [3 :d] [3 :e]]
                 [[1 :f] [2 :g] [2 :h] [2 :i] [3 :j] [3 :k]]
                 [[1 :l] [3 :m]]])

I can partition them each by value.

=> (map (partial partition-by first) input-vals)
   ((([1 :a] [1 :b]) ([2 :c]) ([3 :d] [3 :e])) (([1 :f]) ([2 :g] [2 :h] [2 :i]) ([3 :j] [3 :k])) (([1 :l]) ([3 :m])))

But that gets me 3 sequences of partitions. I want one single sequence of partitioned groups.

What I want to do is return a single lazy sequence of (potentially) lazy sequences that are the respective partitions joined. e.g. I want to produce this:

((([1 :a] [1 :b] [1 :f] [1 :l]) ([2 :c] [2 :g] [2 :h] [2 :i]) ([3 :d] [3 :e] [3 :j] [3 :k] [3 :m])))

Note that not all values appear in all sequences (there is no 2 in the third vector).

This is of course a simplification of my problem. The real data is a set of lazy streams coming from very large files, so nothing can be realised. But I think the solution for the above question is the solution for my problem.

Feel free to edit the title; I wasn't quite sure how to express it.


5 Answers

2

Let's make this interesting and use infinite sequences as our input.

(def twos (iterate #(+ 2 %) 0))
(def threes (iterate #(+ 3 %) 0))
(def fives (iterate #(+ 5 %) 0))

We need to merge them lazily. Let's take a comparator as an argument so that the same function also works on other data types.

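(defn lazy-merge-by
  "Lazily merges sorted sequences into one sorted sequence.
  compfn returns true when its first argument should come before its second."
  ([compfn xs ys]
   (lazy-seq
     (cond
       (empty? xs) ys   ; one input is exhausted: the rest is the other input
       (empty? ys) xs
       ;; otherwise emit whichever head comes first and recurse lazily
       :else (if (compfn (first xs) (first ys))
               (cons (first xs) (lazy-merge-by compfn (rest xs) ys))
               (cons (first ys) (lazy-merge-by compfn xs (rest ys)))))))
  ;; more than two inputs: merge pairwise, left to right
  ([compfn xs ys & more]
   (apply lazy-merge-by compfn (lazy-merge-by compfn xs ys) more)))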

Test:

(take 15 (lazy-merge-by < twos threes fives))
;=> (0 0 0 2 3 4 5 6 6 8 9 10 10 12 12)

If desired, we can (lazily) partition by value:

(take 10 (partition-by identity (lazy-merge-by < twos threes fives)))
;=> ((0 0 0) (2) (3) (4) (5) (6 6) (8) (9) (10 10) (12 12))

Now, back to the example input:

(partition-by first (apply lazy-merge-by #(<= (first %) (first %2)) input-vals))
;=> (([1 :a] [1 :b] [1 :f] [1 :l]) ([2 :c] [2 :g] [2 :h] [2 :i]) ([3 :d] [3 :e] [3 :j] [3 :k] [3 :m]))

This is the desired result, apart from one fewer set of extraneous outer parentheses.
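To check that the whole pipeline stays lazy on keyed data, here is a small sketch that runs it over infinite streams of [key tag] pairs. The keyed-stream helper and the merged-groups name are made up for this illustration, and lazy-merge-by is copied from above only so that the snippet runs on its own.

```clojure
;; Copy of lazy-merge-by from this answer, so the snippet is self-contained.
(defn lazy-merge-by
  ([compfn xs ys]
   (lazy-seq
     (cond
       (empty? xs) ys
       (empty? ys) xs
       :else (if (compfn (first xs) (first ys))
               (cons (first xs) (lazy-merge-by compfn (rest xs) ys))
               (cons (first ys) (lazy-merge-by compfn xs (rest ys)))))))
  ([compfn xs ys & more]
   (apply lazy-merge-by compfn (lazy-merge-by compfn xs ys) more)))

;; Hypothetical helper: an infinite stream of [key tag] pairs whose keys
;; ascend from 0 in increments of `step`.
(defn keyed-stream [tag step]
  (map (fn [k] [k tag]) (iterate #(+ step %) 0)))

;; Defining this realizes nothing; work happens only when groups are consumed.
(def merged-groups
  (partition-by first
                (lazy-merge-by #(<= (first %1) (first %2))
                               (keyed-stream :a 2)
                               (keyed-stream :b 3)
                               (keyed-stream :c 5))))

(take 4 merged-groups)
;=> (([0 :a] [0 :b] [0 :c]) ([2 :a]) ([3 :b]) ([4 :a]))
```

Using <= rather than < makes ties favour the earlier stream, which is why pairs with equal keys come out in input order.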

answered 2014-01-21T18:38:10.120
2

Try this horror:

(defn partition-many-by [f comp-f s]
  (let [sorted-s (sort-by first comp-f s)
        first-list (first (drop-while (complement seq) sorted-s))
        match-val (f (first first-list))
        remains (filter #(not (empty? %)) 
                        (map #(drop-while (fn [ss] (= match-val (f ss))) %) 
                             sorted-s))]
    (when match-val
      (cons
        (apply concat
          (map #(take-while (fn [ss] (= match-val (f ss))) %)
               sorted-s))
        (lazy-seq (partition-many-by f comp-f remains))))))

It could probably be improved to remove the double value check (take-while and drop-while).

Example usage:

(partition-many-by identity compare [[1 1 1 1 2 2 3 3 3 3] [1 1 2 2 2 2 3] [3]])

=> ((1 1 1 1 1 1) (2 2 2 2 2 2) (3 3 3 3 3 3))
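
One possible way to tidy up the duplicated take-while / drop-while predicate is split-with, sketched below under a hypothetical name, partition-many-by*. Note that split-with is itself defined in terms of take-while and drop-while, so this only removes the duplication from the code and does not save any work. Unlike the version above, this sketch stops when no non-empty input remains rather than when the match value is nil.

```clojure
;; A sketch only: partition-many-by* is a made-up name for this variant.
(defn partition-many-by* [f comp-f s]
  ;; Drop exhausted inputs, then order the rest by their first element.
  (let [sorted-s (sort-by first comp-f (filter seq s))]
    (when (seq sorted-s)
      (let [match-val (f (ffirst sorted-s))
            matches?  #(= match-val (f %))
            ;; Each split is [matching-prefix remainder].
            splits    (map #(split-with matches? %) sorted-s)]
        (cons (mapcat first splits)
              (lazy-seq
                (partition-many-by* f comp-f (map second splits))))))))

(partition-many-by* identity compare
                    [[1 1 1 1 2 2 3 3 3 3] [1 1 2 2 2 2 3] [3]])
;=> ((1 1 1 1 1 1) (2 2 2 2 2 2) (3 3 3 3 3 3))
```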
answered 2014-01-21T17:41:44.307
1

I'm not sure whether I'm following, but you can flatten the result sequence, something like:

(flatten (partition-by identity (first input-vals)))

clojure.core/flatten
([x])
Takes any nested combination of sequential things (lists, vectors,
etc.) and returns their contents as a single, flat sequence.
(flatten nil) returns an empty sequence.

You can use the realized? function to test whether a lazy sequence has already been realized.
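
As a small illustration of what realized? reports, here is a REPL-style sketch; the name xs is only for this example. As far as I know, realized? accepts only pending values such as lazy sequences, delays, futures and promises, so calling it on a plain vector throws instead of returning false.

```clojure
(def xs (map inc (range 5)))   ; map returns a lazy sequence

(realized? xs) ;=> false, nothing has been computed yet
(first xs)     ;=> 1, which forces the start of the sequence
(realized? xs) ;=> true
```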

answered 2014-01-21T16:30:36.880
1
user> (def desired-result '((([1 :a] [1 :b] [1 :f] [1 :l])
                             ([2 :c] [2 :g] [2 :h] [2 :i])
                             ([3 :d] [3 :e] [3 :j] [3 :k] [3 :m]))))
#'user/desired-result

user> (def input-vals [[[1 :a] [1 :b] [2 :c] [3 :d] [3 :e]]
                       [[1 :f] [2 :g] [2 :h] [2 :i] [3 :j] [3 :k]]
                       [[1 :l] [3 :m]]])
#'user/input-vals

user> (= desired-result (vector (vals (group-by first (apply concat input-vals)))))
true

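I changed the input values slightly to correct what I believe was a typographical error; if it was not an error, I can update my code to accommodate the less regular structure.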

Using the ->> (thread-last) macro, we can get more readable, equivalent code:

user> (= desired-result
         (->> input-vals
           (apply concat)
           (group-by first)
           vals
           vector))
true
answered 2014-01-21T17:11:33.030
0
(partition-by first (sort-by first (mapcat identity input-vals)))
answered 2014-01-22T00:11:21.530