clojure - How to split a collection into two parts given by a percentage

Question

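I have a collection that I want to split by an arbitrary percentage. The actual problem I am solving is splitting a dataset into a training set and a cross-validation set.

The destination of each element should be chosen at random, but each source element should appear exactly once in the result, and the partition sizes are fixed. If the source collection contains duplicates, the duplicates may end up in different output partitions or in the same one.

I have this implementation: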

(defn split-shuffled
  "Returns a 2 element vector partitioned by the percentage 
   specified by p. Elements are selected at random. Each 
   element of the source collection will appear only once in 
   the result."
  [c p]
  (let [m (count c)
        idxs (into #{} (take (* m p) (shuffle (range m))))
        afn (fn [i x] (if (idxs i) x))
        bfn (fn [i x] (if-not (idxs i) x))]
    [(keep-indexed afn c) (keep-indexed bfn c)]))

repl> (split-shuffled (range 10) 0.2)
[(4 6) (0 1 2 3 5 7 8 9)]

repl> (split-shuffled (range 10) 0.4)
[(1 4 6 7) (0 2 3 5 8 9)]

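But I'm not happy that keep-indexed is called twice.

How can this be improved?

Edit: I originally wanted to preserve the order within the partitions, but on reconsideration I have dropped that requirement, so @mikera's solution is correct!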

score 5 · Accepted Answer

Why do you need the indices?

Just shuffle directly:

(defn split-shuffled
  [c p]
  (let [c (shuffle c)
        m (count c)
        t (* m p)]
    [(take t c) (drop t c)]))
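
As a possible variation on the same shuffle-then-cut idea, the sketch below computes an explicit integer split point and uses split-at, which returns the take/drop pair in one call. Rounding with Math/round is an assumption made here for illustration; the answer above passes the raw product (* m p) to take and drop, so the two versions can differ by one element when the product is not a whole number.

```clojure
;; Sketch only: shuffle once, then cut at an integer index.
;; The rounding rule (Math/round) is an assumption, not part of the answer above.
(defn split-shuffled
  [c p]
  (let [shuffled (shuffle c)                                   ; random order, each element kept once
        n        (Math/round (double (* (count shuffled) p)))] ; size of the first partition
    ;; split-at returns [(take n coll) (drop n coll)]
    (split-at n shuffled)))
```

Called as (split-shuffled (range 10) 0.2), this should return a two-element vector whose first part has 2 elements and whose second part has the remaining 8, in random order.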
