clojure - I have two versions of a function to count leading hash (#) characters, which is better?

Question

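I wrote a piece of code to count the leading hash (#) characters of a line, much like a heading line in Markdown:

### Line one -> returns 3
######## Line two -> returns 6 (only the first 6 characters matter)

Version 1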

(defn
  count-leading-hash
  [line]
  (let [cnt (count (take-while #(= % \#) line))]
    (if (> cnt 6) 6 cnt)))

Version 2

(defn
  count-leading-hash
  [line]
  (loop [cnt 0]
    (if (and (= (.charAt line cnt) \#) (< cnt 6))
      (recur (inc cnt))
      cnt)))

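I measured both implementations with time and found that the first version, based on take-while, is about 2 times faster than version 2. With "###### Line one" as input, version 1 took 0.09 ms and version 2 took about 0.19 ms.

Question 1. Is it recur that slows down the second implementation?

Question 2. Version 1 is closer to the functional programming paradigm, isn't it?

Question 3. Which one do you prefer, and why? (You are welcome to write your own implementation.)

- Update -

After reading the Clojure documentation, I came up with a new version of this function, which I think is much clearer.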

(defn
  count-leading-hash
  [line]
  (->> line (take 6) (take-while #(= \# %)) count))

score 6 · Accepted Answer

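1. IMO it isn't useful to take time measurements of such small pieces of code.
2. Yes, version 1 is more functional.
3. I prefer version 1 because it is easier to spot errors in it.
4. I prefer version 1 because it is less code, and therefore costs less to maintain.

I would write the function like this: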

(defn count-leading-hash [line]
  (count (take-while #{\#} (take 6 line))))
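The `#{\#}` in the function above works because a Clojure set can be called as a function: it returns the element if it is a member and nil otherwise, so it can stand in for a predicate. A small sketch of that behavior, using only clojure.core:

```clojure
;; A set called as a function returns the matching element, or nil.
(#{\#} \#)   ; => \#   (truthy)
(#{\#} \a)   ; => nil  (falsey)

;; The same definition as in the answer, repeated so this block is self-contained.
(defn count-leading-hash [line]
  (count (take-while #{\#} (take 6 line))))

(count-leading-hash "### Line one")       ; => 3
(count-leading-hash "######## Line two")  ; => 6
(count-leading-hash "")                   ; => 0
```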

score 3 · Accepted Answer

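1. No, it's the reflection used to invoke .charAt. Call (set! *warn-on-reflection* true) before creating the function and you'll see the warning.
2. Insofar as it uses HOFs (higher-order functions), sure.
3. The first, though (if (> cnt 6) 6 cnt) is better written as (min 6 cnt).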
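A minimal sketch of how the reflective call could be avoided, assuming a `^String` type hint on `line` is acceptable. The name `count-leading-hash-hinted` is hypothetical, and the length check is an addition that is not in the original version 2; it is there so that short inputs such as "###" do not throw.

```clojure
;; At the REPL, evaluate (set! *warn-on-reflection* true) first to see
;; whether a definition still triggers a reflection warning.

(defn count-leading-hash-hinted
  [^String line]                          ; type hint lets the compiler resolve .charAt
  (loop [cnt 0]
    (if (and (< cnt 6)                    ; stop after 6 characters
             (< cnt (.length line))       ; assumption: guard against short strings
             (= (.charAt line cnt) \#))
      (recur (inc cnt))
      cnt)))

(count-leading-hash-hinted "### Line one")       ; => 3
(count-leading-hash-hinted "######## Line two")  ; => 6
```

With the hint in place, the compiler can emit a direct method call instead of looking the method up at run time.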

score 2 · Accepted Answer

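Micro-benchmarks on the JVM are almost always misleading unless you really know what you're doing. So I wouldn't put too much weight on the relative performance of your two solutions.

The first solution is more idiomatic. You only really see explicit loop/recur in Clojure code when it is the only reasonable alternative. In this case, there is clearly a reasonable alternative.

Another option, if you're comfortable with regular expressions: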

(defn count-leading-hash [line]
     (count (or (re-find #"^#{1,6}" line) "")))
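To illustrate why the regex version wraps `re-find` in `or`, here is a small sketch of what `re-find` returns for matching and non-matching lines. The name `count-leading-hash-re` is hypothetical and only used to keep this block self-contained.

```clojure
;; re-find returns the matched string, or nil when nothing matches.
(re-find #"^#{1,6}" "### Line one")       ; => "###"
(re-find #"^#{1,6}" "######## Line two")  ; => "######"
(re-find #"^#{1,6}" "no heading")         ; => nil

(defn count-leading-hash-re [line]
  (count (or (re-find #"^#{1,6}" line) "")))

(count-leading-hash-re "######## Line two")  ; => 6
(count-leading-hash-re "no heading")         ; => 0
```

Since `(count nil)` is 0 in Clojure, the `or` fallback mainly documents the intent; it is not strictly required for the result.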

score 2 · Accepted Answer

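1: No. recur is fast. For every function you call there is some overhead and "noise" from the VM: for example, the REPL needs to parse and evaluate your call, or some garbage collection might happen. That is why a benchmark on such a tiny piece of code doesn't mean anything.

Compare with: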

(defn
  count-leading-hash
  [line]
  (let [cnt (count (take-while #(= % \#) line))]
    (if (> cnt 6) 6 cnt)))

(defn
  count-leading-hash2
  [line]
  (loop [cnt 0]
    (if (and (= (.charAt line cnt) \#) (< cnt 6))
      (recur (inc cnt))
      cnt)))

(def lines ["### Line one" "######## Line two"])

(time (dorun (repeatedly 10000 #(dorun (map count-leading-hash lines)))))
;; "Elapsed time: 620.628 msecs"
;; => nil
(time (dorun (repeatedly 10000 #(dorun (map count-leading-hash2 lines)))))
;; "Elapsed time: 592.721 msecs"
;; => nil

No significant difference.
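One way to reduce some of that noise, sketched here with clojure.core only, is to run the function a number of times before starting the clock so the JIT has a chance to compile it. The helper name `elapsed-ms` and the iteration counts are assumptions for illustration; this is still a rough measurement, not a rigorous benchmark.

```clojure
(defn count-leading-hash [line]
  (count (take-while #(= \# %) (take 6 line))))

(defn elapsed-ms
  "Calls f `warmup` times untimed, then `n` times timed.
  Returns the elapsed milliseconds for the timed calls."
  [f warmup n]
  (dotimes [_ warmup] (f))                 ; warm-up runs, not measured
  (let [start (System/nanoTime)]
    (dotimes [_ n] (f))
    (/ (- (System/nanoTime) start) 1e6)))  ; nanoseconds -> milliseconds

(elapsed-ms #(count-leading-hash "###### Line one") 10000 10000)
```

The returned number will vary between runs and machines, which is part of the point the answer makes.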

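2: Using loop/recur is not idiomatic in this case; it is best to use it only when you really need it and to use other available functions where you can. There are many useful functions that operate on collections and sequences; check ClojureDocs for reference and examples. In my experience, people with imperative programming skills who are new to functional programming use loop/recur a lot more than those with a lot of Clojure experience; loop/recur can be a code smell.

3: I prefer the first version. There are many different approaches: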

;; more expensive, because it iterates n times, where n is the number of #'s
(defn count-leading-hash [line]
  (min 6 (count (take-while #(= \# %) line))))

;; takes only at most 6 characters from line, so less expensive
(defn count-leading-hash [line]
  (count (take-while #(= \# %) (take 6 line))))

;; instead of an anonymous function, you can use `partial`
(defn count-leading-hash [line]
  (count (take-while (partial = \#) (take 6 line))))

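Edit: How do you decide when to use partial versus an anonymous function?

In terms of performance it doesn't matter, because (partial = \#) evaluates to something equivalent to (fn [& args] (apply = \# args)), and #(= \# %) translates to (fn [arg] (= \# arg)). The two are very similar, but partial gives you a function that accepts an arbitrary number of arguments, so in situations where you need that, it is the way to go. partial performs partial application, a notion familiar from lambda calculus. I would say use whichever is easier to read, or partial if you need a function that takes an arbitrary number of arguments.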
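The arity difference described above can be sketched as follows. The names `hash-partial?` and `hash-lambda?` are hypothetical and exist only for this illustration.

```clojure
(def hash-partial? (partial = \#))  ; accepts any number of further arguments
(def hash-lambda?  #(= \# %))       ; accepts exactly one argument

(hash-partial? \#)      ; => true
(hash-lambda? \#)       ; => true

(hash-partial? \# \#)   ; => true, same as (= \# \# \#)
(hash-partial? \# \a)   ; => false

;; Calling the anonymous function with two arguments throws
;; clojure.lang.ArityException:
;; (hash-lambda? \# \#)
```

Inside take-while the predicate is always called with one argument, so either form behaves the same there.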
