clojure - 以 Newick 格式懒惰地打印一棵树

Question

我希望以Newick 格式打印一棵二叉树，显示每个节点到其父节点的距离。目前我没有遇到以下代码的问题，它使用常规递归，但是树太深可能会产生堆栈溢出。

(defn tree->newick
  [tree]
  (let [{:keys [id children to-parent]} tree
        dist (double to-parent)] ; to-parent may be a rational
    (if children
      (str "(" (tree->newick (first children)) 
           "," (tree->newick (second children)) 
           "):" dist)
      (str (name id) ":" dist))))

(def example {:id nil :to-parent 0.0 
              :children [{:id nil :to-parent 0.5 
                          :children [{:id "A" :to-parent 0.3 :children nil}
                                     {:id "B" :to-parent 0.2 :children nil}]}
                         {:id "C" :to-parent 0.8 :children nil}]})

(tree->newick example)
;=> "((A:0.3,B:0.2):0.5,C:0.8):0.0"

(def linear-tree (->> {:id "bottom" :to-parent 0.1 :children nil}
                   (iterate #(hash-map :id nil :to-parent 0.1 
                                       :children [% {:id "side" :to-parent 0.1 :children nil}]))
                   (take 10000)
                   last))

(tree->newick linear-tree)
;=> StackOverflowError

我在当前实用程序（例如tree-seqand clojure.walk）中发现的问题是我必须多次访问内部节点，以插入逗号并关闭括号。我使用过clojure.zip，但没有设法编写惰性/尾递归实现，因为我需要为每个内部节点存储它们已经被访问过的次数。

score 4 · Accepted Answer

这是适用于您的linear-tree示例的版本。这是您的实现的直接转换，有两个变化：它使用延续传递样式和蹦床。

(defn tree->newick
  ([tree]
     (trampoline tree->newick tree identity))
  ([tree cont]
     (let [{:keys [id children to-parent]} tree
           dist (double to-parent)]     ; to-parent may be a rational
       (if children
         (fn []
           (tree->newick
            (first children)
            (fn [s1] (fn []
                       (tree->newick
                        (second children)
                        (fn [s2] (cont (str "(" s1 "," s2 "):" dist))))))))
         (cont (str (name id) ":" dist))))))

编辑：添加模式匹配以允许以简单的方式调用该函数。

编辑2：我注意到我犯了错误。问题是我确实考虑了 Clojure 没有优化尾调用的事实，只是部分考虑到了这一点。

我的解决方案的核心思想是转换为延续传递样式，因此递归调用可以移动到尾部位置（即递归调用不是返回结果，而是将其作为参数传递给延续）。

然后我通过让它们使用蹦床来手动优化递归调用。我忘记考虑的是延续的调用——不是递归调用，而是在尾部位置——也需要优化，因为尾部调用可以是一个很长的闭包链，所以当函数 finally评估它们，它变成了一个长长的调用链。

这个问题并没有在测试数据linear-tree中实现，因为第一个孩子的延续返回到蹦床来处理第二个孩子的递归调用。但是如果linear-tree更改为使用每个节点的第二个子节点而不是第一个子节点来构建线性树，这会再次导致堆栈溢出。

所以延续的呼唤也需要回到蹦床。（实际上，没有孩子的基本情况下的调用不会，因为它在返回蹦床之前最多会发生一次，第二次递归调用也是如此。）所以这里有一个考虑到这一点的实现并且应该在所有输入上只使用常量堆栈空间：

(defn tree->newick
  ([tree]
     (trampoline tree->newick tree identity))
  ([tree cont]
     (let [{:keys [id children to-parent]} tree
           dist (double to-parent)]     ; to-parent may be a rational
       (if children
         (fn [] (tree->newick
                 (first children)
                 (fn [s1] (tree->newick
                           (second children)
                           (fn [s2] #(cont (str "(" s1 "," s2 "):" dist)))))))
         (cont (str (name id) ":" dist))))))

clojure - 以 Newick 格式懒惰地打印一棵树

1 回答 1

Related

Reference