scala - Tarjan的强连通分量算法的功能实现

Question

我继续在 Scala 中实现了Tarjan 的 SCC 算法的教科书版本。但是，我不喜欢这段代码——它是非常命令式/程序化的，有很多变异状态和簿记索引。该算法是否有更“功能”的版本？我相信命令式算法版本隐藏了算法背后的核心思想，与函数式版本不同。我发现其他人在使用这个特定算法时遇到了同样的问题，但我无法将他的 Clojure 代码翻译成惯用的 Scala。

注意：如果有人想尝试，我有一个很好的设置，可以生成随机图并测试你的 SCC 算法与运行 Floyd-Warshall

score 11 · Accepted Answer

请参阅David King 和 John Launchbury的 Haskell 中的惰性深度优先搜索和线性图算法。它以函数式风格描述了许多图形算法，包括 SCC。

score 9 · Accepted Answer

以下功能性 Scala 代码生成一个映射，该映射为图形的每个节点分配一个代表。每个代表标识一个强连通分量。该代码基于 Tarjan 的强连接组件算法。

为了理解算法，理解dfs函数的折叠和契约就足够了。

def scc[T](graph:Map[T,Set[T]]): Map[T,T] = {
  //`dfs` finds all strongly connected components below `node`
  //`path` holds the the depth for all nodes above the current one
  //'sccs' holds the representatives found so far; the accumulator
  def dfs(node: T, path: Map[T,Int], sccs: Map[T,T]): Map[T,T] = {
    //returns the earliest encountered node of both arguments
    //for the case both aren't on the path, `old` is returned
    def shallowerNode(old: T,candidate: T): T = 
      (path.get(old),path.get(candidate)) match {
        case (_,None) => old
        case (None,_) => candidate
        case (Some(dOld),Some(dCand)) =>  if(dCand < dOld) candidate else old
      }

    //handle the child nodes
    val children: Set[T] = graph(node)
    //the initially known shallowest back-link is `node` itself
    val (newState,shallowestBackNode) = children.foldLeft((sccs,node)){
      case ((foldedSCCs,shallowest),child) =>
        if(path.contains(child))
          (foldedSCCs, shallowerNode(shallowest,child))
        else {
          val sccWithChildData = dfs(child,path + (node -> path.size),foldedSCCs)
          val shallowestForChild = sccWithChildData(child)
          (sccWithChildData, shallowerNode(shallowest, shallowestForChild))
        }
    }

    newState + (node -> shallowestBackNode)
  }

  //run the above function, so every node gets visited
  graph.keys.foldLeft(Map[T,T]()){ case (sccs,nextNode) =>
    if(sccs.contains(nextNode))
      sccs
    else
      dfs(nextNode,Map(),sccs)
  }
}

我只在维基百科页面上的示例图上测试了代码。

与命令式版本的区别

与原始实现相比，我的版本避免显式展开堆栈，而只是使用适当的（非尾）递归函数。堆栈由一个名为的持久映射表示path。在我的第一个版本中，我使用了Listas 堆栈；但这效率较低，因为必须搜索包含元素。

效率

该代码相当有效。对于每条边，您必须更新和/或访问不可变映射path，总成本O(log|N|)为O(|E| log|N|). 这与O(|E|)命令式版本实现的相反。

线性时间实现

Chris Okasaki 的答案中的论文在 Haskell 中提供了一个线性时间解决方案，用于查找强连接组件。他们的实现是基于 Kosaraju 的用于寻找 SCC 的算法，该算法基本上需要两次深度优先遍历。这篇论文的主要贡献似乎是在 Haskell 中实现了一个惰性的线性时间 DFS。

他们实现线性时间解决方案所需的是拥有一组带有O(1)单例添加和成员资格测试的集合。这基本上与使此答案中给出的解决方案具有比命令式解决方案更高的复杂性的问题相同。他们使用 Haskell 中的状态线程来解决它，这也可以在 Scala 中完成（参见 Scalaz）。因此，如果愿意让代码变得相当复杂，可以将 Tarjan 的 SCC 算法实现为功能O(|E|)版本。

score 0 · Accepted Answer

看看https://github.com/jordanlewis/data.union-find，该算法的 Clojure 实现。它有点伪装成数据结构，但算法就在那里。当然，它纯粹是功能性的。

scala - Tarjan的强连通分量算法的功能实现

3 回答 3

与命令式版本的区别

效率

线性时间实现

Related

Reference