algorithm - 理解 Grossman 和 Zeitman 的算法

Question

我已经阅读了由 Grossman & Zeitman 发表的论文An intrinsic iterative computing of Ackermann's function，其中他们提出了一种优化 Ackermann 函数的算法。

我们知道 Ackermann 函数在子序列 A(m,n) 中产生结果

m=0: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17,...
m=1: 2, 3, 4 , 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,...
m=2: 3, 5, 7, 9, 11, 13, 15, 17, 19, 21, 23, 25, 27, 29, 31, 33,...
m=3: 5, 13, 29, 61, 125, 253, 509, 1021, 2045, 4093, 8189, 16381, 32765 , 65533,...
m=4: 13, 65533,...

据说Next数组是为了跟踪我们在每个子序列中的位置，并且Goal数组是在将刚刚计算的值传输到下一个子序列之前跟踪我们需要到达的位置。它所做的只是将值加 1：

Ackermann(m, n)
    {next and goal are arrays indexed from 0 to m, initialized so that next[0] 
     through next[m] are 0, goal[0] through goal[m - l] are 1, and goal[m] is -1} 
    repeat
        value ← next[0] + 1 
        transferring ← true 
        current ← 0 
        while transferring do begin
           if next[current] = goal[current] then goal[current] ← value
                                            else transferring ← false
           next[current] ← next[current] + 1
           current ← current + 1 
           end while
    until next[m] = n + 1 
    return value {the value of A(m, n)}
    end Ackermann

我发现很难理解这两个数组如何指示我们在哪里/在哪里移动？当我们停在时，我很难确定它的确切含义next[m] = n + 1，为什么特别是这个值？我已经尝试过跟踪算法，但我仍然对它的工作原理感到迷茫。这个算法会算作自下而上的实现吗？

这是一个打印值、当前和两个数组的 java 代码

import java.util.Arrays;

public class Ack {
    static int arrayAck(int m, int n) {
        //Next array to keep track of where we are in each subsequence, 
        //and Goal array to keep track of where we need to reach before transferring the value just calculated to the next subsequence.
        int[] next = new int[m+1];
        int[] goal = new int [m+1];
        Arrays.fill(next, 0);
        Arrays.fill(goal, 1);
        goal[m] = -1;
        
        int value = next[0] + 1;
        while(true) {
            value = next[0] + 1;
            boolean transferring = true;
            int current = 0;
            
            System.out.println("--");
            System.out.println("Next = " + Arrays.toString(next));
            System.out.println("Goal = " + Arrays.toString(goal));
            System.out.println("Current= " + current);
            System.out.println("Value = " + value);
            while(transferring) {
                if(next[current] == goal[current])
                    goal[current] = value;
                else
                    transferring = false;
                next[current]=next[current]+1;
                current += 1;
                System.out.println("Current= " + current);
                System.out.println("Next = " + Arrays.toString(next));
                System.out.println("Goal = " + Arrays.toString(goal));
            }
            
            if(next[m]==n+1)
                break;
            
        }
        
        return value;
    }
     
    public static void main(String[] args) {
        int m=2,n=2;
        System.out.println("Ackermann value for ("+m+","+n+") = " + arrayAck(m, n));
    }

}

score 2 · Accepted Answer

我已经阅读了这篇论文，以了解他们提出的算法，当你掌握了它时，它实际上并没有那么糟糕。

基本思路如下。我们有一个 Ackermann 函数的两个参数版本，定义如下：

A(0, n) = n + 1
A(i, 0) = A(i - 1, 1)
A(i, n) = A(i - 1, A(i, n - 1))

作者建议的方法本质上是对阿克曼函数进行空间优化、自下而上的计算。与其给出他们算法的最终版本，不如看看我们是否可以从上述规则中推导出来。我们将想象填充一个二维表，其中每一行对应一个不同的 i 值，每一列对应一个 n 值。按照论文中的约定，我们将 i = 0 的行放在顶部，如下所示：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0 
i=1
i=2
i=3

我们的目标是填写这张表。现在，不用担心桌子的大小；我们可以稍后确定。

阿克曼函数的三种情况中的第一种说

规则 1： A(0, n) = n + 1。

我们可以将其解释为

表的第一行是值 1、2、3、4、5、...的序列

我们现在可以填写，但出于稍后会更清楚的原因，现在让我们暂缓这样做。

下一个规则是

规则 2： A(i, 0) = A(i - 1, 1)。

我们可以将其解释为

表的每个连续行中的第一个条目是其上方行的第二个（索引 1）条目。

记住这两个规则，让我们开始填写表格条目。我们可以使用第一条规则填充 A(0, 0) 的槽：A(0, 0) = 0 + 1 = 1。

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1
i=1
i=2
i=3

现在，让我们填写 A(0, 1) = 1 + 1 = 2：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2
i=1
i=2
i=3

因为我们刚刚填写了这个表条目，我们可以使用下一条规则来填写 A(1, 0) 的槽，由 A(0, 1) = 2 给出：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2
i=1  2
i=2
i=3

为了在 i = 1 的行中取得更多进展，我们需要提取 Ackermann 函数的第三种情况：

规则 3： A(i, n) = A(i - 1, A(i, n - 1))。

这条规则很密集，但它有一个非常好的直觉。要填写表格槽 A(i, n)，请执行以下操作：

确定表槽 A(i, n - 1) 中有什么。将此数字解释为索引 k。
复制您上方行中的第 k 个项目（即位置 A(i - 1, k) = A(i - 1, A(i, n - 1))）。

在我们的例子中，假设我们要填写 A(1, 1)。使用上述过程，我们执行以下操作：

查看表槽 A(1, 0) = 2。
从第 0 行复制索引 2（即 A(0, 2)）。

但是，查看我们的表格，我们可以看到我们尚未计算 A(0, 2)。所以让我们这样做。我们通过规则 1 计算 A(0, 2) = 3：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3
i=1  2
i=2
i=3

这反过来，根据规则 3，意味着 A(1, 1) = 3：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3
i=1  2    3
i=2
i=3

现在发生了一些有趣的事情。我们刚刚填写了 A(1, 1)，根据规则 2，我们可以通过复制 A(1, 1) 来填写 A(2, 0)：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3
i=1  2    3
i=2  3
i=3

让我们盘点一下刚刚发生的事情。

我们通过规则 1 计算了 A(0, 2)。
计算第 0 行的另一个条目通过规则 3 为我们提供了第 1 行的新条目。
计算第 1 行的另一个条目通过规则 2 为我们提供了第 2 行的新条目。

换句话说，通过计算第 0 行中的另一个值，我们产生了向下传播的涟漪效应，允许我们在第 1 行、第 2 行等中计算更多条目。因此，建议采取一种方法：在第 0 行中生成另一个值，使用规则 1 很容易做到这一点，然后看看我们是否可以使用规则 2 和规则 3 在表格中“涟漪”更高的值。

要查看实际情况，让我们使用规则 1 填写 A(0, 3) = 3+1 = 4：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4
i=1  2    3
i=2  3
i=3

现在，看看第 1 行。规则 3 说要填写 A(1, 3)，我们需要首先计算 A(1, 2)（完成 - 它是 3），然后取第 0 行的第 3 列。我们刚刚计算第 0 行的第 3 列，这意味着我们应该将 A(0, 3) 复制到 A(1, 2) 中：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4
i=1  2    3    4
i=2  3
i=3

现在，看第 2 行。我们已经填写了 A(2, 0)，所以这里要计算的下一个是 A(2, 1)。为了计算 A(2, 1)，我们首先计算 A(2, 0)（完成 - 它是 3），所以我们需要从第 1 行的第 3 列获取条目。不过，我们还没有计算过，所以没有更多的事情发生。涟漪到此停止。

让我们计算第 0 行的下一个条目：A(0, 4) = 4+1 = 5。这里是：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5
i=1  2    3    4
i=2  3
i=3

查看第 1 行，要填写 A(1, 3)，我们查看 A(1, 2)（即 4）并将 A(0, 4) 复制过来：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5
i=1  2    3    4    5
i=2  3
i=3

查看第 2 行，要填写 A(2, 2)，我们查看 A(2, 1)（即 3）并将 A(1, 3) 复制过来：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5
i=1  2    3    4    5
i=2  3    5
i=3

现在，看第 3 行。规则 2 说 A(3, 0) = A(2, 1)，所以我们将 A(2, 1) 复制过来：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5
i=1  2    3    4    5
i=2  3    5
i=3  5

波纹是如何完成的，因为那是我们表的所有行。

为了了解这个系统是如何演变的，让我们一个接一个地运行一些涟漪：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5    6
i=1  2    3    4    5    6
i=2  3    5
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5    6    7
i=1  2    3    4    5    6    7
i=2  3    5    7
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5    6    7    8
i=1  2    3    4    5    6    7    8
i=2  3    5    7
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5    6    7    8    9
i=1  2    3    4    5    6    7    8    9
i=2  3    5    7    9
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5    6    7    8    9    10
i=1  2    3    4    5    6    7    8    9    10
i=2  3    5    7    9
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5    6    7    8    9    10   11
i=1  2    3    4    5    6    7    8    9    10   11
i=2  3    5    7    9   11
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5    6    7    8    9    10   11   12
i=1  2    3    4    5    6    7    8    9    10   11   12
i=2  3    5    7    9   11
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1    2    3    4    5    6    7    8    9    10   11   12   13
i=1  2    3    4    5    6    7    8    9    10   11   12   13
i=2  3    5    7    9   11   13
i=3  5   13

您可以对照论文表格中的图 1 检查这些值，您会发现这是正在发生的事情的（部分）匹配。

执行该算法的一个重要细节是，在每个时间点，我们只关心每一行中的最后一项。该项目告诉我们在复制下一个值之前需要在我们下面的行中填写什么索引。考虑到这一点，我们可以想象再次运行这个算法，但只存储每行中的最后一项。这可能看起来像这样：

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0  1
i=1
i=2
i=3

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0       2
i=1  2
i=2
i=3

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0            3
i=1       3
i=2  3
i=3

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0                 4
i=1            4
i=2  3
i=3

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0                      5
i=1                 5
i=2       5
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0                           6
i=1                      6
i=2       5
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0                                7
i=1                           7
i=2            7
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0                                     8
i=1                                8
i=2            7
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0                                          9
i=1                                     9
i=2                 9
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0                                               10
i=1                                          10
i=2                 9
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0                                                    11
i=1                                               11
i=2                      11
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0                                                         12
i=1                                                    12
i=2                      11
i=3  5

    n=0  n=1  n=2  n=3  n=4  n=5  n=6  n=7  n=8  n=9  n=10 n=11 n=12
i=0                                                              13
i=1                                                         13
i=2                           13
i=3       13

这是一个很大的空间节省，它暴露了每一行的一些有趣的东西。对于每一行，我们只需要存储两个值：

存储在该行中的数字。
对应于最右边数字所在的列索引。

这两个值的意义是什么？第一个数字可以用两种方式解释。首先，它是阿克曼序列的实际值。其次，对于第一行上方的行，它是一个指示符，表示“我们需要在我下方的行中处于什么位置才能让我前进到我所在行的下一列？” 这是纸张存储在goal数组中的值。

反过来，这些数字中的第二个可以被认为是在说“我在什么位置？”然后可以使用哪些更高级别的行来确定何时提升它们的goal值。这是纸张存储在next数组中的值。

我不会遍历论文中的伪代码，而是从头开始重写算法，最终得到与论文相似但不完全相同的东西。其基本思想是模拟上述过程。我们跟踪我们在表的每一行中所在的列（我称它为positions使其含义更清楚）以及表的每一行中存储的值是什么（我称它为 this values）。

为了简化处理某些行有值而其他行没有值的情况（例如，参见上面跟踪中的前几个步骤），并将规则 2 和规则 3 统一在一起，我采用了跟随策略。我们将扩展 Ackermann 函数，以便通过以下方式定义 A(i, -1) 的值：

A(0, -1) = 0
A(i, -1) = 1

然后，我们假设每一行都从位置 -1 开始，意思是“就在第一列之前”。直觉是第 1 行、第 2 行、第 3 行等都在等待一个项目出现在它们上方行的位置 1 中，而在第 0 行中，第一个之前的值应该是 0，因此第 0 行中产生的值是系列 0, 1, 2, 3, ... 。

考虑到这一点，这是我的实现：

/**
 * File: Ackermann.java
 * Author: Keith Schwarz (htiek@cs.stanford.edu)
 *
 * An implementation of a variation of Grossman and Zeitman's fast algorithm
 * for computing the Ackermann function. This implementation uses O(i) space
 * to compute A(i, n), and takes O(A(i, n)) time. The only operation required
 * on integers is "increment an integer" and "copy an integer." While this is
 * implemented using vanilla ints, it could easily be extended to use BigInteger
 * types or the like.
 */
public class Ackermann {
    public static int ackermann(int i, int n) {
        /* Bounds-check. */
        if (i < 0 || n < 0) throw new RuntimeException("Invalid arguments.");
    
        /* Positions array stores what column we're in in each row of
         * the table. Initially this is all -1 to signify that we're just
         * before the first column.
         */
        int[] positions = new int[i + 1];
        for (int j = 0; j < positions.length; j++) {
            positions[j] = -1;
        }
        
        /* Values array stores the value currently in the filled column
         * in each row. The value in the zeroth row is set to zero because
         * our algorithm works by repeatedly incrementing its value and
         * we need it to be the case that A(0, 0) = 1. Since we start off
         * with "A(0, -1)" computed, we place a zero there.
         *
         * All the other values are set to 1. This corresponds to the rule
         * that A(i, 0) = A(i - 1, 1), meaning we're waiting for column 1
         * to come up in the level below us.
         */
        int[] values = new int[i + 1];
        for (int j = 1; j < values.length; j++) {
            values[j] = 1;
        }
        
        /* Run the algorithm until we compute A(i, n), which happens when
         * positions[i] == n.
         */
        while (positions[i] != n) {        
            /* Push the value in the bottom row forward and increment it to simulate
             * computing the next value of A(0, *).
             */
            values[0]++;
            positions[0]++;
            
            /* Now, do the ripple. We continue rippling upward while
             *
             * 1. There's a layer above us to ripple to, and
             * 2. The value above us equals our current position.
             */
            for (int j = 1; j <= i && positions[j - 1] == values[j]; j++) {
                values[j] = values[j - 1]; // Copy the value
                positions[j]++;            // Shift forward a column
            }
        }
        
        return values[i];
    }

    public static void main(String[] args) {
        if (args.length != 2) {
            System.err.println("Usage: java Ackermann i n");
            return;
        }
        
        int i = Integer.parseInt(args[0]);
        int n = Integer.parseInt(args[1]);
        
        System.out.println("A(" + i + ", " + n + ") = " + ackermann(i, n));
    }
}

最后一个细节 - 我相信 Zeitman 和 Grossman 对 O(iA(i, n)) 的原始运行时分析是正确的，但并不严格。具体来说，这假设在算法的每一步中，我们将一个值从一层复制到下一层。然而，事实并非如此。每行中的值增长得如此之快，以至于在大多数步骤中，我们不会将任何内容复制到表的任何高行。更具体地说，请注意表的第 2 行由每隔一个奇数组成，因此只有一半的增量步骤会在那里复制一些东西，而上面的行肯定不会复制它们下面层的一半以上的值。这意味着，平均而言，每个涟漪将有 100% 的机会复制到第 1 行，最多有 50% 的机会复制到第 2 行，最多有 25% 的机会复制到第 3 行，等等。将复制的数量相加并使用此处（至少）几何衰减的事实意味着“平均”操作仅复制到 O(1) 总行。这意味着我们正在执行 O(A(i, n)) 增量步骤，每个步骤（平均）仅复制到 O(1) 总行，所以完成的总工作量是O(A(i, n))。这并不是对运行时界限的巨大改进（A(i, n) 项是巨大的！），但它是一个稍微好一点的结果。

algorithm - 理解 Grossman 和 Zeitman 的算法

1 回答 1

Related

Reference