algorithm - Tortoise and Hare algorithm returning empty result for one case and incorrect for repeated characters

Question

I was going through Wikipedia implementation of Cycle detection using Tortoise-and-Hare algorithm. Using Ruby language, this is what I implemented:

def tortoise_and_hare(sequence)
  tortoise = 1
  hare = 2
  while sequence[tortoise] != sequence[hare]
    tortoise += 1
    hare += 2
  end

  # Find start index of first repetition
  idx = 0
  tortoise = 0
  while sequence[tortoise] != sequence[hare]
    tortoise += 1
    hare += 1
    idx += 1
  end

  # Find length of cycle starting from index idx
  length = 1
  hare = tortoise + 1
  while sequence[tortoise] != sequence[hare]
    hare += 1
    length += 1
  end

  [idx, length]
end

sequence = [2, 0, 6, 3, 1, 6, 3, 1, 6, 3, 1]
idx, length = tortoise_and_hare(sequence)
p sequence[idx, length]

This is working correctly and returns [6, 3, 1]. Now,

If I trim the sequence to [2, 0, 6, 3, 1, 6, 3, 1], it returns empty set.
I can see the problem is in second loop. If cycle has repeating character, algorithm returns incorrect answer. Example, [2, 0, 6, 3, 1, 6, 6, 3, 1, 6, 6, 3, 1, 6] returns [6, 3, 1], but should be [6, 3, 1, 6]. I can see the problem is in third loop.

So I guess my questions are:

Is the algorithm posted on Wikipedia standard?
Is my second case incorrect? I know cycle detection means infinitesimally long sequence which my exam is not, but it still has a cycle.
If the case is correct, what can we do to improve the algorithm and solve the 2 issues I pointed out above?

I tried modifying second loop for first problem (trimming the sequence small enough for the algorithm to fail) and it worked:

  # Find start index of first repetition
  idx = 0
  tortoise = 0
  while sequence[tortoise] != sequence[hare]
    tortoise += 1
    hare += 1
    hare = tortoise if hare > sequence.length - 1
    idx += 1
  end

Does it look wrong or may fail in some case?
What can we do for second problem (repeating characters)?

Although I cam up with another elegant Regex based solution, I still would like to learn more about above algorithm.

Regex solution for curious: /(?<cycle>(\d+\s)+(\d+))\s\k<cycle>/

Edit: I understood why it is impossible for it to detect repeated characters. But is there any other algorithm that may help in this situation?

score 0 · Accepted Answer

答案是你的代码很好，但是你的样本集太小了。该算法并未声称在尽可能短的数据量中找到一个循环。

您链接的页面上的数据集定义定义了生成无限数据集的过程。这些数据最终必须重复，因为您的域是无限的，但您的范围是有限的。

根据范围，该算法将需要更多或更少的数据来确定周期。您只是提供的不够多。

至于我会采用的解决方案？我将通过插入第一个数字来创建范围内每个数字的映射以及我发现它的位置。一旦你找到一个重复，你就找到了你的周期。从一审的位置到二审之前的位置。这给出了线性运行时间和 N*M 内存使用量。N=列表的大小 M=值的范围

这是您想要的 perl（生锈的 perl）正则表达式：

$data = "1 2 3 4 5 3 4 5"; 
if ($data =~ /(?<c>\d).*?(\k<c>)/) {
    print substr($data, $-[1], $-[2]-$-[1])."\n";
} elsif {
    print "NO\n";
}

最坏情况的运行时间是 n^2，我想它只适用于个位数（很容易修复），但它更直接。

algorithm - Tortoise and Hare algorithm returning empty result for one case and incorrect for repeated characters

1 回答 1

Related

Reference