algorithm - 算法在不改变播放的情况下最小化播放列表

Question

我正在寻找一种算法来减少有序但非唯一项目的列表（播放列表）。搜索集合论，但还没有找到合适的东西

例子

[a, b, b, c] -> [a, b, b, c] Cannot be reduced. 
[a, b, b, a, b, b] -> [a, b, b]. 
[b, b, b, b, b] -> [b]. 
[b, b, b, b, a] -> [b, b, b, b, a] Cannot be reduced.

考虑获取所有现有子列表并计算每个实例。如果存在这样的子列表，其计数乘以子列表长度等于原始列表，则取符合此条件的最短子列表。

这似乎有点蛮力，必须有一个更简单/更快的解决方案可用。

score 3 · Accepted Answer

For starters, you don't need to check all sublists -- just those with lengths that are factors of the length of the full list.

If your main concern is coding simplicity rather than raw speed, just let a regex engine solve the problem:

/^(.+?)\1+$/

Which is a variant on Abigail's awesome Perl regex to find prime numbers.

score 2 · Accepted Answer

For each n <= N (where N is the length of the list), if n is a factor of N. If it is, then check if repeating the sublist of the first n characters generates the original list. If it does, then you've found a potential answer (the answer is the shortest). This should get you down to less than O(N^2) but still the same order of efficiency as brute force in the worst case.

You can do some pruning by noting that, if for example a sublist of length 2 successfully generates the first 4 characters but not the full list, then a sublist of length 4 will fail. You can keep a list of all such sublist lengths to not check and this will cut down on some computation.

score 0 · Accepted Answer

用质数编码每个集合元素。

前任：

a -> 2
b -> 3
c -> 5

等等

现在，您还需要维护两个列表。

第一个列表是素数，第二个是它们的指数。

这个想法是；当你偶然发现一个元素时，记录它的质数以及它连续出现的次数。

对于[a, b, b, c]，你得到这个：

[2, 3, 3, 5]

可以记为：

[2, 3^2, 5]

或者，更准确地说：

[2^1, 3^2, 5^1]

并且您维护两个列表：

[2,3,5]  // primes in succession - list [p]
[1,2,1]  // exponents - list [e]

现在，通过检查第一个元素 [p]^[e] 是否与最后一个元素相同，从头到尾遍历这两个列表；如果是，那么倒数第二个等等......如果它们都相同，则可以减少您的列表。

在这个例子中，你检查 if 2^1*5^1 == 3^2*3^2; 既然不是，就不能减少。

让我们试试[a, b, b, a, b, b]：

这被编码为

[2^1, 3^2, 2^1, 3^2]

或者，

[2, 3, 2, 3]  // primes
[1, 2, 1, 2]  // exponents

现在，我们检查是否2^1 * 3^2 == 3^2 * 2^1（第一个素数，第一个指数乘以最后一个素数，最后一个指数，然后与倒数第二个进行比较）

由于这成立，它是可约的。

让我们试试[b, b, b, b, b]：

这可以编码为

[3^5]

或者，

[3]  // primes
[5]  // exponents

这是一个特殊情况：如果您有 1 个元素列表，那么您的原始列表是可简化的。

让我们试试[b, b, b, b, a]：

这可以编码为

[3^4, 2^1]

或者，

[3, 2]  // primes
[4, 1]  // exponents

我们检查是否3^4 == 2^1，因为不是，所以您的列表不可约。

让我们试试[a, b, a, b, a, b]：

这可以编码为

[2^1, 3^1, 2^1, 3^1, 2^1, 3^1]

或者，

[2, 3, 2, 3, 2, 3]
[1, 1, 1, 1, 1, 1]

尝试上述过程有效，因为2^1 * 3^1 == 3^1 * 2^1 == 2^1 * 3^1

所以，算法应该是这样的：

将所有数字编码为素数。

遍历您的列表，制作两个列表并按照描述填充它们

现在您有了两个列表，p并且e，它们都具有长度n，请执行以下操作：

var start = p[0]^e[0] * p[n-1]^e[n-1]
var reducible = true;

for (int i = 0; i < n/2, ++i) :
   if ( (p[i]^e[i] * p[n-i]^e[n-i]) != start ) :
       reducible = false;
       break;

注意：我并没有真正编写此算法并尝试各种输入。这只是一个想法。此外，如果一个列表是可简化的，从它的长度和长度来看n，应该不难看出如何将原始列表简化为其基本形式。

第二个说明：如果有人在上面看到错误，请纠正我。可能这一切都不起作用，因为时间已晚，我的注意力也不是最佳的。

score 0 · Accepted Answer

这是一些应该在接近线性时间内运行的简单代码（我认为最坏的情况是 O(n lg lg n)，依赖于一些更高的数学）。

f(x) {
  i = 1;
  while (i <= size(x) / 2) {
    if (size(x) % i != 0) { i++; continue;}
    b = true;
    for (j = 0; j + i < x.size(); j++) {
      if (x[i] != x[j]) {
        b = false;
        break;
      }
    }
    if (b) return i;
    i = max(i + 1, j / i * i / 2); // skip some values of i if j is large enough
  }
  return -1;
}

本质上，上面执行的是朴素算法，但跳过了一些由于早期的“近乎未命中”而已知不可能的周期性。例如，如果您尝试 5 的周期并看到“aaaabaaaabaaaabaaaabab”，您可以安全地跳过 6、7、...、10，因为我们看到了 4 个周期，其中 5 重复，然后失败。

最终，您最终会做线性工作量加上 sigma(n) 中的线性工作量，即 n 的除数之和，以 O(n lg lg n) 为界。

*请注意，证明此跳过的正确性非常微妙，我可能在细节上犯了错误——欢迎评论。

algorithm - 算法在不改变播放的情况下最小化播放列表

4 回答 4

Related

Reference