java - How can I improve an algorithm for generating multiset combinations?

Question

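How can I optimize the next() and hasNext() methods in the following generator, which produces combinations of a bounded multiset? (I posted this under both C++ and Java because the code is C++-compatible and has no Java-specific elements that do not convert directly to C++.)

The specific areas of the algorithm that are problematic are the entire hasNext() method, which may be unnecessarily complicated, and the line: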

if( current[xSlot] > 0 ) aiItemsUsed[current[xSlot]]--;

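which has an if statement that I think could be removed somehow. I had an earlier version of the algorithm that did some of the backtracking before the return statement and consequently had a much simpler hasNext() test, but I could not get that version to work.

The background is that this algorithm is very difficult to find. For example, in Knuth 7.2.1.3 he merely says that it can be done (and gives an exercise to prove that the algorithm is possible), but does not give an algorithm. Likewise, I have half a dozen advanced texts on combinatorics (including Papadimitriou and Kreher/Stinson), and none of them gives an algorithm for generating multiset combinations. Kreher leaves it as an "exercise for the reader". In any case, if you can improve the algorithm above or provide a reference to a working implementation that is more efficient than mine, I would appreciate it. Please give only iterative algorithms (no recursion, please).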

/** The iterator returns a 1-based array of integers. When the last combination is reached hasNext() will be false.
  * @param aiItems One-based array containing number of items available for each unique item type where aiItems[0] is the number of item types
  * @param ctSlots  The number of slots into which the items go
  * @return The iterator which generates the 1-based array containing the combinations or null in the event of an error.
  */
public static java.util.Iterator<int[]> combination( final int[] aiItems, final int ctSlots ){ // multiset combination into a limited number of slots
    CombinatoricIterator<int[]> iterator = new CombinatoricIterator<int[]>(){
        int xSlot;
        int xItemType;
        int ctItemType;
        int[] current = new int[ctSlots + 1];
        int[] aiItemsUsed = new int[aiItems[0] + 1];
        { reset(); current[0] = ctSlots; ctItemType = aiItems[0]; }
        public boolean hasNext(){
            int xUseSlot = ctSlots;
            int iCurrentType = ctItemType;
            int ctItemsUsed = 0;
            int ctTotalItemsUsed = 0;
            while( true ){
                int xUsedType = current[xUseSlot];
                if( xUsedType != iCurrentType ) return true;
                ctItemsUsed++;
                ctTotalItemsUsed++;
                if( ctTotalItemsUsed == ctSlots ) return false;
                if( ctItemsUsed == aiItems[xUsedType] ){
                    iCurrentType--;
                    ctItemsUsed = 0;
                }
                xUseSlot--;
            }
        }
        public int[] next(){
            while( true ){
                while( xItemType == ctItemType ){
                    xSlot--;
                    xItemType = current[xSlot];
                }
                xItemType++;
                while( true ){
                    while( aiItemsUsed[xItemType] == aiItems[xItemType] && xItemType != current[xSlot] ){
                        while( xItemType == ctItemType ){
                            xSlot--;
                            xItemType = current[xSlot];
                        }
                        xItemType++;
                    }
                    if( current[xSlot] > 0 ) aiItemsUsed[current[xSlot]]--;
                    current[xSlot] = xItemType;
                    aiItemsUsed[xItemType]++;
                    if( xSlot == ctSlots ){
                        return current;
                    }
                    xSlot++;
                }
            }

        }
        public int[] get(){ return current; }
        public void remove(){}
        public void set( int[] current ){ this.current = current; }
        public void setValues( int[] current ){
            if( this.current == null || this.current.length != current.length ) this.current = new int[current.length];
            System.arraycopy( current, 0, this.current, 0, current.length );
        }
        public void reset(){
            xSlot = 1;
            xItemType = 0;
            Arrays.fill( current, 0 ); current[0] = ctSlots;
            Arrays.fill( aiItemsUsed, 0 ); aiItemsUsed[0] = aiItems[0];
        }
    };
    return iterator;
}

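Additional information

Some respondents so far do not seem to understand the difference between a set and a bounded multiset. A bounded multiset has repeated elements. For example, { a, a, b, b, b, c } is a bounded multiset, which would be encoded as { 3, 2, 3, 1 } in my algorithm. Note that the leading "3" is the number of item types (unique items) in the set. If you supply an algorithm, the following test should produce the output shown below.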
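As a rough illustration of this encoding, here is a minimal sketch in Python rather than Java. The helper name encode_multiset is hypothetical, and the sketch assumes item types are numbered in order of first appearance.

```python
from collections import Counter

def encode_multiset(elements):
    """Return [number of item types, count of type 1, count of type 2, ...]."""
    # Counter remembers first-appearance order (Python 3.7+), so the first
    # distinct element seen becomes item type 1, the next one type 2, etc.
    counts = Counter(elements)
    return [len(counts)] + list(counts.values())

print(encode_multiset(["a", "a", "b", "b", "b", "c"]))  # [3, 2, 3, 1]
```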

    private static void combination_multiset_test(){
        int[] aiItems = { 4, 3, 2, 1, 1 };
        int iSlots = 4;
        java.util.Iterator<int[]> iterator = combination( aiItems, iSlots );
        if( iterator == null ){
            System.out.println( "null" );
            System.exit( -1 );
        }
        int xCombination = 0;
        while( iterator.hasNext() ){
            xCombination++;
            int[] combination = iterator.next();
            if( combination == null ){
                System.out.println( "improper termination, no result" );
                System.exit( -1 );
            }
            System.out.println( xCombination + ": " + Arrays.toString( combination ) );
        }
        System.out.println( "complete" );
    }


1: [4, 1, 1, 1, 2]
2: [4, 1, 1, 1, 3]
3: [4, 1, 1, 1, 4]
4: [4, 1, 1, 2, 2]
5: [4, 1, 1, 2, 3]
6: [4, 1, 1, 2, 4]
7: [4, 1, 1, 3, 4]
8: [4, 1, 2, 2, 3]
9: [4, 1, 2, 2, 4]
10: [4, 1, 2, 3, 4]
11: [4, 2, 2, 3, 4]
complete
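
For checking a candidate algorithm against this listing, a brute-force reference can help. The following is only a sketch in Python, not an efficient generator: it expands the multiset, takes every combination, and discards duplicates. The function name reference_combinations is hypothetical; the input uses the same 1-based encoding as aiItems above.

```python
from itertools import combinations

def reference_combinations(ai_items, ct_slots):
    """Brute-force oracle: all distinct multiset combinations, in sorted order."""
    type_count = ai_items[0]  # leading entry is the number of item types
    # Expand the encoding into a sorted pool, e.g. [1, 1, 1, 2, 2, 3, 4].
    pool = [t for t in range(1, type_count + 1) for _ in range(ai_items[t])]
    # combinations() repeats a selection once per way of picking equal items,
    # so a set is used to keep each distinct selection only once.
    distinct = sorted(set(combinations(pool, ct_slots)))
    # Prefix each result with the slot count to mirror the 1-based output array.
    return [[ct_slots] + list(combo) for combo in distinct]

for number, combo in enumerate(reference_combinations([4, 3, 2, 1, 1], 4), start=1):
    print(f"{number}: {combo}")
```

The cost grows with the number of raw combinations of the expanded pool, so this is suitable only as a test oracle for small inputs.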

score 1 · Accepted Answer

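I would write a simple helper class that can do increment, highbit, and for_each_bit.

I would first wrap an unsigned int and limit it to 32 bits, and maybe extend it with std::bitset or a std::vector<uint32_t> if I felt ambitious. By starting with these 3 methods, I can test it and get it working.

increment is easy, especially on a bare 32-bit int.

highbit returns the bit position of the highest set bit.

for_each_bit has this signature in C++: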

template<typename Lambda>
void for_each_bit( my_bignum const& num, Lambda&& func )

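and calls func with the index of each set bit in num.

That should take at most a few minutes to write.

Throw away hasNext and follow the iterator concept: you have a begin subset and an end subset, where end is not a valid value to extract. Dereferencing these iterators produces the subset in question (or a factory for said subset).

end is now easy to work out: if highbit >= the number of elements in your set, you are past the end of the permutations.

begin is either 0 or 1, depending on whether you want the empty subset to be included.

next just increments your bignum.

Producing the subset simply involves calling for_each_bit and putting that item from your set into the subset.

Next, improve increment to allow random access, and you can then implement iterating over the subsets in parallel!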
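
A minimal sketch of the counter-based subset iteration described above, written in Python for brevity (Python integers are unbounded, so the 32-bit limit does not apply here). The function names mirror highbit and for_each_bit, but the bodies and the subsets generator are assumptions for illustration, not the helper class itself.

```python
def highbit(n):
    """Position of the highest set bit, or -1 when n == 0."""
    return n.bit_length() - 1

def for_each_bit(n, func):
    """Call func(index) for every set bit in n, lowest index first."""
    index = 0
    while n:
        if n & 1:
            func(index)
        n >>= 1
        index += 1

def subsets(items, include_empty=True):
    """Yield every subset of items by counting through bit patterns."""
    counter = 0 if include_empty else 1      # the "begin" value
    while highbit(counter) < len(items):     # the "end" test
        subset = []
        for_each_bit(counter, lambda i: subset.append(items[i]))
        yield subset
        counter += 1                         # "next" is a plain increment

for s in subsets(["a", "b", "c"]):
    print(s)
```

Because each subset is determined by the counter value alone, any starting value can be chosen directly, which is what makes splitting the range across workers possible.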

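This solves the set problem. To solve the multiset problem, first solve the derived set problem (where you pretend there is only 0 or 1 of each element) and iterate over that. Then, on each iteration of the derived set, build up a std::vector of the max count of each element.

Then do something like this: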

#include <utility>
#include <cstddef>
#include <vector>

using std::size_t;

namespace details {
template<typename Lambda>
  void for_each_multiset_combo_worker( std::vector<size_t> const& counts, Lambda&& lambda, size_t depth, std::vector<size_t>& current )
  {
    if (depth >= counts.size()) {
      lambda( current );
      return;
    }
    for (size_t i = 0; i <= counts[depth]; ++i) {
      // Assert: current.size() == depth
      current.push_back(i);
      // Assert: current.back() == i
      // Assert: current.size() == depth+1
      for_each_multiset_combo_worker( counts, lambda, depth+1, current );
      // Assert: current.back() == i
      // Assert: current.size() == depth+1
      current.pop_back();
      // Assert: current.size() == depth
    }
  }
}
template<typename Lambda>
void for_each_multiset_combo( std::vector<size_t> const& counts, Lambda&& lambda )
{
  std::vector<size_t> current;
  current.reserve( counts.size() );
  details::for_each_multiset_combo_worker( counts, std::forward<Lambda>(lambda), 0, current );
}
#include <iostream>

int main() {
  std::vector<size_t> multiset = {3, 2, 1, 1};
  size_t counter = 0;
  for_each_multiset_combo( multiset, [&]( std::vector<size_t> const& counts ){
    std::cout << counter << ": [";
    for(auto it = counts.begin(); it != counts.end(); ++it) {
      if (it != counts.begin()) {
        std::cout << ", ";
      }
      std::cout << *it;
    }
    std::cout << "]\n";
    ++counter;
  });
}

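Live example: http://ideone.com/8GN1xx

In this live example, I skipped the optimization of iterating over the set first and instead iterate directly over the multiset.

(Limitations: no more than max size_t elements of each type, and no more than the max capacity of std::vector different types of elements.)

I have no need for the leading "number of distinct elements in the multiset", so I did not use it.

And here is an iterative version of the above recursive algorithm, using the usual "turn the implicit recursion stack into an explicit iteration stack" technique: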

#include <utility>
#include <cstddef>
#include <vector>

using std::size_t;

template<typename Lambda>
void for_each_multiset_combo( std::vector<size_t> const& counts, Lambda&& lambda )
{
  // below code is easier if I assume counts is non-empty:
  if (counts.empty())
  {
    lambda(counts);
    return;
  }
  // preallocate a buffer big enough to hold the output counts:
  std::vector<size_t> indexes;
  indexes.reserve( counts.size() );
  while(true) {
    // append 0s on the end of indexes if we have room:
    while (indexes.size() < counts.size()) {
      indexes.push_back(0);
    }
    // at this point, we have a unique element.  Pass it to the passed in lambda:
    lambda( indexes );
    // The advancement logic.  Advance the highest index.  If that overflows, pop it and
    // advance the next highest index:
    indexes.back()++;
    while (indexes.back() > counts[indexes.size()-1]) {
      indexes.pop_back();
      // we are done if we have managed to advance every index, and there are none left to advance:
      if (indexes.empty())
        return; // finished
      indexes.back()++;
    }
  }
}
#include <iostream>

int main() {
  std::vector<size_t> multiset = {3, 2, 1, 1};
  size_t counter = 0;
  for_each_multiset_combo( multiset, [&]( std::vector<size_t> const& counts ){
    std::cout << counter << ": [";
    for(auto it = counts.begin(); it != counts.end(); ++it) {
      if (it != counts.begin()) {
        std::cout << ", ";
      }
      std::cout << *it;
    }
    std::cout << "]\n";
    ++counter;
  });
}

http://ideone.com/x2Zp2f
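
For readers following along in Python, the same odometer-style advance can be sketched as a generator. This is an illustration only: it assumes counts holds the maximum multiplicity of each item type, uses a fixed-length list instead of the push/pop stack, and the name multiset_count_vectors is hypothetical. Like the C++ version, it visits selections of every size; filtering by total size afterwards is the simple but slow way to get only the fixed-size combinations the question asks for.

```python
def multiset_count_vectors(counts):
    """Yield every vector v with 0 <= v[i] <= counts[i], last index fastest."""
    indexes = [0] * len(counts)
    while True:
        yield list(indexes)  # copy, so callers may keep the result
        # Advance like an odometer: reset full positions, then bump the
        # first position (scanning from the right) that still has room.
        pos = len(counts) - 1
        while pos >= 0 and indexes[pos] == counts[pos]:
            indexes[pos] = 0
            pos -= 1
        if pos < 0:
            return  # every position overflowed: finished
        indexes[pos] += 1

counts = [3, 2, 1, 1]
fixed_size = [v for v in multiset_count_vectors(counts) if sum(v) == 4]
print(len(fixed_size))  # 11
```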

score 1 · Accepted Answer

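Edit: adjusted the answer to match the clarified question.

Main idea: again, the resulting selection can be encoded like a custom numeral system. One can increment a counter and interpret that counter as a selection.

However, there is the additional restriction that the size of the selection == target. A naive way to implement the restriction is just to check the size of the resulting selection and skip the ones that do not satisfy it. But that is slow.

So all I did was make the increment a little cleverer, so that it jumps directly to a selection of the correct size.

Sorry, the code is in Python. But I wrote it in a way that is comparable to the Java iterator interface. The input and output format is: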

haves[i] := multiplicity of the i-th item in the collection
target := output collection must have this size

Code:

class Perm(object):
    def __init__(self,items,haves,target):
        assert sum(haves) >= target
        assert all(h > 0 for h in haves)
        self.items = items
        self.haves = haves
        self.target = target
        self.ans = None
        self.stop = False
    def __iter__(self):
        return self
    def reset(self):
        self.ans = [0]*len(self.haves)
        self.__fill(self.target)
        self.stop = False
    def __fill(self,n):
        """fill ans from LSB with n bits"""
        if n <= 0: return
        i = 0
        while n > self.haves[i]:
            assert self.ans[i] == 0
            self.ans[i] = self.haves[i]
            n -= self.haves[i]
            i += 1
        assert self.ans[i] == 0
        self.ans[i] = n
    def __inc(self):
        """increment from LSB, carry when 'target' or 'haves' constrain is broken"""
        # in fact, the 'target' constrain is always broken on the left most non-zero entry
        # find left most non-zero
        i = 0
        while self.ans[i] == 0:
            i += 1
        # set it to zero
        l = self.ans[i]
        self.ans[i] = 0
        # do increment answer, and carry
        while True:
            # increment to the next entry, if possible
            i += 1
            if i >= len(self.ans):
                self.stop = True
                raise StopIteration
            #
            if self.ans[i] == self.haves[i]:
                l += self.ans[i]
                self.ans[i] = 0
            else:
                l -= 1
                self.ans[i] += 1
                break
        return l
    def next(self):
        if self.stop:
            raise StopIteration
        elif self.ans is None:
            self.reset()
        else:
            l = self.__inc()
            self.__fill(l)
        return self.ans

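Note that the items argument is not really used.

The asserts in __init__ are there to make my assumptions about the input clear.

The asserts in __fill are just there to show a convenient property of self.ans in the context in which __fill is called.

Here is a nice skeleton for testing the code: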

test_cases = [([3,2,1], 3),
              ([3,2,1], 5),
              ([3,2,1], 6),
              ([4,3,2,1,1], 4),
              ([1,3,1,2,4], 4),
             ]

P = Perm(None,*test_cases[-1])
for p in P:
    print p
    #raw_input()

Sample result for the input ([1,3,1,2,4], 4):

[1, 3, 0, 0, 0]
[1, 2, 1, 0, 0]
[0, 3, 1, 0, 0]
[1, 2, 0, 1, 0]
[0, 3, 0, 1, 0]
[1, 1, 1, 1, 0]
[0, 2, 1, 1, 0]
[1, 1, 0, 2, 0]
[0, 2, 0, 2, 0]
[1, 0, 1, 2, 0]
[0, 1, 1, 2, 0]
[1, 2, 0, 0, 1]
[0, 3, 0, 0, 1]
[1, 1, 1, 0, 1]
[0, 2, 1, 0, 1]
[1, 1, 0, 1, 1]
[0, 2, 0, 1, 1]
[1, 0, 1, 1, 1]
[0, 1, 1, 1, 1]
[1, 0, 0, 2, 1]
[0, 1, 0, 2, 1]
[0, 0, 1, 2, 1]
[1, 1, 0, 0, 2]
[0, 2, 0, 0, 2]
[1, 0, 1, 0, 2]
[0, 1, 1, 0, 2]
[1, 0, 0, 1, 2]
[0, 1, 0, 1, 2]
[0, 0, 1, 1, 2]
[0, 0, 0, 2, 2]
[1, 0, 0, 0, 3]
[0, 1, 0, 0, 3]
[0, 0, 1, 0, 3]
[0, 0, 0, 1, 3]
[0, 0, 0, 0, 4]
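
Each line above is a vector of per-type counts, whereas the question's expected output lists the item type placed in each slot. A small sketch of converting between the two follows; the helper name counts_to_slots is hypothetical, and it assumes item types are numbered from 1 with the slot count stored in position 0, as in the question.

```python
def counts_to_slots(counts):
    """Turn per-type counts into the question's 1-based slot array."""
    # Repeat each 1-based type number as many times as it was selected.
    slots = [item_type
             for item_type, count in enumerate(counts, start=1)
             for _ in range(count)]
    # Position 0 holds the number of slots, as in the question's format.
    return [len(slots)] + slots

print(counts_to_slots([1, 3, 0, 0, 0]))  # [4, 1, 2, 2, 2]
```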

Performance: each next() call takes O(h), where h is the number of item types (the size of the haves list).

score 0 · Accepted Answer

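This paper provides an efficient iterative algorithm for generating multiset permutations (page 8).

This paper provides another iterative algorithm, also on page 8.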
