4

一切都在标题中.. =)

我想快速创建一个powerset,只有长度子集在3(最小恒定长度)和最大值之间。这个最大值在大多数情况下应该是 5,但我希望它是可变的(从 3 到 5,6)。原始集可能很大。我需要存储这些子集以供进一步处理。

我已经看到了在 Java 中获取一个集合的幂集,但是在这里它们生成了幂集的每个子集。对于 C#,我还看到了用于最小长度子集的有效 powerset 算法,但我认为,正如 Adam S 所说,我会遇到低运行时间性能和内存问题。

我想将这些想法组合成一个理想的实现我的目标。我最后的希望是快速生成每个子集(也许使用Guava中的算法)并迭代以仅采用所需的长度......但即使阅读它也很丑;)

有人有想法吗?

4

3 回答 3

4

我从st0le回答中大量借鉴。

基本上,当我遍历控制集合生成的位数组时,我检查以确保位数在最小值和最大值之间。

这是代码。

import java.util.BitSet;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;

public class PowerSet<E> implements Iterator<Set<E>>, Iterable<Set<E>> {
    private int     minSize;
    private int     maxSize;
    private E[]     arr     = null;
    private BitSet  bset    = null;

    @SuppressWarnings("unchecked")
    public PowerSet(Set<E> set, int minSize, int maxSize) {
        this.minSize = Math.min(minSize, set.size());
        this.maxSize = Math.min(maxSize, set.size());

        arr = (E[]) set.toArray();
        bset = new BitSet(arr.length + 1);

        for (int i = 0; i < minSize; i++) {
            bset.set(i);
        }
    }

    @Override
    public boolean hasNext() {
        return !bset.get(arr.length);
    }

    @Override
    public Set<E> next() {
        Set<E> returnSet = new TreeSet<E>();
        // System.out.println(printBitSet());
        for (int i = 0; i < arr.length; i++) {
            if (bset.get(i)) {
                returnSet.add(arr[i]);
            }
        }

        int count;
        do {
            incrementBitSet();
            count = countBitSet();
        } while ((count < minSize) || (count > maxSize));

        // System.out.println(returnSet);
        return returnSet;
    }

    protected void incrementBitSet() {
        for (int i = 0; i < bset.size(); i++) {
            if (!bset.get(i)) {
                bset.set(i);
                break;
            } else
                bset.clear(i);
        }
    }

    protected int countBitSet() {
        int count = 0;
        for (int i = 0; i < bset.size(); i++) {
            if (bset.get(i)) {
                count++;
            }
        }
        return count;

    }

    protected String printBitSet() {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < bset.size(); i++) {
            if (bset.get(i)) {
                builder.append('1');
            } else {
                builder.append('0');
            }
        }
        return builder.toString();
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException("Not Supported!");
    }

    @Override
    public Iterator<Set<E>> iterator() {
        return this;
    }

    public static void main(String[] args) {
        Set<Character> set = new TreeSet<Character>();
        for (int i = 0; i < 7; i++)
            set.add((char) (i + 'A'));

        PowerSet<Character> pset = new PowerSet<Character>(set, 3, 5);
        int count = 1;
        for (Set<Character> s : pset) {
            System.out.println(count++ + ": " + s);
        }
    }

}

以下是测试结果:

1: [A, B, C]
2: [A, B, D]
3: [A, C, D]
4: [B, C, D]
5: [A, B, C, D]
6: [A, B, E]
7: [A, C, E]
8: [B, C, E]
9: [A, B, C, E]
10: [A, D, E]
11: [B, D, E]
12: [A, B, D, E]
13: [C, D, E]
14: [A, C, D, E]
15: [B, C, D, E]
16: [A, B, C, D, E]
17: [A, B, F]
18: [A, C, F]
19: [B, C, F]
20: [A, B, C, F]
21: [A, D, F]
22: [B, D, F]
23: [A, B, D, F]
24: [C, D, F]
25: [A, C, D, F]
26: [B, C, D, F]
27: [A, B, C, D, F]
28: [A, E, F]
29: [B, E, F]
30: [A, B, E, F]
31: [C, E, F]
32: [A, C, E, F]
33: [B, C, E, F]
34: [A, B, C, E, F]
35: [D, E, F]
36: [A, D, E, F]
37: [B, D, E, F]
38: [A, B, D, E, F]
39: [C, D, E, F]
40: [A, C, D, E, F]
41: [B, C, D, E, F]
42: [A, B, G]
43: [A, C, G]
44: [B, C, G]
45: [A, B, C, G]
46: [A, D, G]
47: [B, D, G]
48: [A, B, D, G]
49: [C, D, G]
50: [A, C, D, G]
51: [B, C, D, G]
52: [A, B, C, D, G]
53: [A, E, G]
54: [B, E, G]
55: [A, B, E, G]
56: [C, E, G]
57: [A, C, E, G]
58: [B, C, E, G]
59: [A, B, C, E, G]
60: [D, E, G]
61: [A, D, E, G]
62: [B, D, E, G]
63: [A, B, D, E, G]
64: [C, D, E, G]
65: [A, C, D, E, G]
66: [B, C, D, E, G]
67: [A, F, G]
68: [B, F, G]
69: [A, B, F, G]
70: [C, F, G]
71: [A, C, F, G]
72: [B, C, F, G]
73: [A, B, C, F, G]
74: [D, F, G]
75: [A, D, F, G]
76: [B, D, F, G]
77: [A, B, D, F, G]
78: [C, D, F, G]
79: [A, C, D, F, G]
80: [B, C, D, F, G]
81: [E, F, G]
82: [A, E, F, G]
83: [B, E, F, G]
84: [A, B, E, F, G]
85: [C, E, F, G]
86: [A, C, E, F, G]
87: [B, C, E, F, G]
88: [D, E, F, G]
89: [A, D, E, F, G]
90: [B, D, E, F, G]
91: [C, D, E, F, G]
于 2013-03-18T18:46:21.883 回答
1

请指定元素的数量是否有界。

对于 3 - 5,6 个元素,索引数组可能是可行的short[6]

例如,最多 32 个元素int可以容纳该集合,并且可以:

// dumb:
for (int mask = 0; ; ;) {
    int cardinality = Integer.bitCount(mask);
    if (3 <= cardinality && cardinality <= 6) {
        ...
    }
}

对于超过 64 个元素,BitSet 仍然可以提供紧凑性。

这里的决定受到简单统计的约束:有多少元素等等。对于 int/long/BitSet 解决方案,可以从 BitSet 开始,因为它具有更好的 API。例如,可以通过翻转两位来进入具有相同位数的下一个幂集。如果一个人有数学倾向,De Bruijn 循环可能会很有趣。

于 2013-03-18T17:59:46.963 回答
0

对于大小不超过 63 个元素的集合,并且如果您的最小大小约束不低,则此实现可能更快:

import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;
import java.util.Set;

import static com.google.common.base.Preconditions.checkArgument;
import static java.lang.Math.min;

public class PowerSet<E> implements Iterator<Iterable<E>>, Iterable<Iterable<E>> {

    private int minSize;
    private int maxSize;
    private E[] arr = null;
    private long bits = 0;
    private long count = 0;
    private long minMask = 0;

    /**
     * Build a PowerSet of the given {@link Set} to iterate over subsets whose size is between the given boundaries
     * @param set the set to create subsets from
     * @param minSize the minimal size of the subsets
     * @param maxSize the maximum size of the subsets
     */
    @SuppressWarnings("unchecked")
    public PowerSet(Set<E> set, int minSize, int maxSize) {
        checkArgument(maxSize < 63); // limited to 63 because we need one additional bit for hasNext
        this.minSize = min(minSize, set.size());
        this.maxSize = min(maxSize, set.size());
        arr = (E[]) set.toArray();
        for (int n = 0; n < minSize; n++) {
            bits |= (1L << n);
        }
        count = countBitSet(bits);
    }

    @Override
    public boolean hasNext() {
        return (bits & (1L << arr.length)) == 0;
    }

    @Override
    public Iterable<E> next() {
        // compute current subset
        final List<E> returnSet = new LinkedList<>();
        for (int i = 0; i < arr.length; i++) {
            if ((bits & (1L << i)) != 0) {
                returnSet.add(arr[i]);
            }
        }

        // compute next bit map for next subset
        do {
            if (count < minSize) {
                long maxFree = lowestIndex(bits) - 1;
                long missing = minSize - count;
                for (int n = 0; n < min(maxFree, missing); n++) {
                    bits |= (1L << n);
                }
            } else {
                bits++;
            }
            count = countBitSet(bits);
        } while ((count < minSize) || (count > maxSize));
        return returnSet;
    }

    /**
     * Lowest set bit in a long word
     * @param i the long word
     * @return lowest bit set
     */
    private static long lowestIndex(long i) {
        int n = 0;
        while (n < 64 && (i & 1L) == 0) {
            n++;
            i = i >>> 1;
        }
        return n;
    }

    /**
     * Compute the number of bit sets inside a word or a long word.<br/>
     * <a href="http://en.wikipedia.org/wiki/Hamming_weight">Hamming weight</a>
     * @param i the long word
     * @return number of set bits
     */
    private static long countBitSet(long i) {
        i = i - ((i >>> 1) & 0x5555555555555555L);
        i = (i & 0x3333333333333333L) + ((i >>> 2) & 0x3333333333333333L);
        i = ((i + (i >>> 4)) & 0x0F0F0F0F0F0F0F0FL);
        return (i * (0x0101010101010101L)) >>> 56;
    }

    @Override
    public void remove() {
        throw new UnsupportedOperationException("Not Supported!");
    }

    @Override
    public Iterator<Iterable<E>> iterator() {
        return this;
    }

}

它由 long 而不是 BitSet 支持,并更快地计算下一个子集......对于低的最小约束,它也可能更快。

您可以删除对仅用于构造函数 checkArgument 的 Guava 的依赖...

于 2016-08-21T12:08:40.243 回答