java - Efficiently generating combinations of triples

Question

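Given a set S of triples, where for every triple s \in S it holds that s[1] >= s[2] >= s[3], with s[i] being the i-th element of s. For any s,t,v \in S, let the function F(s,t,v) produce a new triple: F(s,t,v) = (max{s[1],t[1],v[1]}, max{s[2],t[2],v[2]}, max{s[3],t[3],v[3]}). Goal: generate the set T = {F(s,t,v) | s,t,v \in S} efficiently.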

Two examples:

S = [(9,4,3),(8,6,2),(6,6,4)]
T = [(9,4,3),(8,6,2),(6,6,4),(9,6,3),(9,6,4),(8,6,4)]

S = [(9,4,3),(8,6,2),(6,5,4)]
T = [(9,4,3), (9,6,3), (9,5,4), (9,6,4), (8,6,2), (8,6,4), (6,5,4)]

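Below is a simple but relatively inefficient implementation that does the above. This code runs in O(n^3), where |S| = n. The question is: how can this be implemented more efficiently? I expect this would involve coming up with an efficient data structure that holds a sorted version of S. For example, we can observe that F(s,t,v) = s if t[1],v[1] <= s[1], t[2],v[2] <= s[2] and t[3],v[3] <= s[3]. Hence, if we pick a triple s = (x,y,z), we only need to iterate over triples (x',y',z') with x' <= x and y' >= y and z' >= z. Note: in my application |S| is large, e.g. 100000 triples.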

import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class TripleGen {
    public static void main(String[] args) {
        int[][] ds = new int[][]{{9, 4, 3}, {8, 6, 2}, {6, 5, 4}};
        List<Triple> l = Triple.toList(ds);
        System.out.println(gen(l));
    }

    public static Set<Triple> gen(List<Triple> S) {
        Set<Triple> T = new HashSet<>();
        for (int i = 0; i < S.size(); i++) {
            for (int j = i; j < S.size(); j++) {
                for (int k = j; k < S.size(); k++) {
                    int l = Math.max(S.get(i).x, Math.max(S.get(j).x, S.get(k).x));
                    int w = Math.max(S.get(i).y, Math.max(S.get(j).y, S.get(k).y));
                    int h = Math.max(S.get(i).z, Math.max(S.get(j).z, S.get(k).z));
                    T.add(new Triple(l, w, h));
                }
            }
        }

        return T;
    }
}

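// Triple.java (a separate file, since both classes are public)
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

public final class Triple {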
    public final int x;
    public final int y;
    public final int z;

    public Triple(int x, int y, int z) {
        this.x = x;
        this.y = y;
        this.z = z;
    }

    public static List<Triple> toList(int[][] ds) {
        List<Triple> l = new ArrayList<>(ds.length);
        for (int[] d : ds)
            l.add(new Triple(d[0], d[1], d[2]));
        return l;
    }

    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        Triple t = (Triple) o;
        return x == t.x &&
                y == t.y &&
                z == t.z;
    }

    public int hashCode() {
        return Objects.hash(x, y, z);
    }

    public String toString() {
        return "(" + x + "," + y + "," + z+")";
    }
}
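
The observation in the question, that F(s,t,v) = s whenever t and v are component-wise no larger than s, can be illustrated with a small predicate. This is only a sketch: the `dominates` helper and the `DominanceSketch` class are names made up here, and triples are plain `int[]` arrays to keep the block self-contained. Note that a dominated triple cannot simply be dropped from S, because it may still supply a maximum when combined with other triples.

```java
import java.util.Arrays;

public class DominanceSketch {
    // Hypothetical helper: true if every component of t is <= the matching component of s.
    public static boolean dominates(int[] s, int[] t) {
        return t[0] <= s[0] && t[1] <= s[1] && t[2] <= s[2];
    }

    // Component-wise maximum of three triples, i.e. F(s, t, v) from the question.
    public static int[] f(int[] s, int[] t, int[] v) {
        return new int[] {
            Math.max(s[0], Math.max(t[0], v[0])),
            Math.max(s[1], Math.max(t[1], v[1])),
            Math.max(s[2], Math.max(t[2], v[2]))
        };
    }

    public static void main(String[] args) {
        int[] s = {9, 6, 4};
        int[] t = {8, 6, 2};
        int[] v = {6, 5, 4};
        // s dominates both t and v, so combining the three yields s again.
        System.out.println(dominates(s, t) && dominates(s, v)); // true
        System.out.println(Arrays.equals(f(s, t, v), s));       // true
    }
}
```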

score 1 · Accepted Answer

I doubt there is much to be gained. Here is my attempt.

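Consider a function F2(s, t) that performs the same kind of combination on just two triples. F(s, t, v) can then be written as F2(s, F2(t, v)), and computing it this way may give a performance gain, because the result of F2(t, v) is reused for every different s.
A slight improvement may also come from estimating the capacity of the result HashSet up front, so that it does not need to be expanded and rehashed.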
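
The decomposition F(s, t, v) = F2(s, F2(t, v)) described above can be sketched as follows. The answer does not show F2 itself, so the body of `f2` below is an assumption (a component-wise maximum of two triples), and `Triple` is re-declared in minimal form only to keep the block self-contained.

```java
public class F2Sketch {
    // Minimal stand-in for the question's Triple class (fields and toString only).
    static final class Triple {
        final int x, y, z;
        Triple(int x, int y, int z) { this.x = x; this.y = y; this.z = z; }
        @Override public String toString() { return "(" + x + "," + y + "," + z + ")"; }
    }

    // Assumed definition of F2: component-wise maximum of two triples.
    static Triple f2(Triple a, Triple b) {
        return new Triple(Math.max(a.x, b.x), Math.max(a.y, b.y), Math.max(a.z, b.z));
    }

    public static void main(String[] args) {
        Triple s = new Triple(9, 4, 3);
        Triple t = new Triple(8, 6, 2);
        Triple v = new Triple(6, 5, 4);
        Triple tv = f2(t, v);            // (8,6,4), reusable for every s
        System.out.println(f2(s, tv));   // (9,6,4), the same as F(s, t, v)
    }
}
```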

In code:

public static Set<Triple> gen(List<Triple> s) {
    // Deduplicate s
    s = new ArrayList<>(new HashSet<>(s));
    
    int n = s.size();
    
    // Combine pairs of triplets first
    // Number of unordered pairs of distinct triples: n choose 2
    int maxSizeOfT2 = n * (n - 1) / 2;
    int capacityForT2 = (maxSizeOfT2 * 4 + 2) / 3;
    Set<Triple> t2AsSet = new HashSet<>(capacityForT2);
    // For the pairs only pair two *different* triples
    for (int i = 0; i < s.size(); i++) {
        for (int j = i + 1; j < s.size(); j++) {
            Triple newTriplet = f2(s.get(i), s.get(j));
            t2AsSet.add(newTriplet);
        }
    }
    List<Triple> t2 = new ArrayList<>(t2AsSet);
    
    // For the combinations of three original triplets
    // combine every pair with every original triplet
    int maxSizeOfT = (t2AsSet.size() + 1) * (n + 1) - 1;
    int capacityForT = (maxSizeOfT * 4 + 2) / 3;
    Set<Triple> t = new HashSet<>(capacityForT);
    for (int i = 0; i < t2.size(); i++) {
        for (int j = 0; j < s.size(); j++) {
            Triple newTriplet = f2(t2.get(i), s.get(j));
            t.add(newTriplet);
        }
    }
    
    // Instead of generating F(s, s, s) just add every s to the result
    t.addAll(s);
    
    return t;
}
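
One caveat for large inputs: with |S| around 100000, as mentioned in the question, the `int` arithmetic in the capacity estimates above can overflow, and `HashSet` rejects a negative initial capacity. A possible workaround, sketched here with a hypothetical `capacityFor` helper that is not part of the original answer, is to compute in `long` and clamp the result. A set of that size would likely not fit in memory anyway, so this only keeps the estimate well defined.

```java
import java.util.HashSet;
import java.util.Set;

public class CapacitySketch {
    // Hypothetical helper: initial capacity so that `expected` elements fit under
    // HashSet's default load factor of 0.75, computed in long and clamped.
    static int capacityFor(long expected) {
        long capacity = (expected * 4 + 2) / 3;
        return (int) Math.min(capacity, 1 << 30); // 1 << 30 is HashMap's largest table size
    }

    public static void main(String[] args) {
        long n = 100_000;
        long maxPairs = n * (n - 1) / 2;            // 4999950000, too large for int
        System.out.println(capacityFor(maxPairs));  // clamped to 1073741824
        Set<Integer> small = new HashSet<>(capacityFor(6));
        System.out.println(small.isEmpty());        // true
    }
}
```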

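I have not done any proper benchmarking, only some rudimentary time measurements. They are not promising. I varied the number of triples in the input as well as the range of the numbers in the triples. When there are only a few distinct numbers, many duplicates are filtered out and the result set is smaller. The larger the range of numbers, the more rarely collisions occur and the larger the result set becomes.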

List  Element     Result   Your time      My time     Improvement
size  range        size   milliseconds  milliseconds      %
-----------------------------------------------------------------
  3   1–9              6       0.038       0.015         60
  3   1–10 000         7       0.046       0.016         66
400   1–9            159    4736        4740              0
400   1–10 000   858 897    1079        1067              1

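In a comment you expected that the best case could be improved, and the numbers may indicate that this is true. In the worst case there seems to be only a marginal improvement.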

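As I said in a comment, the size of the result set is O(n^3) and can actually grow that fast in the worst case, so an algorithm that generates it cannot be faster than O(n^3) in the worst case. The best we can hope for is a smaller constant factor in front of n^3.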
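
To see that cubic growth really can occur, here is one family of inputs, constructed for illustration and not taken from the discussion above, in which n = 3m triples already produce at least m^3 distinct results. The class name `CubicFamilySketch` and the chosen constants are arbitrary.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class CubicFamilySketch {
    // Builds 3m triples (each with x >= y >= z) in three groups, for i, j, k in 1..m:
    //   A_i = (3m + i, 0, 0), B_j = (2m, m + j, 0), C_k = (m, m, k).
    // F(A_i, B_j, C_k) = (3m + i, m + j, k), so all m^3 such results are distinct.
    static List<int[]> build(int m) {
        List<int[]> s = new ArrayList<>();
        for (int i = 1; i <= m; i++) {
            s.add(new int[] {3 * m + i, 0, 0});
            s.add(new int[] {2 * m, m + i, 0});
            s.add(new int[] {m, m, i});
        }
        return s;
    }

    // Brute-force count of distinct F(s, t, v); results are encoded as strings for hashing.
    static int countResults(List<int[]> s) {
        Set<String> t = new HashSet<>();
        for (int[] a : s)
            for (int[] b : s)
                for (int[] c : s)
                    t.add(Math.max(a[0], Math.max(b[0], c[0])) + ","
                        + Math.max(a[1], Math.max(b[1], c[1])) + ","
                        + Math.max(a[2], Math.max(b[2], c[2])));
        return t.size();
    }

    public static void main(String[] args) {
        int m = 5;
        int size = countResults(build(m));
        System.out.println(size >= m * m * m); // true: at least 125 results from 15 triples
    }
}
```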
