java - 集合中 removeAll 方法的不直观行为

Question

我发现了使用 individual 时的removeAll方法的这种奇怪行为。AbstractSetsComparators

根据比较集合的大小，使用不同的比较器。

它实际上记录在 API 中，但我仍然看不到它背后的原因。

这是代码：

import java.util.Comparator;
import java.util.Set;
import java.util.Stack;
import java.util.TreeSet;

public class Test {
    public static void main(String[] args) {
        // Any comparator. For this example, the length of a string is compared
        Set<String> set = new TreeSet<String>(new Comparator<String>() {
                @Override
                public int compare(String o1, String o2) {
                        return o1.length() - o2.length();
                }
        });

        set.add("a");
        set.add("aa");
        set.add("aaa");
        set.add("aaaa");
        System.out.println(set); // output: [a, aa, aaa, aaaa]

        Stack<String> stack = new Stack<String>();
        stack.push("b");
        stack.push("bb");
        stack.push("bbb");
        stack.push("bbbb");

        set.removeAll(stack); // NO ITEMS ARE REMOVED from the set
        System.out.println(set); // output: [a, aa, aaa, aaaa]

        // Now let's see what happens if I remove an object from the stack
        stack.pop();
        set.removeAll(stack); // ALL ITEMS from the stack are removed from the
                                                        // set
        System.out.println(set); // output: [aaaa]

        /* Reason for this strange behaviour: Depending on the size of the
         * passed Collection, TreeSet uses either the remove() function of
         * itself, or from the Collection object that was passed. While the
         * remove() method of the TreeSet uses the comparator to determine
         * equality, the remove() method of the passed usually determines
         * equality by calling equals() on its objects.
         */
    }
}

这是 JavaDoc。

score 0 · Accepted Answer

如果您问他们为什么选择以这种方式实现它：

这可能是出于性能原因。考虑有 2 个TreeSet，一个包含m元素，另一个包含n元素。现在考虑从一个有m元素的那个中删除所有n元素。如果我们坚持遍历传入的集合并调用 remove, if mis much greater than n，这将比遍历当前集合并检查它是否存在 ( O(m log n) > O(n log m)) 慢得多。比较大小可以防止这种情况发生。

这不是一个完美的系统 - 如果你将 an 传递Stack给 a TreeSet，迭代通过TreeSet渐近总是一个更糟糕的想法，迭代通过Stack( O(m n) > O(m log n))，但它将遵循与上述相同的规则。尽管考虑所有允许类型的组合会有些麻烦。

如果您要问为什么代码会这样做：

这是代码removeAll：

public boolean removeAll(Collection<?> c) {
    boolean modified = false;

    if (size() > c.size()) {
        for (Iterator<?> i = c.iterator(); i.hasNext(); )
            modified |= remove(i.next());
    } else {
        for (Iterator<?> i = iterator(); i.hasNext(); ) {
            if (c.contains(i.next())) {
                i.remove();
                modified = true;
            }
        }
    }
    return modified;
}

因此，当Stack具有比更多或相同数量的元素TreeSet时（在第一种情况下发生），removeAll将遍历TreeSet并删除中包含的每个元素Stack。由于Stack使用默认String比较，没有字符串将匹配，也不会删除任何内容。

当Stack有较少的元素（在第二种情况下发生）时，removeAll将遍历并为每个使用你的元素Stack调用 remove ，从而删除所有匹配长度的元素，只留下长度为 4 的元素，对应于弹出的元素.TreeSetComparator

score 0 · Accepted Answer

您基本上已经创建了未定义的行为，因为您的集合具有不同的相等标准。只有当它们具有相同的集合时，才能以任何方式组合集合。您基本上违反了A.equals(B)必须产生与B.equals(A).

Comparable：强烈建议（尽管不是必需的）自然排序与 equals 一致。之所以如此，是因为没有显式比较器的排序集（和排序映射）在与自然顺序与等于不一致的元素（或键）一起使用时表现“奇怪”。特别是，这样的排序集合（或排序映射）违反了集合（或映射）的一般合同，该合同是根据 equals 方法定义的。

java - 集合中 removeAll 方法的不直观行为

2 回答 2

Related

Reference