10
final Multimap<Term, BooleanClause> terms = getTerms(bq);
        for (Term t : terms.keySet()) {
            Collection<BooleanClause> C = new HashSet(terms.get(t));
            if (!C.isEmpty()) {
                for (Iterator<BooleanClause> it = C.iterator(); it.hasNext();) {
                    BooleanClause c = it.next();
                    if(c.isSomething()) C.remove(c);
                }
            }
        }

不是SSCCE,但你能闻到气味吗?

4

3 回答 3

24

The Iterator for the HashSet class is a fail-fast iterator. From the documentation of the HashSet class:

The iterators returned by this class's iterator method are fail-fast: if the set is modified at any time after the iterator is created, in any way except through the iterator's own remove method, the Iterator throws a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.

Note that the fail-fast behavior of an iterator cannot be guaranteed as it is, generally speaking, impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. Therefore, it would be wrong to write a program that depended on this exception for its correctness: the fail-fast behavior of iterators should be used only to detect bugs.

Note the last sentence - the fact that you are catching a ConcurrentModificationException implies that another thread is modifying the collection. The same Javadoc API page also states:

If multiple threads access a hash set concurrently, and at least one of the threads modifies the set, it must be synchronized externally. This is typically accomplished by synchronizing on some object that naturally encapsulates the set. If no such object exists, the set should be "wrapped" using the Collections.synchronizedSet method. This is best done at creation time, to prevent accidental unsynchronized access to the set:

Set s = Collections.synchronizedSet(new HashSet(...));

I believe the references to the Javadoc are self explanatory in what ought to be done next.

Additionally, in your case, I do not see why you are not using the ImmutableSet, instead of creating a HashSet on the terms object (which could possibly be modified in the interim; I cannot see the implementation of the getTerms method, but I have a hunch that the underlying keyset is being modified). Creating a immutable set will allow the current thread to have it's own defensive copy of the original key-set.

Note, that although a ConcurrentModificationException can be prevented by using a synchronized Set (as noted in the Java API documentation), it is a prerequisite that all threads access the synchronized collection and not the backing collection directly (which might be untrue in your case as the HashSet is probably created in one thread, while the underlying collection for the MultiMap is modified by other threads). The synchronized collection classes actually maintain an internal mutex for threads to acquire access to; since you cannot access the mutex directly from other threads (and it would be quite ridiculous to do so here), you ought to look at using a defensive copy of either the keyset or of the MultiMap itself using the unmodifiableMultimap method of the MultiMaps class (you'll need to return an unmodifiable MultiMap from the getTerms method). You could also investigate the necessity of returning a synchronized MultiMap, but then again, you'll need to ensure that the mutex must be acquired by any thread to protect the underlying collection from concurrent modifications.

Note, I have deliberately omitted mentioning the use of a thread-safe HashSet for the sole reason that I'm unsure of whether concurrent access to the actual collection will be ensured; it most likely will not be the case.


Edit: ConcurrentModificationExceptions thrown on Iterator.next in a single-threaded scenario

This is with respect to the statement: if(c.isSomething()) C.remove(c); that was introduced in the edited question.

Invoking Collection.remove changes the nature of the question, for it now becomes possible to have ConcurrentModificationExceptions thrown even in a single-threaded scenario.

The possibility arises out of the use of the method itself, in conjunction with the use of the Collection's iterator, in this case the variable it that was initialized using the statement : Iterator<BooleanClause> it = C.iterator();.

The Iterator it that iterates over Collection C stores state pertinent to the current state of the Collection. In this particular case (assuming a Sun/Oracle JRE), a KeyIterator (an internal inner class of the HashMap class that is used by the HashSet) is used to iterate through the Collection. A particular characteristic of this Iterator is that it tracks the number of structural modifications performed on the Collection (the HashMap in this case) via it's Iterator.remove method.

When you invoke remove on the Collection directly, and then follow it up with an invocation of Iterator.next, the iterator throws a ConcurrentModificationException, as Iterator.next verifies whether any structural modifications of the Collection have occurred that the Iterator is unaware of. In this case, Collection.remove causes a structural modification, that is tracked by the Collection, but not by the Iterator.

To overcome this part of the problem, you must invoke Iterator.remove and not Collection.remove, for this ensures that the Iterator is now aware of the modification to the Collection. The Iterator in this case, will track the structural modification occurring through the remove method. Your code should therefore look like the following:

final Multimap<Term, BooleanClause> terms = getTerms(bq);
        for (Term t : terms.keySet()) {
            Collection<BooleanClause> C = new HashSet(terms.get(t));
            if (!C.isEmpty()) {
                for (Iterator<BooleanClause> it = C.iterator(); it.hasNext();) {
                    BooleanClause c = it.next();
                    if(c.isSomething()) it.remove(); // <-- invoke remove on the Iterator. Removes the element returned by it.next.
                }
            }
        }
于 2011-08-30T02:03:29.337 回答
8

原因是您试图在迭代器之外修改集合。

这个怎么运作 :

当你创建一个迭代器时,集合独立地为集合和迭代器维护一个 modifyNum 变量。1. 每次对集合和迭代器进行更改时,集合变量都会递增。 2. 迭代器的变量会随着迭代器的每次更改而递增

因此,当您it.remove()通过迭代器调用时,会将修改编号变量的值都增加 1。

但是,当您collection.remove()直接调用集合时,只会增加集合的修改编号变量的值,而不是迭代器的变量。

并且规则是:只要迭代器的修改编号值与原始集合修改编号值不匹配,它就会给出 ConcurrentModificationException。

于 2011-08-30T04:48:34.110 回答
3

Vineet Reynolds 详细解释了集合抛出ConcurrentModificationException(线程安全、并发)的原因。Swagatika 已经非常详细地解释了这种机制的实现细节(集合和迭代器如何保持对修改次数的计数)。

他们的回答很有趣,我投了赞成票。但是,在您的情况下,问题不来自并发(您只有一个线程),并且实现细节虽然很有趣,但不应该在这里考虑。

HashSet您应该只考虑javadoc的这一部分:

此类的迭代器方法返回的迭代器是快速失败的:如果在创建迭代器后的任何时间修改集合,除了通过迭代器自己的 remove 方法之外,迭代器会抛出 ConcurrentModificationException。因此,面对并发修改,迭代器快速而干净地失败,而不是在未来不确定的时间冒任意的、非确定性的行为。

在您的代码中,您使用其迭代器迭代您的 HashSet,但您使用 HashSet 自己的 remove 方法来删​​除元素 ( C.remove(c)),这会导致ConcurrentModificationException. 相反,如 javadoc 中所述,您应该使用Iterator自己的remove()方法,该方法从基础集合中删除当前正在迭代的元素。

代替

                if(c.isSomething()) C.remove(c);

                if(c.isSomething()) it.remove();

如果您想使用更实用的方法,您可以创建一个Predicate并在以下位置使用 Guava 的Iterables.removeIf()方法HashSet

Predicate<BooleanClause> ignoredBooleanClausePredicate = ...;
Multimap<Term, BooleanClause> terms = getTerms(bq);
for (Term term : terms.keySet()) {
    Collection<BooleanClause> booleanClauses = Sets.newHashSet(terms.get(term));
    Iterables.removeIf(booleanClauses, ignoredBooleanClausePredicate);
}

PS:请注意,在这两种情况下,这只会从临时HashSet. Multimap不会被修改。

于 2011-08-30T12:30:03.047 回答