
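I'm trying to understand how Spliterator works and how to design spliterators. I recognize that trySplit() is probably one of the more important methods of Spliterator, but when I look at some third-party Spliterator implementations, I sometimes see that their trySplit() returns null unconditionally.

Questions:

  1. Is there a difference between an ordinary iterator and a Spliterator that returns null unconditionally? Such a spliterator seems to defeat the point of splitting.
  2. Of course there are legitimate use cases for spliterators that conditionally return null from trySplit(), but is there a legitimate use case for a spliterator that returns null unconditionally?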

3 Answers


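As you said, the main advantage of Spliterator over Iterator is that its trySplit() method allows parallelization, but there are other notable advantages:

http://docs.oracle.com/javase/8/docs/api/java/util/Spliterator.html

The Spliterator API was designed to support efficient parallel traversal in addition to sequential traversal, by supporting decomposition as well as single-element iteration. In addition, the protocol for accessing elements via a Spliterator is designed to impose smaller per-element overhead than Iterator, and to avoid the inherent race involved in having separate methods for hasNext() and next().

Additionally, a Spliterator can be converted directly into a Stream using StreamSupport.stream, so you can take advantage of Java 8's streams.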
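As a rough illustration of that conversion, here is a minimal sketch. The class and method names (SpliteratorToStream, toStream, upperCased) are made up for this example; only StreamSupport.stream and Collection.spliterator() come from the JDK.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Spliterator;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

// Hypothetical helper class, named only for this example.
class SpliteratorToStream {
    // Wraps any Spliterator in a Stream; the second argument (false) requests a sequential stream.
    static <T> Stream<T> toStream(Spliterator<T> spliterator) {
        return StreamSupport.stream(spliterator, false);
    }

    // Uses the list's own spliterator as the stream source.
    static List<String> upperCased(List<String> words) {
        return toStream(words.spliterator())
                .map(String::toUpperCase)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(upperCased(Arrays.asList("a", "b", "c")));
    }
}
```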

answered 2015-03-05T03:10:18.580

One of the purposes of a Spliterator is to be able to split, but that's not the only purpose. The other main purpose is as a support class for creating your own Stream source. One way to create a Stream source is to implement your own Spliterator and pass it to StreamSupport.stream. The simplest thing to do is often to write a Spliterator that can't split. Doing so forces the stream to execute sequentially, but that might be acceptable for whatever you're trying to do.
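To make that concrete, here is a minimal sketch of a Spliterator that can't split and is used purely as a Stream source. CountdownSpliterator and its collect helper are hypothetical names invented for this illustration, not anything from the JDK or from the implementations the question refers to.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Hypothetical stream source: counts down from a start value to 1.
class CountdownSpliterator implements Spliterator<Integer> {
    private int remaining;

    CountdownSpliterator(int start) {
        this.remaining = Math.max(start, 0);
    }

    @Override
    public boolean tryAdvance(Consumer<? super Integer> action) {
        if (remaining <= 0) {
            return false; // no more elements
        }
        action.accept(remaining--);
        return true;
    }

    @Override
    public Spliterator<Integer> trySplit() {
        return null; // never splits, so traversal stays sequential
    }

    @Override
    public long estimateSize() {
        return remaining; // exact, which is why SIZED is reported below
    }

    @Override
    public int characteristics() {
        return ORDERED | SIZED | NONNULL | IMMUTABLE;
    }

    // Feeds the spliterator to StreamSupport.stream and collects the elements.
    static List<Integer> collect(int start) {
        return StreamSupport.stream(new CountdownSpliterator(start), false)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(collect(3));
    }
}
```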

There are other cases where writing a non-splittable Spliterator makes sense. For example, in OpenJDK, there are implementations such as EmptySpliterator that contain no elements. Of course it can't be split. A similar case is a singleton spliterator that contains exactly one element. It can't be split either. Both implementations return null unconditionally from trySplit.

Another case is where writing a non-splittable Spliterator is easy and effective, and the amount of code necessary to implement a splittable one is prohibitive. (At least, not worth the effort of writing one into a Stack Overflow answer.) For example, see the example Spliterator from this answer. The case here is that the Spliterator implementation wants to wrap another Spliterator and do something special, in this case check to see if it's not empty. Otherwise it just delegates everything to the wrapped Spliterator. Doing this with a non-splittable Spliterator is pretty easy.

Notice that there's discussion in that answer, the comment on that answer, in my answer to the same question, and the comment thread on my answer, about how one would make a splittable (i.e., parallel-ready) Spliterator. But nobody actually wrote out the code to do the splitting. :-) Depending upon how much laziness you want to preserve from the original stream, and how much parallel efficiency you want, writing a splittable Spliterator can get pretty complicated.

In my estimation it's somewhat easier to do this sort of stuff by writing an Iterator instead of a Spliterator (as in my answer noted above). It turns out that Spliterators.spliteratorUnknownSize can provide a limited amount of parallelism, even from an Iterator, which is apparently a purely sequential construct. It does so within IteratorSpliterator, which pulls multiple elements from the Iterator and processes them in batches. Unfortunately the batch size is hardcoded, but at least this gives the opportunity for processing elements pulled from an Iterator in parallel in certain cases.
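Here is a minimal sketch of that Iterator route. IteratorToParallelStream and sumOfSquares are names I made up for the example; Spliterators.spliteratorUnknownSize and StreamSupport.stream are the actual JDK calls. Whether any real parallel speedup shows up depends on the batch size and the amount of work per element, so treat this as a shape to start from rather than a performance recipe.

```java
import java.util.Arrays;
import java.util.Iterator;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.StreamSupport;

// Hypothetical helper class, named only for this example.
class IteratorToParallelStream {
    // Wraps a plain Iterator and runs a parallel stream over it.
    static long sumOfSquares(Iterator<Integer> iterator) {
        Spliterator<Integer> spliterator =
                Spliterators.spliteratorUnknownSize(iterator, Spliterator.ORDERED);
        return StreamSupport.stream(spliterator, true) // true requests a parallel stream
                .mapToLong(i -> (long) i * i)
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(sumOfSquares(Arrays.asList(1, 2, 3, 4).iterator()));
    }
}
```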

answered 2015-03-05T07:15:07.720

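Besides splitting support, there are more advantages:

  • The iteration logic is contained in a single tryAdvance method rather than being spread over two methods such as hasNext and next. Splitting the logic over two methods complicates many Iterator implementations, because it often means that hasNext has to perform an actual query attempt, which may yield a value that then has to be remembered for the follow-up next call. The fact that this query has been made has to be remembered as well, either explicitly or implicitly.

    It would be easier if there were a guarantee that hasNext/next are always called in the typical alternating way, but there is no such guarantee.

    One example is BufferedReader.readLine(), which maps onto a simple tryAdvance logic. A wrapping Iterator has to call that method within its hasNext implementation and remember the line for the next call. (Ironically, the current BufferedReader.lines() implementation does implement such a complicated Iterator, which is then wrapped into a Spliterator, instead of implementing the much simpler Spliterator directly. It seems that the "I'm not familiar with that" problem should not be underestimated.)

  • estimateSize(): a Spliterator may return an estimate (or even the exact number) of the remaining items, which can be used to pre-allocate resources. This can improve efficiency.

  • characteristics(): Spliterators can provide additional information about their content or behavior. Besides telling whether the estimated size is exact, you can learn whether null values may be seen, whether there is a defined encounter order, or whether all values are distinct. A particular algorithm may take advantage of this. Obviously, the Stream API is built up of such algorithms, so when you plan to create (or support the creation of) streams and you have the choice, implementing a Spliterator that reports as much meta information as possible is preferable to implementing an Iterator that will be wrapped afterwards.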
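To illustrate the points above, here is a minimal sketch of a line-reading Spliterator built on BufferedReader.readLine(). LineSpliterator and readAll are hypothetical names for this example, and this is not how the JDK implements BufferedReader.lines(). It extends Spliterators.AbstractSpliterator, which supplies estimateSize() and characteristics() from the constructor arguments and a batching trySplit(), so only tryAdvance has to be written.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.function.Consumer;
import java.util.stream.Collectors;
import java.util.stream.StreamSupport;

// Hypothetical example class: exposes the lines of a reader as a Spliterator.
class LineSpliterator extends Spliterators.AbstractSpliterator<String> {
    private final BufferedReader reader;

    LineSpliterator(BufferedReader reader) {
        // Long.MAX_VALUE means the size is unknown; lines arrive in order and are never null.
        super(Long.MAX_VALUE, Spliterator.ORDERED | Spliterator.NONNULL);
        this.reader = reader;
    }

    @Override
    public boolean tryAdvance(Consumer<? super String> action) {
        try {
            String line = reader.readLine(); // one call both checks for and fetches the next line
            if (line == null) {
                return false; // end of input
            }
            action.accept(line);
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Reads all lines of an in-memory text through the spliterator.
    static List<String> readAll(String text) {
        BufferedReader reader = new BufferedReader(new StringReader(text));
        return StreamSupport.stream(new LineSpliterator(reader), false)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(readAll("first\nsecond\nthird"));
    }
}
```

There is no look-ahead buffer and no "has a line already been fetched" flag here, which is exactly the state an Iterator-based wrapper would have to carry between hasNext and next.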

answered 2015-03-05T10:52:34.047