list - Java 8 流与迭代器性能

Question

我正在比较使用和不使用流的两种过滤列表的方法。事实证明，对于 10,000 个项目的列表，不使用流的方法更快。我有兴趣了解为什么会这样。谁能解释一下结果？

public static int countLongWordsWithoutUsingStreams(
        final List<String> words, final int longWordMinLength) {
    words.removeIf(word -> word.length() <= longWordMinLength);

    return words.size();
}

public static int countLongWordsUsingStreams(final List<String> words, final int longWordMinLength) {
    return (int) words.stream().filter(w -> w.length() > longWordMinLength).count();
}

使用 JMH 进行微基准测试：

@Benchmark
@BenchmarkMode(Throughput)
@OutputTimeUnit(MILLISECONDS)
public void benchmarkCountLongWordsWithoutUsingStreams() {
    countLongWordsWithoutUsingStreams(nCopies(10000, "IAmALongWord"), 3);
}

@Benchmark
@BenchmarkMode(Throughput)
@OutputTimeUnit(MILLISECONDS)
public void benchmarkCountLongWordsUsingStreams() {
    countLongWordsUsingStreams(nCopies(10000, "IAmALongWord"), 3);
}

public static void main(String[] args) throws RunnerException {
    final Options opts = new OptionsBuilder()
        .include(PracticeQuestionsCh8Benchmark.class.getSimpleName())
        .warmupIterations(5).measurementIterations(5).forks(1).build();

    new Runner(opts).run();
}

java -jar 目标/benchmarks.jar -wi 5 -i 5 -f 1

基准
模式 Cnt 分数错误单位
PracticeQuestionsCh8Benchmark.benchmarkCountLongWordsUsingStreams thrpt 5 10.219 ± 0.408 ops/ms
PracticeQuestionsCh8Benchmark.benchmarkCountLongWordsWithoutUsingStreams thrpt 5 910.785 ± 21.215 ops/ms

编辑：（因为有人删除了作为答案发布的更新）

public class PracticeQuestionsCh8Benchmark {
    private static final int NUM_WORDS = 10000;
    private static final int LONG_WORD_MIN_LEN = 10;

    private final List<String> words = makeUpWords();

    public List<String> makeUpWords() {
        List<String> words = new ArrayList<>();
        final Random random = new Random();

        for (int i = 0; i < NUM_WORDS; i++) {
            if (random.nextBoolean()) {
                /*
                 * Do this to avoid string interning. c.f.
                 * http://en.wikipedia.org/wiki/String_interning
                 */
                words.add(String.format("%" + LONG_WORD_MIN_LEN + "s", i));
            } else {
                words.add(String.valueOf(i));
            }
        }

        return words;
    }

    @Benchmark
    @BenchmarkMode(AverageTime)
    @OutputTimeUnit(MILLISECONDS)
    public int benchmarkCountLongWordsWithoutUsingStreams() {
        return countLongWordsWithoutUsingStreams(words, LONG_WORD_MIN_LEN);
    }

    @Benchmark
    @BenchmarkMode(AverageTime)
    @OutputTimeUnit(MILLISECONDS)
    public int benchmarkCountLongWordsUsingStreams() {
        return countLongWordsUsingStreams(words, LONG_WORD_MIN_LEN);
    }
}
public static int countLongWordsWithoutUsingStreams(
    final List<String> words, final int longWordMinLength) {
    final Predicate<String> p = s -> s.length() >= longWordMinLength;

    int count = 0;

    for (String aWord : words) {
        if (p.test(aWord)) {
            ++count;
        }
    }

    return count;
}

public static int countLongWordsUsingStreams(final List<String> words,
    final int longWordMinLength) {
    return (int) words.stream()
    .filter(w -> w.length() >= longWordMinLength).count();
}

score 5 · Accepted Answer

每当您的基准测试表明超过 10000 个元素的某些操作需要 1ns（编辑：1µs）时，您可能会发现一个聪明的 JVM 发现您的代码实际上没有做任何事情的案例。

Collections.nCopies实际上并没有列出 10000 个元素。它制作了一种带有 1 个元素的假列表，并计算了它应该存在多少次。该列表也是不可变的，因此如果有事情要做，您countLongWordsWithoutUsingStreams会抛出异常。removeIf

score 2 · Accepted Answer

您不会从基准测试方法返回任何值，因此，JMH 没有机会逃避计算值，并且您的基准测试会遭受死代码消除。你计算什么都不做需要多长时间。请参阅 JMH 页面以获取更多指导。

这么说，在某些情况下流可能会更慢：Java 8: performance of Streams vs Collections

list - Java 8 流与迭代器性能

2 回答 2

Related

Reference