java - Random data with a JMH Java microbenchmark for floating-point printing

Question

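I am writing a JMH microbenchmark for some floating-point printing code I wrote. I am not too concerned about the exact performance yet, but I want to get the benchmark code right.

I want to loop over some randomly generated data, so I made some static arrays of data and kept my looping mechanism (increment and mask) as simple as possible. Is this the right approach, or should I be telling JMH more about what is going on through some annotations I am missing?

Also, is it possible to create display groups for the tests instead of just lexicographic order? I basically have two sets of tests (one set for each set of random data).

The full source is at https://github.com/jnordwick/zerog-grisu

Here is the benchmark code: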

package zerog.util.grisu;

import java.util.Random;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.RunnerException;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

/* 
 * Current JMH bench, similar on small numbers (no fast path code yet)
 * and 40% faster on completely random numbers.
 * 
 * Benchmark                         Mode  Cnt         Score         Error  Units
 * JmhBenchmark.test_lowp_doubleto  thrpt   20  11439027.798 ± 2677191.952  ops/s
 * JmhBenchmark.test_lowp_grisubuf  thrpt   20  11540289.271 ±  237842.768  ops/s
 * JmhBenchmark.test_lowp_grisustr  thrpt   20   5038077.637 ±  754272.267  ops/s
 * 
 * JmhBenchmark.test_rand_doubleto  thrpt   20   1841031.602 ±  219147.330  ops/s
 * JmhBenchmark.test_rand_grisubuf  thrpt   20   2609354.822 ±   57551.153  ops/s
 * JmhBenchmark.test_rand_grisustr  thrpt   20   2078684.828 ±  298474.218  ops/s
 * 
 * This doesn't account for any garbage costs either since the benchmarks
 * aren't generating enough to trigger GC, and Java internally uses per-thread
 * objects to avoid some allocations.
 * 
 * Don't call Grisu.doubleToString() except for testing. I think the extra
 * allocations and copying are killing it. I'll fix that.
 */

public class JmhBenchmark {

    static final int nmask = 1024*1024 - 1;
    static final double[] random_values = new double[nmask + 1];
    static final double[] lowp_values = new double[nmask + 1];

    static final byte[] buffer = new byte[30];
    static final byte[] bresults = new byte[30];

    static int i = 0;
    static final Grisu g = Grisu.fmt;

    static {

        Random r = new Random();
        int[] pows = new int[] { 1, 10, 100, 1000, 10000, 100000, 1000000 };

        for( int i = 0; i < random_values.length; ++i ) {
            random_values[i] = r.nextDouble();
        }

        for(int i = 0; i < lowp_values.length; ++i ) {
            // divide as double; int / int would truncate the fractional digits
            lowp_values[i] = (1 + r.nextInt( 10000 )) / (double) pows[r.nextInt( pows.length )];
        }
    }

    @Benchmark
    public String test_rand_doubleto() {
        String s = Double.toString( random_values[i] );
        i = (i + 1) & nmask;
        return s;
    }

    @Benchmark
    public String test_lowp_doubleto() {
        String s = Double.toString( lowp_values[i] );
        i = (i + 1) & nmask;
        return s;
    }

    @Benchmark
    public String test_rand_grisustr() {
        String s =  g.doubleToString( random_values[i] );
        i = (i + 1) & nmask;
        return s;
    }

    @Benchmark
    public String test_lowp_grisustr() {
        String s =  g.doubleToString( lowp_values[i] );
        i = (i + 1) & nmask;
        return s;
    }

    @Benchmark
    public byte[] test_rand_grisubuf() {
        g.doubleToBytes( bresults, 0, random_values[i] );
        i = (i + 1) & nmask;
        return bresults;
    }

    @Benchmark
    public byte[] test_lowp_grisubuf() {
        g.doubleToBytes( bresults, 0, lowp_values[i] );
        i = (i + 1) & nmask;
        return bresults;
    }

    public static void main(String[] args) throws RunnerException {
        Options opt = new OptionsBuilder()
                .include(".*" + JmhBenchmark.class.getSimpleName() + ".*")
                .warmupIterations(20)
                .measurementIterations(20)
                .forks(1)
                .build();

        new Runner(opt).run();
    }
}

score 4 · Accepted Answer

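You can only prove that a benchmark is correct by analyzing its results. The benchmark code can only raise red flags that you have to follow up on. I see these red flags in your code:

Reliance on static final fields to store state. The contents of these fields can be routinely "inlined" into the computation, rendering parts of your benchmark futile. JMH only saves you from constant folding of the regular fields of @State objects.
Use of static initializers. While this has no repercussions in current JMH, the expected way is to use @Setup methods to initialize the state. In your case it also helps to get truly random data points, e.g. if you use @Setup(Level.Iteration) to reinitialize the values before the next test iteration starts.

As for the general approach, this is one of the ways to achieve safe looping: put the loop counter outside the method. There is another, arguably safe one: loop over the array inside the method, but sink every iteration result into Blackhole.consume.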
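As a rough illustration of the first approach (state in non-final instance fields, data regenerated in a setup method, loop counter kept outside the benchmark method), here is a minimal sketch. The class name RandomDataState, the fill helper and the reduced array size are assumptions made for this example. The JMH annotations are shown only as comments so that the snippet compiles without the JMH jar; in a real benchmark they would be actual annotations.

```java
import java.util.Random;

// Sketch only: in a real JMH benchmark this class would carry
// @State(Scope.Benchmark) and setup() would carry @Setup(Level.Iteration).
class RandomDataState {                  // hypothetical name
    static final int SIZE = 1 << 10;     // power of two, so the mask below wraps cleanly

    // Non-final instance fields, as the answer recommends for @State objects.
    double[] randomValues;
    int index;

    // @Setup(Level.Iteration)
    void setup() {
        fill(new Random());
    }

    // Split out so the data generation can be checked with a fixed seed.
    void fill(Random r) {
        double[] values = new double[SIZE];
        for (int k = 0; k < values.length; k++) {
            values[k] = r.nextDouble();
        }
        randomValues = values;
        index = 0;
    }

    // Body of a would-be @Benchmark method: one value per call,
    // with the loop counter living in the state object.
    String formatNext() {
        String s = Double.toString(randomValues[index]);
        index = (index + 1) & (SIZE - 1);
        return s;
    }
}
```

Returning the String from the benchmark method, as the question already does, lets JMH consume the result.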

score 2 · Accepted Answer

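Unfortunately, you are not measuring correctly. Despite your attempt to add some random control flow, the JVM has plenty of opportunity to optimize your code because it is rather predictable. For example: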

String s = Double.toString( random_values[i] );
i = (i + 1) & nmask;
return s;

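random_values is a fixed array in a static final field. Because the increment of i is rather straightforward, its value can in the worst case be determined completely, so that s can simply be set directly. i is dynamic, but it does not really escape, and nmask is again deterministic. The JVM could still optimize the code here, although I cannot tell you exactly what it would do without looking at the assembly.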
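To see how predictable the index sequence is, the update can be isolated into a pure function. This is a minimal sketch; MaskedIndex is a hypothetical helper that is not part of the question's code.

```java
// Hypothetical helper, named here for illustration only.
class MaskedIndex {
    // Advance an index into an array whose length is mask + 1 (a power of two).
    static int next(int i, int mask) {
        return (i + 1) & mask;
    }
}
```

With nmask = 1024*1024 - 1, next(nmask, nmask) wraps back to 0, and for any index inside the array the result equals (i + 1) % (nmask + 1). The whole sequence of indices is therefore fixed once the starting value is known.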

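Instead, use non-final instance fields for your values, add a @State annotation to your class, and set up your test in a method annotated with @Setup. If you do this, JMH takes measures to properly escape your state, which prevents the JVM from optimizing in the face of deterministic values.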

score 1 · Accepted Answer

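I thought it would be helpful to show an implementation based on the suggestions from Aleksey and Rafael.

Key changes:

The same set of random data is supplied to all benchmarks. This is achieved by serializing the data set to a temporary file, passing the path to the setup() method via the @Param mechanism, and then deserializing the data into instance fields.
Each benchmark runs the method under test against the entire data set. We use the operationsPerInvocation feature to get accurate times.
The results of all operations are consumed via the black hole mechanism.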
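To make the role of operationsPerInvocation concrete, here is a small arithmetic sketch. It is not JMH's internal code, and the class and method names are hypothetical. It only illustrates the normalization: when one invocation loops over N values, the reported score is scaled by N so that it describes a single operation.

```java
// Illustration only: mirrors the idea behind operationsPerInvocation,
// not JMH's actual implementation. All names here are hypothetical.
class PerOperationScore {

    // Average time per single operation, given the total measured time,
    // the number of benchmark method invocations, and how many
    // operations each invocation performs (the data set size here).
    static double nanosPerOperation(long totalNanos, long invocations, int opsPerInvocation) {
        return (double) totalNanos / ((double) invocations * opsPerInvocation);
    }

    // Throughput in operations per second for the same measurement.
    static double operationsPerSecond(long totalNanos, long invocations, int opsPerInvocation) {
        return ((double) invocations * opsPerInvocation) / (totalNanos / 1e9);
    }
}
```

For example, 10 invocations over 1,048,576 values that take 1,048,576,000 ns in total work out to 100 ns per operation.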

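I created two examples: one based on the original question, using a Serializable data set class that can be used directly, and another that tests everyone's favorite non-serializable class, Optional.

If Aleksey or Rafael (or anyone else) has any suggestions, they would be much appreciated.

With a Serializable data set: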

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.Random;
import java.util.concurrent.TimeUnit;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

/**
 * In this example each benchmark loops over the entire randomly generated data set.
 * The same data set is used for all benchmarks.
 * And we black hole the results.
 */
@SuppressWarnings("javadoc")
@State(Scope.Benchmark)
public class JmhBenchmark {

    static final int DATA_SET_SAMPLE_SIZE = 1024 * 1024;

    static final Random RANDOM = new Random();

    static final Grisu g = Grisu.fmt;

    double[] random_values;

    double[] lowp_values;

    byte[] bresults;

    @Param("dataSetFilename")
    String dataSetFilename;

    @Setup
    public void setup() throws FileNotFoundException, IOException, ClassNotFoundException {

        try (FileInputStream fis = new FileInputStream(new File(this.dataSetFilename));
                ObjectInputStream ois = new ObjectInputStream(fis)) {

            final DataSet dataSet = (DataSet) ois.readObject();

            this.random_values = dataSet.random_values;
            this.lowp_values = dataSet.lowp_values;
        }

        this.bresults = new byte[30];
    }

    @Benchmark
    public void test_rand_doubleto(final Blackhole bh) {

        for (double random_value : this.random_values) {

            bh.consume(Double.toString(random_value));
        }
    }

    @Benchmark
    public void test_lowp_doubleto(final Blackhole bh) {

        for (double lowp_value : this.lowp_values) {

            bh.consume(Double.toString(lowp_value));
        }
    }

    @Benchmark
    public void test_rand_grisustr(final Blackhole bh) {

        for (double random_value : this.random_values) {

            bh.consume(g.doubleToString(random_value));
        }
    }

    @Benchmark
    public void test_lowp_grisustr(final Blackhole bh) {

        for (double lowp_value : this.lowp_values) {

            bh.consume(g.doubleToString(lowp_value));
        }
    }

    @Benchmark
    public void test_rand_grisubuf(final Blackhole bh) {

        for (double random_value : this.random_values) {

            bh.consume(g.doubleToBytes(this.bresults, 0, random_value));
        }
    }

    @Benchmark
    public void test_lowp_grisubuf(final Blackhole bh) {

        for (double lowp_value : this.lowp_values) {

            bh.consume(g.doubleToBytes(this.bresults, 0, lowp_value));
        }
    }

    /**
     * Serializes an object containing random data. This data will be the same for all benchmarks.
     * We pass the file name via the "dataSetFilename" parameter.
     *
     * @param args the arguments
     */
    public static void main(final String[] args) {

        try {
            // clean up any old runs as data set files can be large
            deleteTmpDirs(JmhBenchmark.class.getSimpleName());

            // create a tempDir for the benchmark
            final Path tempDirPath = createTempDir(JmhBenchmark.class.getSimpleName());

            // create a data set file
            final Path dataSetFilePath = Files.createTempFile(tempDirPath,
                    JmhBenchmark.class.getSimpleName() + "DataSet", ".ser");
            final File dataSetFile = dataSetFilePath.toFile();
            dataSetFile.deleteOnExit();

            // create the data
            final DataSet dataset = new DataSet();

            // try-with-resources flushes and closes both streams
            try (FileOutputStream fos = new FileOutputStream(dataSetFile);
                    ObjectOutputStream oos = new ObjectOutputStream(fos)) {
                oos.writeObject(dataset);
            }

            final Options opt = new OptionsBuilder().include(JmhBenchmark.class.getSimpleName())
                .param("dataSetFilename", dataSetFile.getAbsolutePath())
                .operationsPerInvocation(DATA_SET_SAMPLE_SIZE)
                .mode(org.openjdk.jmh.annotations.Mode.All)
                .timeUnit(TimeUnit.MICROSECONDS)
                .forks(1)
                .build();

            new Runner(opt).run();

        } catch (final Exception e) {
            System.err.println(e.getMessage());
            e.printStackTrace();
            throw new RuntimeException(e);
        }

    }

    static Path createTempDir(String prefix) throws IOException {
        final Path tempDirPath = Files.createTempDirectory(prefix);
        tempDirPath.toFile()
            .deleteOnExit();
        return tempDirPath;
    }

    static void deleteTmpDirs(final String prefix) throws IOException {

        for (Path dir : Files.newDirectoryStream(new File(System.getProperty("java.io.tmpdir")).toPath(),
                prefix + "*")) {
            for (Path toDelete : Files.walk(dir)
                .sorted(Comparator.reverseOrder())
                .toArray(Path[]::new)) {
                Files.delete(toDelete);
            }
        }
    }

    static final class DataSet implements Serializable {

        private static final long serialVersionUID = 2194487667134930491L;

        private static final int[] pows = new int[] { 1, 10, 100, 1000, 10000, 100000, 1000000 };

        final double[] random_values = new double[DATA_SET_SAMPLE_SIZE];

        final double[] lowp_values = new double[DATA_SET_SAMPLE_SIZE];

        DataSet() {

            for (int i = 0; i < DATA_SET_SAMPLE_SIZE; i++) {
                this.random_values[i] = RANDOM.nextDouble();
            }

            for (int i = 0; i < DATA_SET_SAMPLE_SIZE; i++) {
                // divide as double; int / int would truncate the fractional digits
                this.lowp_values[i] = (1 + RANDOM.nextInt(10000)) / (double) pows[RANDOM.nextInt(pows.length)];
            }
        }

    }
}

With a non-serializable test object (Optional):

import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.Random;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.infra.Blackhole;
import org.openjdk.jmh.runner.Runner;
import org.openjdk.jmh.runner.options.Options;
import org.openjdk.jmh.runner.options.OptionsBuilder;

@SuppressWarnings("javadoc")
@State(Scope.Benchmark)
public class NonSerializable {

    static final int DATA_SET_SAMPLE_SIZE = 20000;

    static final Random RANDOM = new Random();

    Optional<Integer>[] optionals;

    @Param("dataSetFilename")
    String dataSetFilename;

    @Setup
    public void setup() throws FileNotFoundException, IOException, ClassNotFoundException {

        try (FileInputStream fis = new FileInputStream(new File(this.dataSetFilename));
                ObjectInputStream ois = new ObjectInputStream(fis)) {

            @SuppressWarnings("unchecked")
            List<Integer> integers = (List<Integer>) ois.readObject();

            this.optionals = integers.stream()
                .map(Optional::ofNullable)
                .toArray(Optional[]::new);
        }

    }

    @Benchmark
    public void mapAndIfPresent(final Blackhole bh) {

        for (int i = 0; i < this.optionals.length; i++) {

            this.optionals[i].map(integer -> integer.toString())
                .ifPresent(bh::consume);
        }
    }

    @Benchmark
    public void explicitGet(final Blackhole bh) {

        for (int i = 0; i < this.optionals.length; i++) {

            final Optional<Integer> optional = this.optionals[i];

            if (optional.isPresent()) {
                bh.consume(optional.get()
                    .toString());
            }
        }
    }

    /**
     * Serializes a list of integers containing random data or null. This data will be the same for all benchmarks.
     * We pass the file name via the "dataSetFilename" parameter.
     *
     * @param args the arguments
     */
    public static void main(final String[] args) {

        try {
            // clean up any old runs as data set files can be large
            deleteTmpDirs(NonSerializable.class.getSimpleName());

            // create a tempDir for the benchmark
            final Path tempDirPath = createTempDir(NonSerializable.class.getSimpleName());

            // create a data set file
            final Path dataSetFilePath = Files.createTempFile(tempDirPath,
                    NonSerializable.class.getSimpleName() + "DataSet", ".ser");
            final File dataSetFile = dataSetFilePath.toFile();
            dataSetFile.deleteOnExit();

            final List<Integer> dataSet = IntStream.range(0, DATA_SET_SAMPLE_SIZE)
                .mapToObj(i -> RANDOM.nextBoolean() ? RANDOM.nextInt() : null)
                .collect(Collectors.toList());

            // try-with-resources flushes and closes both streams
            try (FileOutputStream fos = new FileOutputStream(dataSetFile);
                    ObjectOutputStream oos = new ObjectOutputStream(fos)) {
                oos.writeObject(dataSet);
            }

            final Options opt = new OptionsBuilder().include(NonSerializable.class.getSimpleName())
                .param("dataSetFilename", dataSetFile.getAbsolutePath())
                .operationsPerInvocation(DATA_SET_SAMPLE_SIZE)
                .mode(org.openjdk.jmh.annotations.Mode.All)
                .timeUnit(TimeUnit.MICROSECONDS)
                .forks(1)
                .build();

            new Runner(opt).run();

        } catch (final Exception e) {
            System.err.println(e.getMessage());
            e.printStackTrace();
            throw new RuntimeException(e);
        }

    }

    static Path createTempDir(String prefix) throws IOException {
        final Path tempDirPath = Files.createTempDirectory(prefix);
        tempDirPath.toFile()
            .deleteOnExit();
        return tempDirPath;
    }

    static void deleteTmpDirs(final String prefix) throws IOException {

        for (Path dir : Files.newDirectoryStream(new File(System.getProperty("java.io.tmpdir")).toPath(),
                prefix + "*")) {
            for (Path toDelete : Files.walk(dir)
                .sorted(Comparator.reverseOrder())
                .toArray(Path[]::new)) {
                Files.delete(toDelete);
            }
        }
    }

}
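The second example serializes plain Integer values, with null standing in for "absent", because java.util.Optional does not implement Serializable; the Optional wrappers are rebuilt after deserialization. The round trip can be sketched in isolation as follows. This is a minimal sketch that uses an in-memory byte array instead of a temporary file, and NullableRoundTrip is a hypothetical helper name.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical helper: serialize nullable Integers, rebuild Optionals on read.
class NullableRoundTrip {

    static byte[] serialize(List<Integer> values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bytes)) {
            // Copy into an ArrayList, which is Serializable and accepts nulls.
            oos.writeObject(new ArrayList<>(values));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    @SuppressWarnings("unchecked")
    static List<Optional<Integer>> deserialize(byte[] data) {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(data))) {
            List<Integer> values = (List<Integer>) ois.readObject();
            List<Optional<Integer>> result = new ArrayList<>();
            for (Integer value : values) {
                result.add(Optional.ofNullable(value)); // null becomes Optional.empty()
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```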
