java - Comparing get/put operations on direct vs non-direct ByteBuffers

Question

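Is get/put on a non-direct ByteBuffer faster than get/put on a direct ByteBuffer?

If I have to read from or write to a direct ByteBuffer, is it better to first read/write into a thread-local byte array and then update the direct ByteBuffer in full from that byte array (for writes)?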

score 24 · Accepted Answer

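Is get/put on a non-direct ByteBuffer faster than get/put on a direct ByteBuffer?

If you compare a heap buffer with a direct buffer that does not use native byte order (most systems are little-endian, while the default order of a ByteBuffer, direct or not, is big-endian), the performance is very similar.

If you use a natively ordered byte buffer, performance can be significantly better for multi-byte values. For a single byte it makes little difference whatever you do.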
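As a minimal sketch of the byte-order point above: a newly created buffer reports big-endian until `order(ByteOrder.nativeOrder())` is called on it. The class name `ByteOrderDemo` and the helper `firstByteOf` are made up for this illustration; it only shows how the order changes the byte layout, it does not measure speed.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class ByteOrderDemo {
    // Hypothetical helper: stores one int using the given order and
    // returns the byte that ends up at offset 0 of the buffer.
    static byte firstByteOf(int value, ByteOrder order) {
        ByteBuffer bb = ByteBuffer.allocateDirect(4).order(order);
        bb.putInt(0, value);
        return bb.get(0);
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(16);
        // A new buffer is big-endian regardless of the platform.
        System.out.println("default order: " + direct.order());
        // The platform's own order, typically little-endian on x86.
        System.out.println("native order:  " + ByteOrder.nativeOrder());
        // Switch the buffer to the platform's order before multi-byte access.
        direct.order(ByteOrder.nativeOrder());
        System.out.println("after order(): " + direct.order());
        System.out.println("big-endian first byte:    "
                + firstByteOf(0x01020304, ByteOrder.BIG_ENDIAN));
        System.out.println("little-endian first byte: "
                + firstByteOf(0x01020304, ByteOrder.LITTLE_ENDIAN));
    }
}
```

When the buffer's order differs from the platform's, each multi-byte get or put has to swap bytes, which is the extra work the answer refers to.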

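In HotSpot/OpenJDK, ByteBuffer uses the Unsafe class, and many of its native methods are treated as intrinsics. This is JVM dependent, and AFAIK the Android VM treats them as intrinsics in recent versions.

If you dump the generated assembly, you can see that the intrinsics in Unsafe are turned into a single machine code instruction, i.e. they don't have the overhead of a JNI call.

In fact, if you are into micro-tuning, you may find that most of the time in a ByteBuffer getXxx or putXxx call is spent on bounds checking, not on the actual memory access. For this reason I still use Unsafe directly when I need maximum performance (note: Oracle discourages this).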
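The bounds checking mentioned above is visible from plain Java: every relative and absolute access is validated against the buffer's limit before memory is touched. The following is a small sketch of that behaviour only; the class and method names are invented for the example, and it says nothing about how much time the checks cost.

```java
import java.nio.BufferUnderflowException;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

class BoundsCheckDemo {
    // Returns true if a relative getInt() with too few bytes remaining is rejected.
    static boolean relativeReadIsChecked() {
        ByteBuffer bb = ByteBuffer.allocateDirect(8).order(ByteOrder.nativeOrder());
        bb.position(6); // only 2 bytes remain, an int needs 4
        try {
            bb.getInt();
            return false;
        } catch (BufferUnderflowException expected) {
            return true;
        }
    }

    // Returns true if an absolute getInt(index) reaching past the limit is rejected.
    static boolean absoluteReadIsChecked() {
        ByteBuffer bb = ByteBuffer.allocateDirect(8).order(ByteOrder.nativeOrder());
        try {
            bb.getInt(6); // would read bytes 6..9 of an 8-byte buffer
            return false;
        } catch (IndexOutOfBoundsException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("relative read checked: " + relativeReadIsChecked());
        System.out.println("absolute read checked: " + absoluteReadIsChecked());
    }
}
```

Unsafe performs no such validation, which is why it can be faster and also why an out-of-range address can crash the JVM instead of throwing.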

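If I have to read from or write to a direct ByteBuffer, is it better to first read/write into a thread-local byte array and then update the direct ByteBuffer in full from that byte array (for writes)?

I would hate to see what that is better than. ;) It sounds very complicated.

Often the simplest solution is both better and faster.

You can test this yourself with the following code.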

import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class ByteBufferCopyTest {
    public static void main(String... args) {
        ByteBuffer bb1 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
        ByteBuffer bb2 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
        // Repeat the test so that later runs are timed after the JIT has warmed up.
        for (int i = 0; i < 10; i++)
            runTest(bb1, bb2);
    }

    private static void runTest(ByteBuffer bb1, ByteBuffer bb2) {
        // Reset position and limit so each run copies the whole buffer.
        bb1.clear();
        bb2.clear();
        long start = System.nanoTime();
        while (bb2.remaining() > 0)
            bb2.putInt(bb1.getInt());
        long time = System.nanoTime() - start;
        // One getInt plus one putInt per 4-byte value.
        int operations = bb1.capacity() / 4 * 2;
        System.out.printf("Each putInt/getInt took an average of %.1f ns%n", (double) time / operations);
    }
}

prints

Each putInt/getInt took an average of 83.9 ns
Each putInt/getInt took an average of 1.4 ns
Each putInt/getInt took an average of 34.7 ns
Each putInt/getInt took an average of 1.3 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.3 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns
Each putInt/getInt took an average of 1.2 ns

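I am pretty sure a JNI call takes longer than 1.2 ns.

To demonstrate that it is not the "JNI" call but the overhead around it that causes the delay, you can write the same loop using Unsafe directly.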

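import java.lang.reflect.Field;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import sun.misc.Unsafe;
import sun.nio.ch.DirectBuffer;

// Relies on JDK-internal classes (sun.misc.Unsafe, sun.nio.ch.DirectBuffer);
// newer JDKs may need extra flags such as --add-exports to compile and run this.
public class UnsafeCopyTest {
    public static void main(String... args) {
        ByteBuffer bb1 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
        ByteBuffer bb2 = ByteBuffer.allocateDirect(256 * 1024).order(ByteOrder.nativeOrder());
        for (int i = 0; i < 10; i++)
            runTest(bb1, bb2);
    }

    private static void runTest(ByteBuffer bb1, ByteBuffer bb2) {
        Unsafe unsafe = getTheUnsafe();
        long start = System.nanoTime();
        // Raw native addresses of the two direct buffers.
        long addr1 = ((DirectBuffer) bb1).address();
        long addr2 = ((DirectBuffer) bb2).address();
        // Copy one int at a time with no bounds checks at all.
        for (int i = 0, len = Math.min(bb1.capacity(), bb2.capacity()); i < len; i += 4)
            unsafe.putInt(addr1 + i, unsafe.getInt(addr2 + i));
        long time = System.nanoTime() - start;
        int operations = bb1.capacity() / 4 * 2;
        System.out.printf("Each putInt/getInt took an average of %.1f ns%n", (double) time / operations);
    }

    // Obtains the Unsafe singleton reflectively from its private static field.
    public static Unsafe getTheUnsafe() {
        try {
            Field theUnsafe = Unsafe.class.getDeclaredField("theUnsafe");
            theUnsafe.setAccessible(true);
            return (Unsafe) theUnsafe.get(null);
        } catch (Exception e) {
            throw new AssertionError(e);
        }
    }
}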

prints

Each putInt/getInt took an average of 40.4 ns
Each putInt/getInt took an average of 44.4 ns
Each putInt/getInt took an average of 0.4 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns
Each putInt/getInt took an average of 0.3 ns

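So you can see that the native call is much faster than you might expect of a JNI call. The main reason for the remaining delay could be the L2 cache speed. ;)

All run on an i3 at 3.3 GHz.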

score 2 · Accepted Answer

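A direct buffer holds its data in JNI land, so get() and put() have to cross the JNI boundary. A non-direct buffer holds its data in JVM land.

So:

If you aren't touching the data in Java land at all, e.g. just copying one channel to another, direct buffers are faster, because the data never has to cross the JNI boundary.
Conversely, if you are working with the data in Java land, a non-direct buffer will be faster. Whether that matters depends on how much data has to cross the JNI boundary and on how much is transferred each time. For example, getting or putting a single byte at a time from or to a direct buffer could get very expensive, whereas getting/putting 16384 bytes at a time would amortize the JNI boundary cost considerably.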
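The channel-to-channel case in the first point could look roughly like the sketch below, where a direct buffer shuttles bytes between two `FileChannel`s and Java code never inspects them. The class name, the 16 KiB buffer size, and the use of temporary files are assumptions made for the illustration; the answer does not prescribe any of them.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Arrays;

class ChannelCopyDemo {
    // Copies src to dst through a direct buffer and returns the bytes written.
    static long copy(Path src, Path dst) throws IOException {
        ByteBuffer buf = ByteBuffer.allocateDirect(16 * 1024);
        long total = 0;
        try (FileChannel in = FileChannel.open(src, StandardOpenOption.READ);
             FileChannel out = FileChannel.open(dst, StandardOpenOption.WRITE,
                     StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING)) {
            while (in.read(buf) != -1) {
                buf.flip();                 // switch from filling to draining
                while (buf.hasRemaining())
                    total += out.write(buf);
                buf.clear();                // ready for the next read
            }
        }
        return total;
    }

    // Writes a temp file of the given size, copies it, and checks the copy matches.
    static boolean roundTrip(int size) {
        try {
            Path src = Files.createTempFile("copy-src", ".bin");
            Path dst = Files.createTempFile("copy-dst", ".bin");
            try {
                byte[] data = new byte[size];
                for (int i = 0; i < size; i++) data[i] = (byte) i;
                Files.write(src, data);
                long copied = copy(src, dst);
                return copied == size && Arrays.equals(data, Files.readAllBytes(dst));
            } finally {
                Files.deleteIfExists(src);
                Files.deleteIfExists(dst);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("copy matches source: " + roundTrip(100_000));
    }
}
```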

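To answer your second paragraph, I would use a local byte[] array, not a thread-local one, but if I were working with the data in Java land I wouldn't use a direct byte buffer at all. As the Javadoc says, direct byte buffers should only be used where they deliver a measurable performance benefit.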
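For illustration, the "local byte[] array" approach could be sketched as below: a bulk `get(byte[], int, int)` moves a chunk out of the direct buffer, and the per-byte work then runs on the plain array. The class and method names and the 16384-byte chunk size (borrowed from the example figure above) are assumptions. The sketch only shows that both approaches read the same data; whether the chunked version is actually faster is something to measure, since the two answers in this thread disagree on the cost of per-element access.

```java
import java.nio.ByteBuffer;

class BulkTransferDemo {
    // Sums all remaining bytes by copying them out in chunks with a bulk get.
    static long sumViaLocalArray(ByteBuffer src) {
        ByteBuffer view = src.duplicate(); // own position/limit, shared content
        byte[] chunk = new byte[16384];    // plain local array, not a thread-local
        long sum = 0;
        while (view.hasRemaining()) {
            int n = Math.min(chunk.length, view.remaining());
            view.get(chunk, 0, n);         // one bulk transfer per chunk
            for (int i = 0; i < n; i++)
                sum += chunk[i];
        }
        return sum;
    }

    // Sums all remaining bytes with one get() call per byte.
    static long sumPerByte(ByteBuffer src) {
        ByteBuffer view = src.duplicate();
        long sum = 0;
        while (view.hasRemaining())
            sum += view.get();
        return sum;
    }

    public static void main(String[] args) {
        ByteBuffer direct = ByteBuffer.allocateDirect(100_000);
        for (int i = 0; i < direct.capacity(); i++)
            direct.put((byte) i);
        direct.flip();
        System.out.println("chunked sum:  " + sumViaLocalArray(direct));
        System.out.println("per-byte sum: " + sumPerByte(direct));
    }
}
```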
