java - In Java, is it more efficient to use byte or short instead of int, and float instead of double?

Question

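I've noticed that I always use int and double, no matter how small or large the number needs to be. So in Java, is it more efficient to use byte or short instead of int, and float instead of double?

Assume I have a program with plenty of ints and doubles. If I know the numbers will fit, is it worth going through and changing my ints to bytes or shorts?

I know Java doesn't have unsigned types, but is there anything extra I could do if I knew a number would only ever be positive?

By efficient I mostly mean processing. I'd assume the garbage collector would be a lot faster if all the variables were half the size, and that calculations would probably be somewhat quicker too. (Since I'm working on Android, I guess I need to worry a bit about RAM as well.)

(I'd assume the garbage collector only deals with objects and not primitives, but still deletes all the primitives inside abandoned objects, right?)

I tried it with a small Android app I have, but didn't really notice any difference. (Though I didn't "scientifically" measure anything.)

Am I wrong in assuming it should be faster and more efficient? I'd hate to go through and change everything in a massive program only to find out I wasted my time.

And would it be worth doing from the beginning when I start a new project? (I mean, I think every little bit would help, but if so, why doesn't it seem like anyone does it?)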

score 122 · Accepted Answer

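Am I wrong in assuming it should be faster and more efficient? I'd hate to go through and change everything in a massive program only to find out I wasted my time.

Short answer

Yes, you are wrong. In most cases, it makes little difference in terms of space used.

It is not worth trying to optimize this ... unless you have clear evidence that optimization is needed. And if you do need to optimize the memory usage of object fields in particular, you will probably need to take other (more effective) measures.

Longer answer

The Java Virtual Machine models stacks and object fields using offsets that are (in effect) multiples of a 32-bit primitive cell size. So when you declare a local variable or object field as (say) a byte, the variable / field will be stored in a 32-bit cell, just like an int.

There are two exceptions to this:

long and double values require 2 primitive 32-bit cells
arrays of primitive types are represented in packed form, so that (for example) an array of bytes holds 4 bytes per 32-bit word.

So it might be worth optimizing the use of long and double ... and large arrays of primitives. But in general, no.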

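(In theory, a JIT might be able to optimize this, but in practice I've never heard of a JIT that does. One impediment is that the JIT typically cannot run until after instances of the class being compiled have been created. If the JIT optimized the memory layout, you could have two (or more) "flavors" of object of the same class ... and that would present huge difficulties.)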
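To put rough numbers on the packed-array exception, here is a minimal sketch that computes the element payload of a primitive array from the element width. `ArrayPayload` and `payloadBytes` are hypothetical names introduced for illustration, and the figures deliberately ignore the array header and alignment padding, which vary between JVMs.

```java
// Illustrative sketch only; ArrayPayload is a hypothetical helper, not part of the answer.
class ArrayPayload {

    /** Bytes occupied by the elements alone (no array header, no alignment padding). */
    static long payloadBytes(int length, int bytesPerElement) {
        return (long) length * bytesPerElement;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        // The *.BYTES constants give the width of each primitive type in bytes.
        System.out.println("byte[]  payload: " + payloadBytes(n, Byte.BYTES));
        System.out.println("short[] payload: " + payloadBytes(n, Short.BYTES));
        System.out.println("int[]   payload: " + payloadBytes(n, Integer.BYTES));
        System.out.println("long[]  payload: " + payloadBytes(n, Long.BYTES));
    }
}
```

For a million elements, the gap between byte[] and long[] is about 7 MB, which is why large primitive arrays are the case where a narrower type can pay off.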

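Revisit

Looking at the benchmark results in @meriton's answer, it appears that using short and byte instead of int incurs a performance penalty for multiplication. Indeed, if you consider the operations in isolation, the penalty is significant. (You shouldn't consider them in isolation ... but that's another topic.)

I think the explanation is that the JIT is probably doing the multiplications using 32-bit multiply instructions in each case. But in the byte and short cases, it executes extra instructions to convert the intermediate 32-bit value to a byte or short in each loop iteration. (In theory, that conversion could be done once at the end of the loop ... but I doubt that the optimizer would be able to figure that out.)

Anyway, this does point to another problem with switching to short and byte as an optimization. It could make performance worse ... in an algorithm that is arithmetic and compute intensive.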
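The per-iteration conversion described above can be made visible in source code. The sketch below (class and method names are hypothetical) contrasts a short accumulator, where `result *= 3` implies a narrowing cast on every pass, with an int accumulator that is narrowed once after the loop. Because the narrowing cast simply keeps the low 16 bits, both variants should return the same value; the sketch says nothing about what a particular JIT actually emits.

```java
// Illustrative sketch only; NarrowingDemo is a hypothetical class.
class NarrowingDemo {

    /** Narrows to short on every iteration, as a short accumulator does. */
    static short narrowEachIteration(int iterations) {
        short result = 1;
        for (int i = 0; i < iterations; i++) {
            result *= 3; // shorthand for: result = (short) (result * 3)
        }
        return result;
    }

    /** Multiplies in int (wrapping on overflow) and narrows once at the end. */
    static short narrowOnce(int iterations) {
        int result = 1;
        for (int i = 0; i < iterations; i++) {
            result *= 3; // plain 32-bit multiplication
        }
        return (short) result; // single narrowing conversion
    }

    public static void main(String[] args) {
        for (int n : new int[] {0, 1, 10, 1000}) {
            System.out.println(n + ": " + narrowEachIteration(n) + " vs " + narrowOnce(n));
        }
    }
}
```

Note that `result = result * 3` would not even compile for a short variable, because the product has type int; only the compound assignment inserts the cast implicitly.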

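Secondary questions

I know Java doesn't have unsigned types, but is there anything extra I could do if I knew a number would only ever be positive?

No. Not in terms of performance, anyway. (There are some methods in Integer, Long, etc. for treating int, long, etc. as unsigned. But these don't give any performance advantage. That is not their purpose.)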
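For reference, here is a small sketch of the unsigned helper methods mentioned above, as found in the standard library since Java 8. The wrapper class `UnsignedDemo` and its method `asUnsigned` are hypothetical; the `Integer`, `Long`, and `Byte` calls are the real JDK methods. They reinterpret the same bits and do not change how values are stored.

```java
// Illustrative sketch only; UnsignedDemo is a hypothetical wrapper around JDK methods.
class UnsignedDemo {

    /** Reads the 32 bits of an int as an unsigned value in the range 0..4294967295. */
    static long asUnsigned(int bits) {
        return Integer.toUnsignedLong(bits);
    }

    public static void main(String[] args) {
        int allOnes = -1; // bit pattern 0xFFFFFFFF

        System.out.println(asUnsigned(allOnes));                      // 4294967295
        System.out.println(Integer.toUnsignedString(allOnes));        // 4294967295
        System.out.println(Integer.divideUnsigned(allOnes, 2));       // 2147483647
        System.out.println(Integer.compareUnsigned(allOnes, 1) > 0);  // true
        System.out.println(Long.toUnsignedString(-1L));               // 18446744073709551615

        byte b = (byte) 200; // stored as -56
        System.out.println(Byte.toUnsignedInt(b));                    // 200
    }
}
```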

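(I'd assume the garbage collector only deals with objects and not primitives, but still deletes all the primitives inside abandoned objects, right?)

Correct. A field of an object is part of the object. It goes away when the object is garbage collected. Likewise, the cells of an array go away when the array is collected. When the field or cell type is a primitive type, the value is stored in the field / cell ... which is part of the object / array ... and that has been deleted.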

score 32 · Accepted Answer

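It depends on the implementation of the JVM, as well as the underlying hardware. Most modern hardware will not fetch single bytes from memory (or even from the first-level cache), i.e. using the smaller primitive types generally does not reduce memory bandwidth consumption. Likewise, modern CPUs have a word size of 64 bits. They can perform operations on fewer bits, but that works by discarding the extra bits, which isn't faster either.

The only benefit is that smaller primitive types can result in a more compact memory layout, most notably when using arrays. This saves memory, which can improve locality of reference (thus reducing the number of cache misses) and reduce garbage collection overhead.

Generally speaking, however, using the smaller primitive types is not faster.

To demonstrate that, consider the following benchmark: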

import java.math.BigDecimal;

public class Benchmark {

    public static void benchmark(String label, Code code) {
        print(25, label);
        
        try {
            for (int iterations = 1; ; iterations *= 2) { // detect reasonable iteration count and warm up the code under test
                System.gc(); // clean up previous runs, so we don't benchmark their cleanup
                long previouslyUsedMemory = usedMemory();
                long start = System.nanoTime();
                code.execute(iterations);
                long duration = System.nanoTime() - start;
                long memoryUsed = usedMemory() - previouslyUsedMemory;
                
                if (iterations > 1E8 || duration > 1E9) { 
                    print(25, new BigDecimal(duration * 1000 / iterations).movePointLeft(3) + " ns / iteration");
                    print(30, new BigDecimal(memoryUsed * 1000 / iterations).movePointLeft(3) + " bytes / iteration\n");
                    return;
                }
            }
        } catch (Throwable e) {
            throw new RuntimeException(e);
        }
    }
    
    private static void print(int desiredLength, String message) {
        System.out.print(" ".repeat(Math.max(1, desiredLength - message.length())) + message);
    }
    
    private static long usedMemory() {
        return Runtime.getRuntime().totalMemory() - Runtime.getRuntime().freeMemory();
    }

    @FunctionalInterface
    interface Code {
        /**
         * Executes the code under test.
         * 
         * @param iterations
         *            number of iterations to perform
         * @return any value that requires the entire code to be executed (to
         *         prevent dead code elimination by the just in time compiler)
         * @throws Throwable
         *             if the test could not complete successfully
         */
        Object execute(int iterations) throws Throwable;
    }

    public static void main(String[] args) {
        benchmark("long[] traversal", (iterations) -> {
            long[] array = new long[iterations];
            for (int i = 0; i < iterations; i++) {
                array[i] = i;
            }
            return array;
        });
        benchmark("int[] traversal", (iterations) -> {
            int[] array = new int[iterations];
            for (int i = 0; i < iterations; i++) {
                array[i] = i;
            }
            return array;
        });
        benchmark("short[] traversal", (iterations) -> {
            short[] array = new short[iterations];
            for (int i = 0; i < iterations; i++) {
                array[i] = (short) i;
            }
            return array;
        });
        benchmark("byte[] traversal", (iterations) -> {
            byte[] array = new byte[iterations];
            for (int i = 0; i < iterations; i++) {
                array[i] = (byte) i;
            }
            return array;
        });
        
        benchmark("long fields", (iterations) -> {
            class C {
                long a = 1;
                long b = 2;
            }
            
            C[] array = new C[iterations];
            for (int i = 0; i < iterations; i++) {
                array[i] = new C();
            }
            return array;
        });
        benchmark("int fields", (iterations) -> {
            class C {
                int a = 1;
                int b = 2;
            }
            
            C[] array = new C[iterations];
            for (int i = 0; i < iterations; i++) {
                array[i] = new C();
            }
            return array;
        });
        benchmark("short fields", (iterations) -> {
            class C {
                short a = 1;
                short b = 2;
            }
            
            C[] array = new C[iterations];
            for (int i = 0; i < iterations; i++) {
                array[i] = new C();
            }
            return array;
        });
        benchmark("byte fields", (iterations) -> {
            class C {
                byte a = 1;
                byte b = 2;
            }
            
            C[] array = new C[iterations];
            for (int i = 0; i < iterations; i++) {
                array[i] = new C();
            }
            return array;
        });

        benchmark("long multiplication", (iterations) -> {
            long result = 1;
            for (int i = 0; i < iterations; i++) {
                result *= 3;
            }
            return result;
        });
        benchmark("int multiplication", (iterations) -> {
            int result = 1;
            for (int i = 0; i < iterations; i++) {
                result *= 3;
            }
            return result;
        });
        benchmark("short multiplication", (iterations) -> {
            short result = 1;
            for (int i = 0; i < iterations; i++) {
                result *= 3;
            }
            return result;
        });
        benchmark("byte multiplication", (iterations) -> {
            byte result = 1;
            for (int i = 0; i < iterations; i++) {
                result *= 3;
            }
            return result;
        });
    }
}

Run with OpenJDK 14 on my Intel Core i7 CPU @ 3.5 GHz, this prints:

     long[] traversal     3.206 ns / iteration      8.007 bytes / iteration
      int[] traversal     1.557 ns / iteration      4.007 bytes / iteration
    short[] traversal     0.881 ns / iteration      2.007 bytes / iteration
     byte[] traversal     0.584 ns / iteration      1.007 bytes / iteration
          long fields    25.485 ns / iteration     36.359 bytes / iteration
           int fields    23.126 ns / iteration     28.304 bytes / iteration
         short fields    21.717 ns / iteration     20.296 bytes / iteration
          byte fields    21.767 ns / iteration     20.273 bytes / iteration
  long multiplication     0.538 ns / iteration      0.000 bytes / iteration
   int multiplication     0.526 ns / iteration      0.000 bytes / iteration
 short multiplication     0.786 ns / iteration      0.000 bytes / iteration
  byte multiplication     0.784 ns / iteration      0.000 bytes / iteration

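As you can see, the only significant speed savings occur when traversing large arrays; using smaller object fields yields negligible benefit, and computations are actually slightly slower on the small data types.

Overall, the performance differences are quite minor. Optimizing algorithms is far more important than the choice of primitive type.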

score 6 · Accepted Answer

Using byte instead of int can increase performance if you use them in huge quantities. Here is an experiment:

import java.lang.management.*;

public class SpeedTest {

/** Get CPU time in nanoseconds. */
public static long getCpuTime() {
    ThreadMXBean bean = ManagementFactory.getThreadMXBean();
    return bean.isCurrentThreadCpuTimeSupported() ? bean
            .getCurrentThreadCpuTime() : 0L;
}

public static void main(String[] args) {
    long durationTotal = 0;
    int numberOfTests=0;

    for (int j = 1; j < 51; j++) {
        long beforeTask = getCpuTime();
        // MEASURES THIS AREA------------------------------------------
        long x = 20000000;// 20 millions
        for (long i = 0; i < x; i++) {
                           TestClass s = new TestClass(); 

        }
        // MEASURES THIS AREA------------------------------------------
        long duration = getCpuTime() - beforeTask;
        // Divide before converting: "(int) duration / 1000000" would cast first and could overflow
        System.out.println("TEST " + j + ": duration = " + duration + " ns = "
                + duration / 1000000 + " ms");
        durationTotal += duration;
        numberOfTests++;
    }
    double average = (double) durationTotal / numberOfTests; // floating-point division, not integer division
    System.out.println("-----------------------------------");
    System.out.println("Average Duration = " + average + " ns = "
            + (long) (average / 1000000) + " ms (Approximately)");


}

}

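This class measures how long it takes to create new TestClass instances. Each test does it 20 million times, and there are 50 tests.

Here is the TestClass: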

 public class TestClass {
     int a1= 5;
     int a2= 5; 
     int a3= 5;
     int a4= 5; 
     int a5= 5;
     int a6= 5; 
     int a7= 5;
     int a8= 5; 
     int a9= 5;
     int a10= 5; 
     int a11= 5;
     int a12=5; 
     int a13= 5;
     int a14= 5; 
 }

I ran the SpeedTest class and in the end got this:

 Average Duration = 8.9625E8 ns = 896 ms (Approximately)

Now I change the ints in TestClass to bytes and run it again. Here is the result:

 Average Duration = 6.94375E8 ns = 694 ms (Approximately)

I believe this experiment shows that if you are instantiating a huge number of variables, using byte instead of int can increase efficiency.

score 2 · Accepted Answer

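byte is generally considered to be 8 bits. short is generally considered to be 16 bits. (In Java, these widths are fixed by the language specification.)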

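In a "pure" environment, which Java is not, since the implementation of bytes, longs, shorts, and other fun things is generally hidden from you, byte makes better use of space.

However, your computer is probably not 8-bit, and it is probably not 16-bit. This means that to obtain 16 or 8 bits in particular, it has to resort to "trickery" that costs time, in order to pretend that it can access those types when needed.

At this point, it depends on how the hardware has been implemented. However, from what I've been taught, the best speed is achieved by storing things in chunks that are comfortable for your CPU to use. A 64-bit processor likes dealing with 64-bit elements, and anything smaller than that often requires "engineering magic" to pretend that it likes dealing with them.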

score 2 · Accepted Answer

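One of the reasons short/byte/char perform worse is the lack of direct support for these data types. By direct support, I mean that the JVM specification does not define a dedicated instruction set for these data types. Instructions like store, load, add, etc. have versions for the int data type, but they do not have versions for short/byte/char. For example, consider the Java code below: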

void spin() {
 int i;
 for (i = 0; i < 100; i++) {
 ; // Loop body is empty
 }
}

This is compiled to the following JVM bytecode:

0 iconst_0 // Push int constant 0
1 istore_1 // Store into local variable 1 (i=0)
2 goto 8 // First time through don't increment
5 iinc 1 1 // Increment local variable 1 by 1 (i++)
8 iload_1 // Push local variable 1 (i)
9 bipush 100 // Push int constant 100
11 if_icmplt 5 // Compare and loop if less than (i < 100)
14 return // Return void when done

Now, consider changing int to short as below.

void sspin() {
 short i;
 for (i = 0; i < 100; i++) {
 ; // Loop body is empty
 }
}

The corresponding bytecode changes as follows:

0 iconst_0
1 istore_1
2 goto 10
5 iload_1 // The short is treated as though an int
6 iconst_1
7 iadd
8 i2s // Truncate int to short
9 istore_1
10 iload_1
11 bipush 100
13 if_icmplt 5
16 return

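As you can see, to manipulate the short data type, the code still uses the int versions of the instructions and explicitly converts int to short when required. Because of this, performance can suffer.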
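The same sequence can be written out by hand in Java to show what the extra i2s step does. This is a sketch under the assumption that the source-level cast `(short)` corresponds to the i2s instruction in the listing above; `ShortIncrementDemo` and `incrementLikeBytecode` are hypothetical names.

```java
// Illustrative sketch only; mirrors the iload / iconst_1 / iadd / i2s / istore sequence.
class ShortIncrementDemo {

    /** Increments a short the way the bytecode does: widen, add as int, truncate. */
    static short incrementLikeBytecode(short value) {
        int widened = value;   // iload: the short is handled as an int
        int sum = widened + 1; // iconst_1, iadd: 32-bit addition
        return (short) sum;    // i2s: keep the low 16 bits, sign-extended
    }

    public static void main(String[] args) {
        short s = Short.MAX_VALUE;
        s++; // the compiler emits the same widen / add / truncate steps
        System.out.println(s);                                      // -32768
        System.out.println(incrementLikeBytecode(Short.MAX_VALUE)); // -32768
        System.out.println(incrementLikeBytecode((short) 5));       // 6
    }
}
```

The truncation is what makes a short wrap around at 32767, and it is the one instruction the int version of the loop does not need.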

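Now, the reason cited for not giving direct support is as follows:

The Java Virtual Machine provides the most direct support for data of type int. This is partly in anticipation of efficient implementations of the Java Virtual Machine's operand stacks and local variable arrays. It is also motivated by the frequency of int data in typical programs. Other integral types have less direct support. There are no byte, char, or short versions of the store, load, or add instructions, for instance.

Quoted from the JVM specification available here (page 58).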

score 0 · Accepted Answer

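The difference is hardly noticeable! It's more a question of design, appropriateness, uniformity, habit, etc. Sometimes it's just a matter of taste. When all you care about is that your program gets up and running, and substituting a float for an int would not harm correctness, I see no advantage in going for one or the other unless you can demonstrate that using either type alters performance. Tuning performance based on types that differ by 2 or 3 bytes is really the last thing you should care about; Donald Knuth once said: "Premature optimization is the root of all evil" (not sure it was him, edit if you have the answer).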
