An electrical engineer recently cautioned me against using GPUs for scientific computing (for example, where accuracy really matters) because they lack the hardware safeguards that CPUs have. Is this true? If so, how common/serious is the problem in typical hardware?
4 Answers
Actually, modern GPUs are extremely well suited to scientific computing, and many HPC applications are being at least partially ported to run on GPUs for the sake of performance and energy efficiency. Unlike older GPUs, modern ones (take NVIDIA's Fermi or Kepler architectures, for example) provide fully standardized IEEE-754 arithmetic in both single and double precision, so you should be able to use them just as you would on a modern CPU.
I found a few (older) papers on this, but it does seem the problem has been fixed in cards with compute capability >= 2.0.
Current GPUs do not support double-precision computation and their single-precision support glosses over important aspects of the IEEE-754 floating-point standard[1], such as correctly rounded results and proper closure of the number system. ... Our results show that there are serious errors with the GPUs' results at certain edge cases, in addition to the incorrect handling of denormalized numbers.
Karl E. Hillesland and Anselmo Lastra, "GPU Floating-Point Paranoia." In Proc. GP2, August 2004.
Guillaume Da Graca and David Defour, "Implementation of float-float operators on graphics hardware." In Proc. 7th conference on Real Numbers and Computers, July 2006.
Double precision (CUDA compute capability 1.3 and above) deviates from the IEEE 754 standard: round-to-nearest-even is the only supported rounding mode for reciprocal, division, and square root. In single precision, denormals and signalling NaNs are not supported; only two IEEE rounding modes are supported (chop and round-to-nearest even), and those are specified on a per-instruction basis rather than in a control word; and the precision of division/square root is slightly lower than single precision.
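To make the quoted points a bit more concrete, here is a small CPU-side sketch (assuming NumPy is available) that prints the single- and double-precision IEEE-754 parameters and a subnormal value, which is the kind of number the older, pre-compute-capability-2.0 GPUs described above would flush to zero:

```python
# CPU-side reference for the IEEE-754 formats discussed in the quotes above.
# A compute-capability >= 2.0 GPU exposes these same single/double formats;
# older parts flushed the subnormal shown below to zero.
import numpy as np

for dtype in (np.float32, np.float64):
    info = np.finfo(dtype)
    print(dtype.__name__,
          "mantissa bits:", info.nmant,
          "machine eps:", info.eps,
          "smallest normal:", info.tiny)

# Smallest positive single-precision subnormal ("denormal"): nonzero under
# full IEEE-754 gradual underflow, but not representable on old GPUs.
subnormal = np.nextafter(np.float32(0.0), np.float32(1.0))
print(subnormal, subnormal > 0)   # ~1.4e-45 True
```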
NVIDIA has published a white paper covering the details of floating point in general, and on GPUs in particular:
http://developer.download.nvidia.com/assets/cuda/files/NVIDIA-CUDA-Floating-Point.pdf
In practice, most scientific computing does not need to be that accurate anyway, because measurement error and the like largely swamps the error introduced by floating-point rounding (except, perhaps, in degenerate cases such as summing a floating-point array in one order rather than the reverse order, but you run into that on a CPU as well, and nothing warns you, because it is working as designed). In scientific computing it is generally enough to present the result with an error bound and to show that this error does not cause problems in practice.
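A minimal illustration of that order-dependence, using nothing beyond the Python standard library (the specific values are just chosen to make the effect visible):

```python
# Summing the same values forwards and backwards gives different results in
# double precision, on any IEEE-754 CPU or GPU, because each addition rounds.
import math

values = [1e16] + [1.0] * 100

forward  = sum(values)            # each 1.0 is absorbed by the huge running total
backward = sum(reversed(values))  # the small terms accumulate first
exact    = math.fsum(values)      # correctly rounded sum, for reference

print(forward)   # 1e+16
print(backward)  # 1.00000000000001e+16
print(exact)     # 1.00000000000001e+16
```

When this does matter, exact or compensated summation (math.fsum, Kahan summation) is the usual remedy.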
Floating point was designed to be fast, not necessarily exact, and that holds on CPUs too, which is why we are always taught to compare floats against an epsilon rather than for exact equality.
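For example, the classic 0.1 + 0.2 case, and the tolerance-based comparison we are taught to use instead (plain Python; the tolerances are illustrative):

```python
# Exact equality on floats is fragile; compare within a tolerance instead.
import math

a = 0.1 + 0.2
b = 0.3

print(a == b)                            # False: both sides carry rounding error
print(abs(a - b) < 1e-9)                 # True: manual absolute-tolerance check
print(math.isclose(a, b, rel_tol=1e-9))  # True: relative tolerance via the stdlib
```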
OTOH, computations that genuinely need rounding rules exact down to the last digit, such as accounting or number theory, should consider fixed-point or decimal arithmetic (e.g. Python's decimal module), which lets you specify the rounding rules exactly.
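A minimal sketch of what that control looks like with Python's decimal module; the 2.675 value and the half-up rounding rule are just illustrative choices:

```python
# Exact decimal arithmetic with an explicitly chosen rounding rule.
from decimal import Decimal, ROUND_HALF_UP

price = Decimal("2.675")

# Binary floating point cannot represent 2.675 exactly, so round() "misses":
print(round(2.675, 2))                                          # 2.67

# Decimal stores the value exactly and lets you pick the rounding rule:
print(price.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP))  # 2.68
```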