javascript - Why is drawing POT images faster than NPOT images?

Question

I was looking into canvas speed optimizations, and I found this answer: https://stackoverflow.com/a/7682200/999400

don't use images with odd widths. always use widths as powers of 2.

So I'm wondering, why is this faster?

I have seen posts explaining that this helps with old graphics cards (when using OpenGL and such), but I'm asking about speed, not compatibility, and about canvas, not OpenGL/WebGL.

score 1 · Accepted Answer

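It's faster because you can use the << operator instead of the * operator. That is, performing a "shift left by 1" (multiply by 2) is quicker than performing a "multiply by 43". You can get around this limitation by adding padding bytes to the end of each row of the image (as MS does with its in-memory bitmaps), but essentially it comes down to the speed difference between the two instructions.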
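To make the row-padding idea concrete, here is a minimal JavaScript sketch. The helper names (strideShiftFor, padRows, pixelOffset) are made up for illustration and are not part of any canvas API, and it assumes an 8-bit image held in a Uint8Array. It only shows the arithmetic; it makes no claim about how a browser stores canvas images internally.

```javascript
// Hypothetical helpers for illustration only.

// Smallest shift such that (1 << shift) >= width.
function strideShiftFor(width) {
  let shift = 0;
  while ((1 << shift) < width) shift++;
  return shift;
}

// Copy a tightly packed width x height 8-bit image into a buffer whose
// row stride is padded up to the next power of two.
function padRows(pixels, width, height) {
  const shift = strideShiftFor(width);
  const stride = 1 << shift;
  const padded = new Uint8Array(stride * height); // padding bytes stay 0
  for (let y = 0; y < height; y++) {
    // Source row y starts at y * width; destination row y starts at y << shift.
    padded.set(pixels.subarray(y * width, (y + 1) * width), y << shift);
  }
  return { padded, shift, stride };
}

// With a power-of-two stride, the row offset is a shift instead of a multiply.
function pixelOffset(x, y, shift) {
  return x + (y << shift);
}
```

For a 320-pixel-wide image this pads every row to 512 bytes (shift of 9), so the price of the cheaper index calculation is 192 wasted bytes per row.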

Back in the old days of 8-bit 320x200 (mode 13h), you could index a pixel with the simple formula:

pixOffset = xPos + yPos * 320;

But this was slow. A better option was to use

C

pixOffset = xPos + (yPos * 256) + (yPos * 64);

Assembly

mov ax, xPos    ;   ax = xPos
mov bx, yPos    ;   bx = yPos
shl bx, 6       ;   bx = yPos * 64
add ax, bx      ;   ax = xPos + (yPos * 64)
shl bx, 2       ;   bx = yPos * 256
add ax, bx      ;   ax = xPos + yPos * 320

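This may seem counterintuitive, but if written well it uses only single-clock instructions, i.e. you can calculate the offset in 6 clock cycles. Of course, pipelining and cache misses complicate the picture.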
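The decomposition works because 320 = 256 + 64 = (1 << 8) + (1 << 6). As a quick sanity check, here is a small JavaScript sketch that compares the multiply form with the shift-and-add form over a whole 320x200 screen. The function names are hypothetical, and this is not meant to suggest that the shift version is any faster in a modern JavaScript engine; it only confirms that the two formulas agree.

```javascript
// Offset of pixel (x, y) in a 320-pixel-wide, 8-bit buffer, using a multiply.
function offsetMul(x, y) {
  return x + y * 320;
}

// Same offset using only shifts and adds: y * 320 = (y << 8) + (y << 6).
function offsetShift(x, y) {
  return x + (y << 8) + (y << 6);
}

// Exhaustively compare the two formulas over every pixel of a 320x200 screen.
function offsetsAgree() {
  for (let y = 0; y < 200; y++) {
    for (let x = 0; x < 320; x++) {
      if (offsetMul(x, y) !== offsetShift(x, y)) return false;
    }
  }
  return true;
}
```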

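Implementing a shift register in hardware is also considerably cheaper, in both $$ and transistors, than a full multiplication unit. So the same number of transistors can deliver better performance, or fewer transistors can deliver the same performance at lower power.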

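AFAIK, the mul (and div) instructions of modern processors are implemented with the help of lookup tables. This largely mitigates the problem, but it isn't without problems of its own. For further reading, have a look at the Pentium fdiv bug (a lookup table inside the chip was populated incorrectly):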

http://en.wikipedia.org/wiki/Pentium_FDIV_bug

So in the end, it's essentially an artefact of the hardware/software used to implement the functionality.
