performance - Does the branch predictor also include I/O instructions in its prediction?

Question

I am currently writing an Intel 8042 driver and have written two loops to wait until some buffers are ready for use:

/* Waits until the Intel 8042's input buffer is empty, i.e., until the
 * controller has processed the input. */
i8042_waitin:
    pause
    in $i8042_STATUS, %al
    and $i8042_STAT_INEMPTY, %al
    jz i8042_waitin
    ret

/* Waits until the Intel 8042's output buffer is full, i.e., data to read is
 * available.
 * ATTENTION: this here is the polling variant but there is also a way with
 * interrupts! By setting bit 0 in the command byte you can enable an interrupt
 * to be fired when the output buffer is full. */
i8042_waitout:
    pause
    in $i8042_STATUS, %al
    and $i8042_STAT_OUTFULL, %al
    jz i8042_waitout
    ret

As you can see, I inserted pause instructions in the loops. I've just recently learned about it and wanted to try it out, naturally.
Since the content of %al is unpredictable because it comes from an I/O read, the branch predictor will fill the pipeline with instructions of the loop: after some iterations, it will notice that the same branch is always taken, similar to the case here.

The above is correct if the branch predictor really includes I/O instructions in its prediction, which I am not sure of.

So does the branch predictor adjust itself using the result of I/O instructions, as is the case with unpredictable memory reads? Or is there something else going on here?
Does pause make sense here?

score 3 · Accepted Answer

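The branch predictor doesn't include the result of any other instruction in its prediction. It just makes its guess based on the branch instruction itself and/or its previous branch history. None of the other instructions in the loop (PAUSE, IN, or AND) has any effect on branch prediction.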

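The PAUSE instruction suggested in the answer you linked isn't meant to affect the branch predictor. It's meant to prevent the pipeline stall that occurs when the memory location accessed by the CMP instruction in that question's example code is written to by another processor. The CMP instruction doesn't affect branch prediction either.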

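As Peter Cordes mentions, you might be confused about the different techniques CPUs use to speculatively execute instructions in order to keep their pipelines full. In the question you linked, there are two different ways speculative execution ends up hurting the performance of the spin lock. Both have a common root: the CPU tries to execute the loop as fast as possible, but what actually affects the performance of a spin lock is how fast it exits the loop. Only the speed of the last iteration of the loop matters.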

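The first part of the problem with speculative execution for the spin lock code is that the branch predictor will quickly assume that the branch is always taken. On the last iteration of the loop there will be a stall, because the CPU will have continued to speculatively execute another iteration of the loop. It has to throw that work away and then start executing the code outside the loop. But it turns out things are even worse, because the CPU will speculatively read the memory location used in the CMP instruction. Because it's accessing normal memory, speculative reads are harmless; they have no side effects. (This is unlike your IN instruction, as I/O reads from devices can have side effects.) This allows the CPU to speculatively execute multiple iterations of the loop.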
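To make the memory-polling case concrete, here is a minimal C sketch of the kind of spin-wait loop where PAUSE is normally used. It is not the code from the linked question: the names `cpu_relax` and `spin_until_set` are made up for illustration, and the sketch assumes a C11 compiler with `<stdatomic.h>` and, on x86, the `_mm_pause()` intrinsic from `<immintrin.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#if defined(__x86_64__) || defined(__i386__)
#include <immintrin.h>
#define cpu_relax() _mm_pause()   /* compiles to the PAUSE instruction */
#else
#define cpu_relax() ((void)0)     /* assumption: no spin hint on other targets */
#endif

/* Hypothetical helper: spin until *flag becomes nonzero.
 * Returns how many times the loop body ran before the flag was seen set. */
static unsigned long spin_until_set(atomic_int *flag)
{
    unsigned long spins = 0;

    assert(flag != NULL);
    /* The load reads ordinary memory, so the CPU is free to run it
     * speculatively for several iterations ahead; PAUSE in the loop body
     * is the usual hint that this is a spin-wait. */
    while (atomic_load_explicit(flag, memory_order_acquire) == 0) {
        cpu_relax();
        spins++;
    }
    return spins;
}
```

In a real program another thread or CPU would set the flag, for example with `atomic_store_explicit(&flag, 1, memory_order_release)`, and the waiter would then leave the loop.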

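In your code, I don't think the PAUSE instruction will improve the performance of the loop. The IN instruction doesn't access normal memory, so it can't cause the pipeline to be flushed because of a write to memory by another CPU. Since the IN instruction also can't be executed speculatively, only one IN instruction can be in the pipeline at a time, so the cost of the mispredicted branch at the end of the loop will be relatively small. PAUSE might still have the other benefits mentioned in that answer: reducing power consumption and leaving more execution resources for the other logical CPU on a hyperthreaded processor.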
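For comparison, the I/O polling loop can be sketched in C in a form that also runs in user space. This is only an illustration under stated assumptions: `status_reader_fn`, `wait_for_status_bits`, and the fake device are hypothetical names, and in a real driver the callback would wrap a single IN from the 8042 status port (0x64), where bit 0 reports "output buffer full".

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: one call stands for one read of the
 * status register (a single IN instruction in a real driver). */
typedef uint8_t (*status_reader_fn)(void);

/* Poll until at least one bit in mask is set; returns the status byte seen.
 * Same shape as the in/and/jz loops in the question, without PAUSE. */
static uint8_t wait_for_status_bits(status_reader_fn read_status, uint8_t mask)
{
    uint8_t status;

    assert(read_status != NULL);
    do {
        status = read_status();
    } while ((status & mask) == 0);
    return status;
}

/* Simulated device for trying the loop out: reports "not ready" for the
 * first three reads, then sets bit 0. */
static unsigned fake_reads_until_ready = 3;
static unsigned fake_read_count = 0;

static uint8_t fake_read_status(void)
{
    fake_read_count++;
    if (fake_reads_until_ready > 0) {
        fake_reads_until_ready--;
        return 0x00;
    }
    return 0x01;
}
```

Calling `wait_for_status_bits(fake_read_status, 0x01)` returns once the simulated device reports ready. Adding a PAUSE to the loop body would not change the result, only the power and hyperthreading behaviour mentioned above.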

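Not that it really matters. On modern processors it takes over a million cycles for the keyboard controller to send or receive a single byte, so even a few hundred cycles lost to some worst-case pipeline stall isn't going to be significant.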
