
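Sites like https://uops.info/ and Agner Fog's instruction tables, and even Intel's own manuals, list several forms of the same instruction. For example: add m, r (in Agner's tables), add (m64, r64) on uops.info, or ADD r/m64, r64 in Intel's manual ( https://www.felixcloutier.com/x86/add ).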


Here is a simple example I ran on Godbolt:

__thread int a;
void Test() {
    a+=5;
}

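The add compiles to add DWORD PTR fs:0xfffffffffffffffc,0x5. Its machine code starts with the bytes 64 83 04 25.

There are several ways I could write my real code, but I want to look up how many cycles this might take, along with other information. How can I find the reference entry for this instruction? I tried typing "add" into https://uops.info/table.html and ticking my architecture, but I can't tell which entry is the instruction actually being used.

In this particular case I'm guessing the form is Add m64, r64, but I don't know whether using fs: in front of the address carries any penalty, or whether there is a way to look up the opcode so I can confirm I'm looking at the right reference entry.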


2 Answers


http://ref.x86asm.net/coder64.html has an opcode map, but with a bit of experience you won't need one most of the time. Especially when you have disassembly, you can just check the manual entry for that mnemonic (https://www.felixcloutier.com/x86/add), and see which of the possible opcodes it is (83 /0 add r/m32, imm8).
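To make the 83 /0 notation concrete: the /0 is the reg field (bits 5..3) of the ModRM byte that follows the opcode, and for opcodes 80, 81 and 83 that field selects which of eight ALU operations is meant. The following is only a minimal sketch in C; the names modrm_reg and group1_op are made up for illustration and don't come from any library.

```c
#include <assert.h>
#include <stdint.h>

/* Operations selected by ModRM.reg (the "/digit") for opcodes 80, 81 and 83. */
static const char *const GROUP1_NAMES[8] = {
    "add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"
};

/* Extract the /digit: bits 5..3 of the ModRM byte. */
static unsigned modrm_reg(uint8_t modrm) {
    return (modrm >> 3) & 7u;
}

/* Hypothetical helper: mnemonic for an 80/81/83 opcode given its ModRM byte. */
static const char *group1_op(uint8_t opcode, uint8_t modrm) {
    assert(opcode == 0x80 || opcode == 0x81 || opcode == 0x83);
    return GROUP1_NAMES[modrm_reg(modrm)];
}
```

With the bytes from the question, group1_op(0x83, 0x04) gives "add": ModRM 04 has reg = 0, so this is the 83 /0 entry.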

Clearly this has a 32-bit operand-size (dword ptr) memory destination, and the source is an immediate (numeric constant). That rules out the m64, r64 form for 2 separate reasons: the operand-size isn't 64-bit, and the source isn't a register. So even without looking at the machine code, it's definitely add r/m32, imm with an imm8 or imm32. Any sane assembler will of course pick imm8 for a small constant that fits in a signed 8-bit integer.
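The imm8 vs. imm32 choice can be sketched in a few lines of C. This assumes the usual assembler behaviour of optimizing for size; fits_imm8 and add_rm32_imm_opcode are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* True if v can be represented as a sign-extended 8-bit immediate. */
static int fits_imm8(int32_t v) {
    return v >= -128 && v <= 127;
}

/* Hypothetical helper: opcode a size-optimizing assembler would typically
   pick for add r/m32, imm: 0x83 (imm8) if it fits, else 0x81 (imm32). */
static uint8_t add_rm32_imm_opcode(int32_t imm) {
    return fits_imm8(imm) ? 0x83 : 0x81;
}
```

For the constant 5 in the question this picks 0x83, which matches the second byte of the machine code.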

Generally different ways of encoding the same instruction aren't special, so the source-level assembly / disassembly is fine, as long as you understand what's a register, what's memory, and what's an immediate.

But there are a few special cases, e.g. Agner Fog's guide notes that rotates by 1 using the short-form encoding are slower than rol reg, imm8 even when the imm8=1, because the flag-updating special case for rotate-by-1 actually depends on the opcode, not the immediate count. (Intel's documentation apparently assumes your assembler will always pick the short-form for rotate by constant 1. The part about "masked count" may only apply to rotate by cl. https://www.felixcloutier.com/x86/rcl:rcr:rol:ror#flags-affected. I haven't tested this recently and am not 100% sure I'm remembering correctly when OF is updated (but other flags in the SPAZO group are always left unmodified), but IIRC that's why rotates by 1 (2 uops) and by cl (3 uops) are slow, vs. rotates by other immediate counts (1 uop) on Intel).
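For reference, the three ways of giving a rotate count have different opcodes, which is what lets the hardware treat them differently. A rough sketch in C, covering only the one-byte shift/rotate group opcodes and ignoring prefixes; rot_form and rotate_count_form are illustrative names, not part of any real decoder.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    ROT_NOT_SHIFT_GROUP, /* opcode is not in the shift/rotate group */
    ROT_BY_IMM8,         /* C0 /r ib, C1 /r ib : count in an immediate byte */
    ROT_BY_1,            /* D0 /r, D1 /r       : short form, implicit count of 1 */
    ROT_BY_CL            /* D2 /r, D3 /r       : count in cl */
} rot_form;

/* Classify how the count is supplied, based only on the opcode byte. */
static rot_form rotate_count_form(uint8_t opcode) {
    switch (opcode) {
    case 0xC0: case 0xC1: return ROT_BY_IMM8;
    case 0xD0: case 0xD1: return ROT_BY_1;
    case 0xD2: case 0xD3: return ROT_BY_CL;
    default:              return ROT_NOT_SHIFT_GROUP;
    }
}
```

So rol eax, 1 can be encoded either as D1 C0 (short form) or as C1 C0 01 (imm8 form with a count of 1), and only the opcode byte tells the two apart.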

Or see https://github.com/travisdowns/uarch-bench/wiki/Intel-Performance-Quirks. (Specifically I mean "Which Intel microarchitecture introduced the ADC reg,0 single-uop special case?": even on Haswell / Skylake, adc al,0 (using the short form with no modrm byte) is 2 uops, and so is the equivalent adc eax, 12345. But adc edx, 12345 is 1 uop using the non-special case.) In cases like that you have to either check the machine code, or know how your assembler will have chosen to encode a given instruction (normally optimizing for size).
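Checking the machine code for this case comes down to looking at the first opcode byte. A small sketch in C, ignoring prefixes (which change the operand-size but not which form it is); both function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Short accumulator forms with no ModRM byte:
   14 ib = adc al, imm8    15 id = adc eax, imm32 */
static int is_adc_short_form(uint8_t opcode) {
    return opcode == 0x14 || opcode == 0x15;
}

/* General immediate forms: 80 /2 ib, 81 /2 id, 83 /2 ib.
   The /2 is the reg field (bits 5..3) of the ModRM byte. */
static int is_adc_modrm_imm_form(uint8_t opcode, uint8_t modrm) {
    int group1 = (opcode == 0x80 || opcode == 0x81 || opcode == 0x83);
    return group1 && ((modrm >> 3) & 7u) == 2u;
}
```

For example, adc eax, 12345 assembled as 15 39 30 00 00 is the short form, while adc edx, 12345 is 81 D2 39 30 00 00, the ModRM form.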


BTW, using a segment with a non-zero base adds 1 cycle of latency to address-generation, IIRC, but isn't a significant throughput penalty. (Unless of course throughput bottlenecks on a latency chain that it's part of...)

answered 2020-12-14T00:05:28.290

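Take a look at Intel's manual for x86 CPUs. It's about 6000 pages long, so I'm sure it's in there, lol: https://software.intel.com/sites/default/files/managed/39/c5/325462-sdm-vol-1-2abcd-3abcd.pdf

Also check out this site: http://ref.x86asm.net/coder64.html. Just search for 64 (it's shown as a greyed-out opcode). As you can see, 64 has nothing to do with the add opcode; it's just the FS:[] segment override prefix. 83 is the ADD opcode.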
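The same separation of prefix and opcode can be done in code. This is only a sketch in C that recognizes the six segment-override prefix bytes and nothing else (no operand-size, address-size, lock/rep or REX prefixes); seg_override, segment_prefix and skip_segment_prefixes are names invented for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { SEG_NONE, SEG_ES, SEG_CS, SEG_SS, SEG_DS, SEG_FS, SEG_GS } seg_override;

/* Map a byte to the segment override it encodes, or SEG_NONE. */
static seg_override segment_prefix(uint8_t b) {
    switch (b) {
    case 0x26: return SEG_ES;
    case 0x2E: return SEG_CS;
    case 0x36: return SEG_SS;
    case 0x3E: return SEG_DS;
    case 0x64: return SEG_FS;
    case 0x65: return SEG_GS;
    default:   return SEG_NONE;
    }
}

/* Skip leading segment-override bytes. Returns the index of the first
   non-prefix byte and stores the last override seen in *seg. */
static size_t skip_segment_prefixes(const uint8_t *code, size_t len, seg_override *seg) {
    size_t i = 0;
    assert(code != NULL && seg != NULL);
    *seg = SEG_NONE;
    while (i < len && segment_prefix(code[i]) != SEG_NONE) {
        *seg = segment_prefix(code[i]);
        i++;
    }
    return i;
}
```

Run on the bytes 64 83 04 25 from the question, it reports an FS override and returns index 1, where the 83 opcode byte sits.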

[Image: FS prefix]
[Image: ADD opcode]

This is how your opcode works, as I reproduced it in the IDA disassembler.
[Image: byte view]

In ASM it looks like this:
[Image: assembly]
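The remaining bytes of the instruction are the operands. Assuming the full encoding is 64 83 04 25 fc ff ff ff 05 (the question only quotes the first four bytes; the rest is inferred from the disassembly), the four bytes after 25 are a little-endian 32-bit displacement and the last byte is the imm8. A sketch in C of reading them, with hypothetical helper names:

```c
#include <assert.h>
#include <stdint.h>

/* Read a little-endian 32-bit displacement and sign-extend it to 64 bits. */
static int64_t read_disp32(const uint8_t *p) {
    uint32_t u = (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
                 ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
    return (u >= 0x80000000u) ? (int64_t)u - 0x100000000LL : (int64_t)u;
}

/* Sign-extend an 8-bit immediate to 32 bits. */
static int32_t read_imm8(uint8_t b) {
    return (b >= 0x80) ? (int32_t)b - 0x100 : (int32_t)b;
}
```

The displacement fc ff ff ff reads as -4, which is why the disassembler prints the address as fs:0xfffffffffffffffc, and the final byte 05 is the constant 5 from a+=5.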

answered 2020-12-14T00:21:06.373