http://ref.x86asm.net/coder64.html has an opcode map, but with a bit of experience you won't need one most of the time. Especially when you have disassembly, you can just check the manual entry for that mnemonic (https://www.felixcloutier.com/x86/add), and see which of the possible opcodes it is (83 /0 add r/m32, imm8
).
Clearly this has a 32-bit operand-size (dword ptr
) memory destination, and the source is an immediate (numeric constant). That rules out a , r64
register source for 2 separate reasons. So even without looking at the machine code, it's definitely add r/m32, imm
with an imm8 or imm32. Any sane assembler will of course pick imm8 for a small constant that fits in a signed 8-bit integer.
Generally different ways of encoding the same instruction aren't special, so the source-level assembly / disassembly is fine, as long as you understand what's a register, what's memory, and what's an immediate.
But there are a few special cases, e.g. Agner Fog's guide notes that rotates by 1 using the short-form encoding are slower than rol reg, imm8
even when the imm8=1, because the flag-updating special case for rotate-by-1 actually depends on the opcode, not the immediate count. (Intel's documentation apparently assumes your assembler will always pick the short-form for rotate by constant 1. The part about "masked count" may only apply to rotate by cl
. https://www.felixcloutier.com/x86/rcl:rcr:rol:ror#flags-affected. I haven't tested this recently and am not 100% sure I'm remembering correctly when OF is updated (but other flags in the SPAZO group are always left unmodified), but IIRC that's why rotates by 1 (2 uops) and by cl (3 uops) are slow, vs. rotates by other immediate counts (1 uop) on Intel).
Or https://github.com/travisdowns/uarch-bench/wiki/Intel-Performance-Quirks. Specifically I mean Which Intel microarchitecture introduced the ADC reg,0 single-uop special case? - even on Haswell / Skylake, adc al,0
(using the short form with no modrm byte) is 2 uops, and so is the equivalent adc eax, 12345
. But adc edx, 12345
is 1 uop using the non-special case.) Then you have to either check the machine code, or know how your assembler will have chosen to encode a given instruction. (Optimizing for size).
BTW, using a segment with a non-zero base adds 1 cycle of latency to address-generation, IIRC, but aren't a significant throughput penalty. (Unless of course throughput bottlenecks on a latency chain that it's part of...)