我正在与 Y86 程序的计算机体系结构课程中的一个团队合作,以实现乘法函数 imul。我们有一个可以工作的代码块,但我们正在努力使其尽可能高效地执行。目前,对于 imul,我们的块看起来像这样:
imul:
# push all used registers to stack for preservation
pushq %rdi
pushq %rsi
pushq %r8
pushq %r9
pushq %r10
irmovq 0, %r9 # set 0 into r9
rrmovq %rdi, %r10 # preserve rdi in r10
subq %rsi, %rdi # compare rdi and rsi
rrmovq %r10, %rdi # restore rdi
jl continue # if rdi (looping value/count) less than rsi, don't swap
swap:
# swap rsi and rdi to make rdi smaller value of the two
rrmovq %rsi, %rdi
rrmovq %r10, %rsi
continue:
subq %r9, %rdi # check if rdi is zero
cmove %r9, %rax # if rdi = 0, rax = 0
je imulDone # if rdi = 0, jump to end
irmovq 1, %r8 # set 1 into r8
rrmovq %rsi, %rax # set rax equal to initial value from rsi
imulLoop:
subq %r8, %rdi # count - 1
je imulDone # if count = 0, jump to end
addq %rsi, %rax # add another instance of rsi into rax, looped adition
jmp imulLoop # restart loop
imulDone:
# pop all used registers from stack to original values and return
popq %r10
popq %r9
popq %r8
popq %rsi
popq %rdi
ret
现在我们最好的想法是使用立即算术指令(isubq 等)而不是普通的 OPq 指令,并将常量设置到寄存器中并使用这些寄存器。在这种特定情况下,这种方法会更有效吗?非常感谢!