assembly - Return from a procedure in ARM assembly

Question

When creating a function in ARM assembly, I usually push r4-r5 and the contents of the LR register onto the stack at the beginning, and after the function has finished I pop r4-r5 and the saved LR value into PC:

.global myfunc
.type   myfunc, %function

myfunc:
push {r4-r5,lr}
... do stuff...
pop {r4-r5,pc}

However, I have read that using stmfd and ldmfd one might get better performance:

myfunc:
stmfd sp!,{r4-r11,lr}
...do stuff...
ldmfd sp!,{r4-r11,pc}

What exactly is sp? I presume it's not really worth saving all the registers r4-r11 if I'm not actually using them inside myfunc, right? So the push/pop variant is better in that case?

score 6 · Accepted Answer

PUSH {...} is the Thumb equivalent of the ARM instruction STMDB SP!,{...}

POP {...} is the Thumb equivalent of the ARM instruction LDMIA SP!,{...}

STM means STore Multiple.
DB means Decrement Before, i.e. decrement the destination address before each store in this case.
IA means Increment After, i.e. increment the source address after each load in this case.
! means write back the final address to the source/destination address register. For example if SP was 0x100 and you did STMDB SP!,{R0-R2} you'd have 0xF4 in SP afterwards.
SP is an alias for R13, and is used as the stack pointer on ARM processors.

score 3 · Accepted Answer

push and pop are pseudo-instructions to the assembler; they are not real instructions. Depending on the register list you get either a single store with the base register updated (str) or an stm.

push {r11}
stmdb r13!,{r11}

push {r10-r12}
stmdb r13!,{r10-r12}

I prefer stmdb to stmfd; they are just different syntax for the same instruction (stmdb and ldmia make sense to me: decrement before and increment after).

Assemble, then disassemble:

   0:   e52db004    push    {fp}        ; (str fp, [sp, #-4]!)
   4:   e92d0800    stmfd   sp!, {fp}
   8:   e92d1c00    push    {sl, fp, ip}
   c:   e92d1c00    push    {sl, fp, ip}

If you look up the stm encoding, or just look at the bits and think about it, the upper bits of the instruction, 0xe92d, are stmdb/stmfd with r13 as the base register and writeback set; the lower 16 bits are flags indicating which registers are to be saved. Notice at address 4 that is a push of r11 (bit 11 set, 0x0800); then at 8 and c you have that bit set for r11, plus the one below it for r10 and the one above it for r12 (0x1c00).

push and pop are easier to read than having to remember to use sp as the base, to put the ! after it, and to pick the right ia/db/fd suffix.

Thumb does have actual push/pop instructions, with their own 16-bit encodings.

In ARM state the single-register push was turned into a single str (address 0 above). It doesn't matter whether you use an stm with one register or an str; the operations are functionally equivalent.

So long as r13 is written back after the operation (the !) and you use the db or fd suffix on the stm, you can use either the pseudo-instruction or the real instruction.

If you are going to store/restore more than one register, definitely list them in a single instruction; don't write a sequence of several pushes or pops.

no:
push {r10}
push {r11}
push {r12}
yes:
push {r10-r12}

Unless you are on Thumb, where you might not have a choice: the 16-bit push can only save r0-r7 plus r14, and pop can only restore r0-r7 plus r15. To save higher registers you have to copy them down into lower registers and then push those. And you have to use push, because the Thumb stm won't let you use r13 as the base. (Thumb-2, depending on which extensions are available on your architecture, gives you more of an ARM-like experience.)

Re-reading your question:

sp is r13, the stack pointer. The pseudo-instruction chooses the right instruction, so you don't need to worry about stm vs str. When you store more than one register you "can" get an optimization on modern ARM systems, but it is not guaranteed. If your AMBA/AXI bus is 64 bits wide, writing 64 bits at a time is more than twice as fast as writing 32 bits at a time, because on a 64-bit memory system a 32-bit write takes a read-modify-write, but a 64-bit write does not (let's ignore cache behavior). If the stm is on an aligned address (with the stack it would take too much code to figure that out, so don't worry about it), then a push of 2 registers would be noticeably faster than two separate pushes (unless the core merges those into one bus cycle).

If you push, say, 4 registers from an unaligned address (let's say 0x1004), you get three transfers: a 32-bit transfer at 0x1004, then a 64-bit transfer at the aligned address after that (0x1008), then a 32-bit transfer of the last register (0x1010). If that four-register push had been on an aligned address, say 0x2010, then one of two things happens: either two separate 64-bit transfers, two registers to 0x2010 and two to 0x2018, or one transfer of length 2 (two 64-bit items in a single transfer) at the aligned base address 0x2010. Either way you won't get the worst case, which is four individual 32-bit transfers, so it is worth using stm/push.

score 1 · Accepted Answer

You don't need to push the registers onto the stack if you are not going to use them. That said, you will have to check whether leaving them out gives any real performance benefit. I think it is simpler to push everything, so that if you or someone else modifies the code later, it won't accidentally corrupt the caller's registers or the stack.

By the way, you can also do this, i.e., save only r4-r5 using stmfd:

myfunc:
stmfd sp!,{r4-r5,lr}
...do stuff...
ldmfd sp!,{r4-r5,pc}

OR

myfunc:
stmfd r13!,{r4-r5,r14}
...do stuff...
ldmfd r13!,{r4-r5,pc}

Here sp is an alias for r13 and lr is an alias for r14: sp stands for stack pointer and lr for link register.

score 0 · Accepted Answer

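SP is the stack pointer register; it indicates the top of the current stack. In ARM state push/pop are just another spelling of stmfd/ldmfd with sp!, so either works for any registers; the difference only shows up in 16-bit Thumb, where push/pop are limited to the low registers (plus lr/pc). If you only need to save a couple of lower registers, just push and pop.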
