Yes, most assemblers are (at least) two-pass - precisely because of forward references like these. Adding macro capabilities can add more passes.
Look at an assembly listing, not just the op-codes. Since you said the actual offset is "2", I'm assuming memory is word-addressed.
0000 3180 main LDA RA hello
0001 F020 IPT #32
0002 F00C HLT
0003 4865 hello .STR "Hello, world!"
The first two columns are the PC and opcode. I'm not sure how the LDA instruction has been encoded (where is the +2 offset in there?)
In the first pass, assuming all addressing is relative, the assembler would emit the fixed part of the op-code (covering the LDA RA part) along with a marker to show that it needs to patch up the instruction with the address of hello in the second pass.
At this point it knows the size, but not the complete value, of the final machine language.
It then continues on, working out the address of each instruction and building its symbol table.
In the second pass, now knowing the above information, it patches each instruction by calculating relative offsets etc. It also often regenerates the entire output (including PC values).
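To make the two passes concrete, here is a minimal Python sketch. It is not how any particular assembler works: the tuple format, the function names, and the sizing rules are all made up for illustration. It assumes every instruction is one word, that .STR packs two characters per word (as the 4865 = "He" in your listing suggests), and that offsets are relative to the next instruction, which is what gives your "+2". The RA register operand is left out to keep it short.

```python
import math

def size_of(mnemonic, operand):
    """Size of a statement in words (assumed rules, see above)."""
    if mnemonic == ".STR":
        return math.ceil(len(operand) / 2)  # two characters per word
    return 1                                # every instruction is one word

def pass_one(program):
    """Work out the address of each statement and build the symbol table."""
    symbols, pc = {}, 0
    for label, mnemonic, operand in program:
        if label:
            symbols[label] = pc
        pc += size_of(mnemonic, operand)
    return symbols

def pass_two(program, symbols):
    """Patch label operands now that every address is known."""
    listing, pc = [], 0
    for label, mnemonic, operand in program:
        resolved = operand
        if mnemonic == "LDA":
            # Offset relative to the next instruction (an assumption).
            resolved = symbols[operand] - (pc + 1)
        listing.append((pc, mnemonic, resolved))
        pc += size_of(mnemonic, operand)
    return listing

program = [
    ("main",  "LDA",  "hello"),
    (None,    "IPT",  32),
    (None,    "HLT",  None),
    ("hello", ".STR", "Hello, world!"),
]

symbols = pass_one(program)            # {'main': 0, 'hello': 3}
listing = pass_two(program, symbols)   # first entry is (0, 'LDA', 2)
```

Note that pass one never needs the value of hello, only the size of each statement, which is why the forward reference is harmless.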
Occasionally, something will be detected in the second pass which prevents it continuing. For example, perhaps you can only reference objects within 256 words (-127 thru +128), but the label hello turns out to be more than 128 words away. This means it should have used a two-word instruction (with an absolute address), which changes everything it learnt during the first pass.
This is often referred to as a 'fix up' error. The same thing can happen during the link phase.
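As a rough illustration of that check, the sketch below resolves an offset in the second pass and refuses to continue when it does not fit the short form. The range is the hypothetical -127 thru +128 from above, offsets are assumed to be relative to the next instruction, and FixupError and resolve_short are names invented for this example.

```python
SHORT_MIN, SHORT_MAX = -127, 128  # hypothetical one-word range from the text

class FixupError(Exception):
    """The offset found in pass two does not fit the form chosen in pass one."""

def resolve_short(pc, target):
    """Return the relative offset, or raise if it needs a two-word form."""
    offset = target - (pc + 1)  # relative to the next instruction (assumed)
    if not SHORT_MIN <= offset <= SHORT_MAX:
        raise FixupError(
            f"offset {offset} is outside {SHORT_MIN}..{SHORT_MAX}; "
            "a two-word instruction is needed"
        )
    return offset
```

Here resolve_short(0, 3) returns 2, while resolve_short(0, 300) raises FixupError because the label is 299 words ahead.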
Single-pass assemblers are only possible if you insist on 'define before use', in which case the assembler would reject your code and report hello as an undefined symbol.
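A small sketch of the 'define before use' behaviour, again in Python with made-up names. Each statement carries its size in words as a fourth field (an assumption to keep the example short), and offsets are taken relative to the next instruction.

```python
class UndefinedSymbol(Exception):
    """An operand names a label that has not been defined yet."""

def assemble_single_pass(program):
    """Resolve each operand immediately; forward references cannot work."""
    symbols, pc, listing = {}, 0, []
    for label, mnemonic, operand, size in program:
        if label:
            symbols[label] = pc
        if mnemonic == "LDA":
            if operand not in symbols:
                raise UndefinedSymbol(operand)
            operand = symbols[operand] - (pc + 1)
        listing.append((pc, mnemonic, operand))
        pc += size
    return listing

# Your layout: hello is used before it is defined.
code_first = [
    ("main",  "LDA",  "hello", 1),
    (None,    "IPT",  32, 1),
    (None,    "HLT",  None, 1),
    ("hello", ".STR", "Hello, world!", 7),
]

# Same program with the string moved ahead of the code.
data_first = [
    ("hello", ".STR", "Hello, world!", 7),
    ("main",  "LDA",  "hello", 1),
    (None,    "IPT",  32, 1),
    (None,    "HLT",  None, 1),
]
```

Calling assemble_single_pass(code_first) raises UndefinedSymbol, whereas assemble_single_pass(data_first) succeeds and resolves the LDA at address 7 to an offset of -8.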
You also need to read up on "program sections". Whilst .STR is not an executable instruction, it is a directive to the assembler to place the binary representation of the string into the CODE section of the image (vs DATA).