
In short, when I have multiple db definitions in my .data section, the addresses of their labels come out wrong when assembled by NASM. In my testing they are off by 256 bytes in the resulting Mach-O binary.

Software I am using:

  • OS X 10.10.5
  • nasm NASM version 2.11.08, installed via Homebrew as required for x86_64 ASM
  • gobjdump GNU objdump (GNU Binutils) 2.25.1, installed via Homebrew
  • clang Apple LLVM version 6.1.0 (clang-602.0.53) (based on LLVM 3.6.0svn)

What works:

Take for example the following "hello world" NASM assembly.

main.s

global _main

section .text
_main:
mov     rax, 0x2000004
mov     rdi, 1
lea     rsi, [rel msg]
mov     rdx, len
syscall

mov     rax, 0x2000001
mov     rdi, 0
syscall

section .data
msg:    db      "Hello, world!", 10
len:    equ     $ - msg

Compiled and run with:

/usr/local/bin/nasm -f macho64 -o main.o main.s
clang -o main main.o
./main

This works great, and produces the following output:

Hello, world!

What doesn't:

Now, to add another message, we just need to add another string to the data section, and another syscall. Simple enough.

main.s

global _main

section .text
_main:
mov     rax, 0x2000004
mov     rdi, 1
lea     rsi, [rel msga]
mov     rdx, lena
syscall

mov     rax, 0x2000004
mov     rdi, 1
lea     rsi, [rel msgb]
mov     rdx, lenb
syscall

mov     rax, 0x2000001
mov     rdi, 0
syscall

section .data
msga:    db      "Hello, world!", 10
lena:    equ     $ - msga
msgb:    db      "Break things!", 10
lenb:    equ     $ - msgb

Compile and run, same as before, and we get:

Break things!

What?!? Shouldn't we be getting this instead?

Hello, world!
Break things!

What's wrong?:

Something clearly went wrong. Time to disassemble the resulting binary and see what we got.

$ gobjdump -d -M intel main

Produces the following for _main:

0000000100000f7c <_main>:
   100000f7c:   b8 04 00 00 02          mov    eax,0x2000004
   100000f81:   bf 01 00 00 00          mov    edi,0x1
   100000f86:   48 8d 35 73 01 00 00    lea    rsi,[rip+0x173]        # 100001100 <msgb+0xf2>
   100000f8d:   ba 0e 00 00 00          mov    edx,0xe
   100000f92:   0f 05                   syscall
   100000f94:   b8 04 00 00 02          mov    eax,0x2000004
   100000f99:   bf 01 00 00 00          mov    edi,0x1
   100000f9e:   48 8d 35 69 00 00 00    lea    rsi,[rip+0x69]        # 10000100e <msgb>
   100000fa5:   ba 0e 00 00 00          mov    edx,0xe
   100000faa:   0f 05                   syscall
   100000fac:   b8 01 00 00 02          mov    eax,0x2000001
   100000fb1:   bf 00 00 00 00          mov    edi,0x0
   100000fb6:   0f 05                   syscall

From the comment # 100001100 <msgb+0xf2>, we can see that the first lea points not to the msga symbol, but to 0xf2 past msgb, or 0x100001100 (at this address there are null bytes, resulting in no output). Inspecting the binary in a hex editor, I find the actual msga string at file offset 0x1000, or address 0x100001000. This means that the address in the compiled binary is now off by 0x100 (256) bytes, simply because there is now a second db label. What?!?
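The arithmetic behind that objdump comment can be checked by hand: a RIP-relative operand is resolved against the address of the next instruction, not the lea itself. A minimal C sketch, using only the addresses from the listing above (the function names are illustrative and not part of any tool):

```c
#include <stdint.h>

/* Address a RIP-relative operand resolves to: the address of the
 * *next* instruction plus the signed 32-bit displacement. */
uint64_t rip_relative_target(uint64_t next_insn_addr, int32_t disp)
{
    return next_insn_addr + (uint64_t)(int64_t)disp;
}

/* Displacement that would have been needed to reach `target`
 * from the instruction ending at `next_insn_addr`. */
int64_t required_disp(uint64_t next_insn_addr, uint64_t target)
{
    return (int64_t)(target - next_insn_addr);
}
```

With the first lea (next instruction at 0x100000f8d, displacement 0x173), rip_relative_target gives 0x100001100. Reaching msga at 0x100001000 would have needed a displacement of 0x73, so the encoded value is too large by exactly 0x100.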


A sorry excuse for a workaround:

As an experiment, I decided to try putting both of the db sections into separate ASM/object files, and linking all 3 together. Doing so works.

main.s

global _main
extern _msga
extern _lena
extern _msgb
extern _lenb

section .text
_main:
mov     rax, 0x2000004
mov     rdi, 1
lea     rsi, [rel _msga]
mov     rdx, _lena
syscall

mov     rax, 0x2000004
mov     rdi, 1
lea     rsi, [rel _msgb]
mov     rdx, _lenb
syscall

mov     rax, 0x2000001
mov     rdi, 0
syscall

msga.s

global _msga
global _lena

section .data
_msga:   db      "Hello, world!", 10
_lena:   equ     $ - _msga

msgb.s

global _msgb
global _lenb

section .data
_msgb:   db      "Break things!", 10
_lenb:   equ     $ - _msgb

Compile and run with:

/usr/local/bin/nasm -f macho64 -o main.o main.s
/usr/local/bin/nasm -f macho64 -o msga.o msga.s
/usr/local/bin/nasm -f macho64 -o msgb.o msgb.s
clang -o main msga.o msgb.o main.o
./main

Results in:

Hello, world!
Break things!

While this does work, I find it hard to believe this is the best solution.


What is going wrong?

Surely there must be a way to have multiple db labels in one ASM file? Am I doing something wrong in the way I write the ASM? Is this a bug in NASM? Is this expected behavior somehow, in which case why? My workaround is extra work and clutter, so I would greatly appreciate any assistance in this.


2 Answers


Yes, this is a bug in Nasm-2.11.08. Nasm-2.11.06 should work. Nasm-2.11.09rc1 should work(?). Sorry 'bout that!

Answered 2015-09-09T00:48:49.637

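The related issue can be found here:

Bug 3392306 - Problem with relative addressing and data section

The current version 2.11.08 provided by Homebrew is patched for this issue with the following diff file: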

https://raw.githubusercontent.com/Homebrew/patches/7a329c65e/nasm/nasm_outmac64.patch

From 4920a0324348716d6ab5106e65508496241dc7a2 Mon Sep 17 00:00:00 2001
From: Cyrill Gorcunov <gorcunov@gmail.com>
Date: Sat, 9 May 2015 18:07:47 +0300
Subject: [PATCH] output: outmac64 -- Fix the case when first hit matches the
 symbol

In case if we're looking up for a symbol and it's first
one in symbol table we might endup with error because of
using GE here (78f477b35f) ending cycle with @nearest = NULL.

http://bugzilla.nasm.us/show_bug.cgi?id=3392306

Reprted-by: Benjamin Randazzo <benjamin@linuxcrashing.org>
Signed-off-by: Cyrill Gorcunov <gorcunov@gmail.com>
---
 output/outmac64.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/output/outmac64.c b/output/outmac64.c
index c07dcbc..1d30e64 100644
--- a/output/outmac64.c
+++ b/output/outmac64.c
@@ -304,7 +304,7 @@ static struct symbol *get_closest_section_symbol_by_offset(uint8_t fileindex, in

     for (sym = syms; sym; sym = sym->next) {
         if ((sym->sect != NO_SECT) && (sym->sect == fileindex)) {
-            if ((int64_t)sym->value >= offset)
+            if ((int64_t)sym->value > offset)
                 break;
             nearest = sym;
         }
-- 
2.4.10.GIT

So if you installed NASM via Homebrew, this issue should now be resolved.
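To see why the one-character change matters, here is a simplified model of the lookup loop in C. It is only a sketch: the struct and function names are hypothetical stand-ins, the section check from the real code is omitted, and the list is assumed to be sorted by offset. The `strict` flag selects between the original comparison (>=) and the patched one (>).

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical stand-in for an entry in NASM's symbol list. */
struct sym {
    const char *name;
    int64_t value;          /* offset of the symbol within its section */
    struct sym *next;
};

/* Walk the offset-sorted list and return the last symbol before the
 * loop breaks. strict != 0 uses the patched test (>), strict == 0 the
 * original one (>=). Returns NULL if the loop breaks on the first entry. */
const struct sym *closest_symbol(const struct sym *syms, int64_t offset, int strict)
{
    const struct sym *nearest = NULL;
    for (const struct sym *s = syms; s; s = s->next) {
        if (strict ? (s->value > offset) : (s->value >= offset))
            break;
        nearest = s;
    }
    return nearest;
}
```

With msga at offset 0 followed by msgb at offset 14, looking up offset 0 with the original comparison breaks on the very first symbol and returns NULL, which is the case the commit message describes. With the patched comparison the same lookup returns msga.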

Answered 2015-12-02T20:36:44.363