assembly - How do I handle input of unknown length in x86 ASM?

Question

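Most of my code is working, but I can't figure out exactly how to handle the fact that the input sentence has an unknown length. I'm new to assembly, and this is a bit confusing.

(Right now I have it set up as if the length were known to be three characters, but obviously I need to change that.)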

.data       
input_msg:  .ascii "Enter a random sentence: "
input_msg_len:  .long 25
input_str:  .ascii  "???" # 3rd should get newline  
count:      .long 0 
newline:    .long 10    

.text               
.global _start          
_start:             

# prompt for input
    mov $4, %eax    # prompt for input
    mov $1, %ebx
    mov $input_msg, %ecx
    mov input_msg_len, %edx
    int $0x80
# get input
    mov $3, %eax    # 3 to request "read"
    mov $0, %ebx    # 0 is "console" (keyboard)
    mov $input_str, %ecx # input buffer addr
    mov $3, %edx    # number of symbols typed in
    int $0x80       # Go do the service!

again1:
    mov $input_str, %ecx    
    add count, %ecx # count is offset from input_str beginning

    mov $4, %eax    # to write
    mov $1, %ebx    # to console display
    mov $1, %edx    # 1 byte to write
    int $0x80   # Do it!

    push    %ecx        # push onto stack   

    incl    count   # increment count

    cmp $3, count   # compare lengths
    jnz again1     # jmp again if not 0 (no difference)

    mov $0, %edi    # use edi as loop counter

    mov $4, %eax    # print out msg
    mov $1, %ebx    # etc.
    mov $1, %edx    # length
    int $0x80       # OS, serve!

again2:     
    pop %ecx    

    mov $4, %eax    # print out msg
    mov $1, %ebx    # etc.
    mov $1, %edx    # length
    int $0x80       # OS, serve!        

    inc %edi    # increment edi 
    cmp count, %edi # compare lengths
    jnz again2  # jmp again if not 0 (no difference)

# print newline
    mov $4, %eax    # print out msg
    mov $1, %ebx    # etc.
    mov $newline, %ecx  # addr
    mov $1, %edx    # length
    int $0x80       # OS, serve!
# exit
    mov $1, %eax    # exit
    int $0x80       # OS, serve!

Basically, what I want to know is: how do I make the code work for any sentence, not just one that is 3 characters long?

score 0 · Accepted Answer

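You just need to allocate a longer buffer for input_str and use the number of bytes actually read, which is returned in eax after the read system call.

In other words, you need to decide on the maximum length you are willing to accept and change the code to something like the following.
Note: it is fine to allocate a short string like this statically. Of course, if you need a large buffer (say, to take in data from a file), you can allocate the buffer dynamically instead. Again, for keyboard input, 132 is probably enough.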

...
input_str:  .space 132      # 132-byte buffer for the input string
input_str_len: .long 0      # number of bytes actually read from the user
...
# get input
    mov $3, %eax    # 3 to request "read"
    mov $0, %ebx    # 0 is "console" (keyboard)
    mov $input_str, %ecx # input buffer addr
    mov $131, %edx    # _Max_ number of bytes accepted in input_str
    int $0x80       # Go do the service!

    mov %eax, input_str_len    # save the number of bytes actually read
...
    # you can then use input_str_len to control when to exit the processing loop, etc.
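
As an illustration of the last comment, here is a minimal sketch of a complete program that reads up to 131 bytes, saves the count, and prints the line reversed, one byte at a time, which is what the loop in the question appears to be aiming for. It assumes 32-bit Linux and GNU as; the build commands in the first comment and the labels rev_loop, finish and newline are illustrative choices, not part of the original answer.

```asm
# Sketch for 32-bit Linux, GNU as (AT&T syntax). Assumed build commands:
#   as --32 rev.s -o rev.o && ld -m elf_i386 rev.o -o rev
.data
input_str:      .space 132          # 132-byte input buffer
input_str_len:  .long 0             # bytes actually read
newline:        .byte 10

.text
.global _start
_start:
    mov $3, %eax                    # sys_read
    mov $0, %ebx                    # fd 0 = stdin
    mov $input_str, %ecx            # buffer address
    mov $131, %edx                  # maximum number of bytes accepted
    int $0x80
    cmp $0, %eax                    # 0 = end of input, negative = error
    jle finish
    mov %eax, input_str_len         # save the count

    mov input_str_len, %esi         # esi = index one past the last byte
    cmpb $10, input_str-1(%esi)     # does the input end in a newline?
    jne rev_loop
    dec %esi                        # yes: leave it out of the reversal
rev_loop:
    test %esi, %esi
    jz finish                       # nothing left to print
    dec %esi
    mov $4, %eax                    # sys_write
    mov $1, %ebx                    # fd 1 = stdout
    lea input_str(%esi), %ecx       # address of the current byte
    mov $1, %edx                    # write one byte
    int $0x80                       # esi survives the call
    jmp rev_loop
finish:
    mov $4, %eax                    # end the output line
    mov $1, %ebx
    mov $newline, %ecx
    mov $1, %edx
    int $0x80
    mov $1, %eax                    # sys_exit
    mov $0, %ebx                    # exit status 0
    int $0x80
```

The loop bound comes from the value read returned rather than from a hard-coded 3, so the same code handles any line that fits in the buffer.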

score 0 · Accepted Answer

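Well... you could do a sys_brk with %ebx = 0. That returns your original "break"; save it. Add a multiple of 4k to that value and sys_brk again. Do your sys_read into that buffer. If you read the full 4k (check %eax after sys_read), add more to the current "break", sys_brk again, and read some more... until you're done. This "should" give you everything in one contiguous buffer...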
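
The steps above might look roughly like this in AT&T syntax. This is only a sketch, assuming 32-bit Linux, where sys_brk is system call 45 and the raw call returns the new break on success or the unchanged break on failure. The names CHUNK, buf_start, grow and finished are made up for the example, and the program simply writes everything it collected back to stdout.

```asm
# Sketch for 32-bit Linux, GNU as (AT&T syntax).
.equ CHUNK, 4096                    # grow the buffer 4k at a time

.data
buf_start:  .long 0                 # original break = start of the buffer

.text
.global _start
_start:
    mov $45, %eax                   # sys_brk
    xor %ebx, %ebx                  # 0 = just report the current break
    int $0x80
    mov %eax, buf_start             # save the original break
    mov %eax, %edi                  # edi = where the next read will land
grow:
    lea CHUNK(%edi), %ebx           # requested new break
    mov $45, %eax                   # sys_brk
    int $0x80
    cmp %ebx, %eax                  # did the break actually move?
    jb finished                     # no: keep what we already have

    mov $3, %eax                    # sys_read
    xor %ebx, %ebx                  # fd 0 = stdin
    mov %edi, %ecx                  # read into the fresh chunk
    mov $CHUNK, %edx
    int $0x80
    cmp $0, %eax
    jle finished                    # end of input or error
    add %eax, %edi                  # advance past the bytes just read
    cmp $CHUNK, %eax
    je grow                         # full chunk: there may be more
finished:
    mov $4, %eax                    # sys_write
    mov $1, %ebx                    # fd 1 = stdout
    mov buf_start, %ecx             # start of the collected data
    mov %edi, %edx
    sub %ecx, %edx                  # length = end - start
    int $0x80
    mov $1, %eax                    # sys_exit
    mov $0, %ebx
    int $0x80
```

Because the loop only continues after a completely full read, each new chunk starts exactly where the previous one ended, which is what keeps the data contiguous.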

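It's a lot easier to just decide on some "maximum" and not let them enter any more! You may want to "flush the buffer", though. As you know, sys_read (from the keyboard) doesn't return until it sees a linefeed (0xA). If the pesky user types more than %edx characters, the rest stays in the OS's buffer. You can see this with your 3-byte-buffer code. Type "abcls". I think you'll find that after your program exits, your shell prompt reads the "ls" and gives you a directory listing. No harm done, but it could have been "rm" or something harmful!

When your sys_read returns, if %eax is less than %edx, you're done. If %eax = %edx (it won't be more) and the last character in the buffer is LF (0xA), you're also done. If not, sys_read into a dummy buffer until you get that LF. This complicates your code quite a bit, but it's "safer"...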

I could try an example in Nasm syntax, but I think I'd better not attempt AT&T... :)
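
For readers who want AT&T syntax anyway, the flush check described in this answer could be sketched as follows. It assumes 32-bit Linux and line-buffered terminal input, where a short read means the whole line fit. The names MAXLEN, input_len, dummy, flush, echo and quit are invented for the example, and the program just echoes back the part of the line it kept.

```asm
# Sketch for 32-bit Linux, GNU as (AT&T syntax).
.equ MAXLEN, 131                    # most bytes accepted from one line

.data
input_str:  .space 132              # input buffer
input_len:  .long 0                 # bytes actually kept
dummy:      .byte 0                 # scratch byte for draining

.text
.global _start
_start:
    mov $3, %eax                    # sys_read
    mov $0, %ebx                    # fd 0 = stdin
    mov $input_str, %ecx
    mov $MAXLEN, %edx
    int $0x80
    cmp $0, %eax
    jle quit                        # end of input or error
    mov %eax, input_len

    cmp $MAXLEN, %eax
    jb echo                         # short read: the whole line fit
    cmpb $10, input_str-1(%eax)     # buffer full: does it end in LF?
    je echo
flush:
    mov $3, %eax                    # read the excess one byte at a time
    mov $0, %ebx
    mov $dummy, %ecx
    mov $1, %edx
    int $0x80
    cmp $1, %eax
    jne echo                        # end of input or error: nothing left
    cmpb $10, dummy
    jne flush                       # keep draining until the LF
echo:
    mov $4, %eax                    # sys_write
    mov $1, %ebx                    # fd 1 = stdout
    mov $input_str, %ecx
    mov input_len, %edx
    int $0x80
quit:
    mov $1, %eax                    # sys_exit
    mov $0, %ebx
    int $0x80
```

Typing a line longer than 131 characters should then leave nothing behind for the shell to pick up after the program exits.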
