First of all, you can't debug your branching until you fix How to load a single byte from address in assembly - you're loading 4 bytes of characters and comparing that whole 32-bit value against 'a'
and so on. Use movzx
instead of mov ebx, LocalBuffer[esi]
because it's a char
array.
If you've been single-stepping your code in the debugger, maybe you've noticed that all 4 bytes of ebx
are non-zero. That's why your cmp/branches aren't working or doing what you expect.
@zx485 explained the general case of a chain of branches to go through until you can definitely accept or reject an input.
But you can also simplify by using efficient range-checks using the unsigned-compare trick. e.g. Reverse-engineering asm using sub / cmp / setbe back to C? My attempt is compiling to branches shows how that works for just the lower-case ASCII range.
Even better, ASCII is conveniently designed so the A-Z and a-z ranges align with each other, and don't cross a %32
boundary, so you can force a byte to lower-case with c |= 0x20
, or to upper case with c ^= ~0x20
. Then you only have that one range to check for alphabetic characters.
OR
with 20h forces upper-case characters to lower-case, and doesn't make any non-alphabetic characters into lowercase, so you can do that on a copy of your register.
See What is the idea behind ^= 32, that converts lowercase letters to upper and vice versa? and especially How to access a char array and change lower case letters to upper case, and vice versa for MSVC inline asm that loops over a char array and checks for alphabetic or not.
Make sure you don't destroy your only copy because you still need to count upper separately from lower; you're just creating a temporary to branch on. Unless you want to avoid unused positions in your count array, then maybe you want c - 'A'
as your array index. But probably not if you have one array for all characters and digits you want to count.
Example
For the loop structure, I have out-of-range characters jump over the Do Something
part, reaching the compare/branch loop condition. The load and index increment happens every iteration, regardless of the loaded character.
Note that every character that's not in any of the ranges is a non-digit and a non-letter. It doesn't make sense to have a non-digit branch target separate from a non-letter branch target because that's not what you're figuring out. You could have digit and letter branch to separate places, though.
_asm
{
xor esi, esi ; i=0
Loop1: ; do {
; load from the array *inside* the loop.
movzx ebx, byte ptr LocalBuffer[esi]
inc esi ; ebp = buf[i++]
; check for digits first
lea eax, [ebx - '0']
cmp al, 9
jbe CharCount ; if (c-'0' <= 9) goto CharCount
; non-digits fall through into checking for alphabetic
mov eax, ebx
or eax, 20h ; force to lower-case
sub eax, 'a' ; subtract start of the range
cmp al, 'z'-'a' ; see if it was inside the length of the range (unsigned)
ja skipCount
; in the common case (alphabetic characters), fall through into CharCount
CharCount:
; EBX still holds the character value, zero-extended
add byte ptr [counts + ebx], 1 ;Do something
; or use [counts + ebx*4] if you have an int array.
skipCount: ; rejected characters jump here, skipping count increment
cmp esi, 127
jb Loop1 ; } while(i<127)
}
You don't need to waste a 2nd register on another loop counter (ECX) when you already have ESI. cmp/jb
is more efficient than the loop
instruction anyway.
I think we can save one instruction by doing the subtract first (so we can still use lea
to copy-and-subtract), but then we have to clear the 0x20 bit instead of setting it so we're dealing with upper-case.
;; untested, but I think this is correct, too, using LEA+AND instead of MOV+OR+SUB
lea eax, [ebx - 'A']
and eax, ~20h ; clear the lower-case bit
cmp al, 'Z'-'A' ; 25, same as 'z'-'a' of course.
ja skipCount
c - 'A'
= 0x20 for c='a'
. Character codes past 'Z'
but before 'a'
produce smaller results so clearing the 0x20
bit can't give us a false-positive.
PS: if this is the same histogram problem you asked previous questions about, you don't need to filter while reading, just make your array of counts have 256 elements (for every possible uint8_t
value) and then only loop over the ones you want to print.
If you were getting segfaults using ebx
as the index, that's because you loaded 4 bytes (a large integer) instead of zero-extending one. We already fixed this bug in previous versions of you question.
Also, as I previously explained in comments, you don't need to copy your string input to a LocalBuffer
, just do char *bufptr = Buffer;
and in inline asm do mov esi, bufptr
to get that pointer into a register. That's inefficient, but much better than copying a whole array. Especially for counts as well.
Or https://godbolt.org/z/QszVMf shows how to access class members from inline asm.