'==================== Original PB/CC Program ========================
FUNCTION PBMAIN
aa$="This is a test"
! mov al, al
a& = LEN(aa$)
! mov ah, ah
END FUNCTION
Above is a small PB/CC program I used to try to figure out how PowerBasic
encodes the LEN() function in assembly code. As you will see, learning the answer is somewhat difficult. The two ! MOV lines do nothing useful (each moves a register onto itself); they are only there as markers that are easy to find later. I used IDA Pro Freeware version 4.3 to take the
compiled code, which is an EXE file, and disassemble it into a new file.
In IDA, I indicate which EXE file to process, and select the PE format for loading the file.
On the next screen just click OK and accept the defaults. In short order, IDA will display the disassembled code in its own window. It is commented and in color, but is probably best rendered into an ASM file at this point.
You can explore the various options and menus later. Right now, under File,
pick "Produce" and select "ASM" file. You can use Alt+F10 as an alternative.
Save the file where it is convenient for you. Now you can exit IDA Pro, but you must
choose whether or not to save the database it created. Your call.
Using any text editor, you can next look at the produced ASM file. If you used the above code example, you can search for "mov al, al", which should take you right to the corresponding code. This is the extract that I made below:
'======== Related Extracts From Produced ASM File From Compiled EXE =======
Code found in ! MOV pair:
mov al, al ; my designated lead flag
mov edx, [ebp+var_8C] ;this must point to AA$
call sub_4019B5
call sub_40199D
mov esi, eax ;this must be REGISTER ref to A&
mov ah, ah ;my designated trail flag
There are two called subs here, and they also appear in the ASM file. You can
easily find them by searching for each by name:
…
sub_4019B5 proc near ; CODE XREF: sub_4010CB+34 p
push esi ;save of ESI contents on stack
sub dword ptr [ebp-78h], 4 ;subtract 4 from the value stored at EBP-78h (between ESP and EBP)
mov esi, [ebp-78h] ; move that value to ESI
or edx, 80000000h ;set the sign (high) bit for some reason
mov [ebp+esi-5Ch], edx ;save this at location EBP+ESI-5Ch
pop esi ;restore ESI contents from stack
retn ;exit this sub
sub_4019B5 endp
sub_40181D proc near ; CODE XREF: .text:00401857 p
; sub_4018B1+23 p ...
and esi, 7FFFFFFFh ;now clear the sign bit in ESI
jz short loc_401829 ;if the result is zero, string addr is null or invalid
mov ecx, [esi-4] ;string length is stored 4 bytes BELOW the address in ESI
retn
loc_401829: ; CODE XREF: sub_40181D+6 j
mov ecx, esi ;ESI is zero on this path, so the length in ECX is zero
mov esi, offset unk_4020BC ;make ESI point to something else
retn ;exit this sub
sub_40181D endp
The process that PowerBasic employs is not very clear, is it? The way I
presently see it, PowerBasic uses EBP as a reference point that works in two directions:
any passed parameters are located on the stack above the point marked by
EBP, and ESP is set lower to leave a region for local and static variable use
between ESP and EBP. All positioning above and below EBP is done by offsets that
PowerBASIC knows, based on allocations made as the program is analyzed and compiled.
It also appears that in this instance, with A& and AA$ being treated as local variables, the length of AA$ is stored 4 bytes below the address that the string pointer refers to (see the mov ecx, [esi-4] in sub_40181D). If true, this would be the reverse of the order used with PB/DOS. That needs to be checked further.
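One way to check the length-below-the-pointer idea from inside PB/CC is sketched here. It restates sub_40181D in BASIC and compares the result with LEN(). TaggedLen is a name I made up for illustration, and I am assuming that the value being tagged is the same address that STRPTR() returns. Neither is confirmed by the disassembly, and PEEK(LONG, ...) needs a compiler version that supports the typed form of PEEK.

'======== Sketch: sub_40181D Restated In PB/CC (names are made up) ========
FUNCTION TaggedLen (BYVAL tagged AS DWORD) AS LONG
  LOCAL p AS DWORD
  p = tagged AND &H7FFFFFFF         ' and esi, 7FFFFFFFh  (clear the sign bit)
  IF p = 0 THEN
    FUNCTION = 0                    ' mov ecx, esi  (ESI is zero on this path)
  ELSE
    FUNCTION = PEEK(LONG, p - 4)    ' mov ecx, [esi-4]
  END IF
END FUNCTION
FUNCTION PBMAIN () AS LONG
  LOCAL aa AS STRING
  LOCAL t AS DWORD
  aa = "This is a test"
  t = STRPTR(aa)                    ' ASSUMPTION: same address as the one loaded from var_8C
  BIT SET t, 31                     ' or edx, 80000000h, as in sub_4019B5
  PRINT LEN(aa), TaggedLen(t)       ' the two values should agree if the guess is right
  PRINT TaggedLen(0)                ' exercises the zero path
  WAITKEY$
END FUNCTION

If the two numbers on the first line match, that supports the idea that the length sits 4 bytes below the string data. It says nothing yet about what the sign bit is for.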
Note the tendency by PowerBasic to use ESI as the primary register for some of
the processing. A lot of assembly programmers build their reliance around the
EAX register, and perhaps PowerBasic's approach facilitates the idea of keeping
ESI as the first alternative register for a memory variable. It would be worth
noting what coding changes happen if we add a #REGISTER NONE to the PowerBasic program and recompile it.
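For anyone who wants to try that, the test program would look something like the one below. The only change is the #REGISTER NONE line at the top. I have not compiled or disassembled this version, so what actually changes is still an open question. If A& is currently a register variable held in ESI, my guess is that the mov esi, eax would turn into a store to a location relative to EBP.

'==================== Same Program With #REGISTER NONE ========================
#REGISTER NONE
FUNCTION PBMAIN
aa$="This is a test"
! mov al, al
a& = LEN(aa$)
! mov ah, ah
END FUNCTION

Run it through the same IDA steps and search for "mov al, al" again, then compare the lines between the two markers with the extract above.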
While several things are not clear at this point, it does appear
that PowerBasic uses the sign bit to signal something about string variables, perhaps whether they are valid or not, or the type of string variable involved.
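The thing I find most confusing here, though, is that ESI, EDX, and ECX all seem to
have specific roles, but if the length of the string is returned by the last sub in ECX, why is the very next statement in the main body assigning EAX to ESI? That sort of throws me at this point. One thing to check: the main body calls sub_40199D, while the second listing above is sub_40181D, so there may be another routine in between that has not been examined here and that moves the result into EAX.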
Anyway, it is an attempt, and perhaps it will help you get started with your own
analysis of your PowerBasic and ASM code.