Deprecated: Array and string offset access syntax with curly braces is deprecated in /homepages/21/d38531796/htdocs/jose/smfforum/Sources/Subs.php on line 3825 Print Page - An ASM Exercise In Number Crunching
Theo's Forum
IT-Consultant: Charles Pegge => Assembler => Assembly Code Examples => Topic started by: Charles Pegge on June 13, 2007, 02:00:34 PM
Title: An ASM Exercise In Number Crunching
Post by: Charles Pegge on June 13, 2007, 02:00:34 PM
How To Do Pythagoras In Bulk
You can gain considerable performance by processing arrays in assembler, by taking advantage of being able to do most of the work in the registers. And the annoted may look very bulky but this assembles to very few byte of code.
The Floating point unit (FPU) has a stack of 8 registers and is loaded by pushing stuff onto the top of the stack and popping it off again when no longer required. It is important to leave the FPU clean of any debris. The pop instruction is part of many operations such as faddp and fstp.
The most significant differences between FreeBasic and PowerBasic are that FB does its multidimensional arrays (major,minor) whereas PB does its arrays: (minor,major). Also when passing basic variables to registers, FB requires indirection thus: mov eax,[ a] whereas PB does it directly: mov eax, a.
' PYTHAGORAS 3D ' Charles E V Pegge ' 13 June 2007
' PowerBasic PBWin Version
#COMPILE EXE #DIM ALL ' PYTHAGORAS 3D ' does the equivalent of r=sqr(x^2+y^2+z^2) for arrays of floating point data
' parameter a: the pointer to an x y z array of coordinates in 64 bit floating point. ' parameter r: the pointer to an array for the result r. ' parameter c: the number of points to process
FUNCTION pythagoras3d( BYVAL a AS DOUBLE PTR, BYVAL r AS DOUBLE PTR, BYVAL c AS LONG) AS LONG 'ASM '----------------------------'-------------------------------------------------------------------- ! mov eax, a ' eax will be our source index ! mov edx, r ' edx will be our destination index ! mov ecx, c ' ecx will be our down-counter repeats: ' DO ! dec ecx ' if ecx was initially 0 then the loop is completely skipped ! jl exits ' when ecx is -1 or less ! fld QWORD PTR [eax] ' push the x coordinate onto the FPU register stack ! add eax,8 ' advance eax for the next coordinate (CPU and FPU work in parallel) ! fmul st(0),st(0) ' square the x coordinate ! fld QWORD PTR [eax] ' push the y coordinate onto the fpu stack ! add eax,8 ' advance the eax pointer again ! fmul st(0),st(0) ' square the y ! faddp st(1),st(0) ' add this to the squared x then pop the y to discard it ! fld QWORD PTR [eax] ' load the z coordinate onto the fpu stack ! add eax,8 ' advance the pointer ready for the next xyz point ! fmul st(0),st(0) ' square the z ! faddp st(1),st(0) ' add this value to the others and pop to discard it. ! fsqrt ' take the quare root of this sum of squares x,y,z ! fstp QWORD PTR [edx] ' store the result at the r pointer and pop it from the stack ! add edx,8 ' advance the r pointer edx ready for the next result. ! jmp repeats ' repeat the loop exits: ' exit point ! mov function, ecx ' the value in our down counter should always be -1 'END ASM ' done '----------------------------'-------------------------------------------------------------------- END FUNCTION
FUNCTION PBMAIN () AS LONG
DIM xyz(2,1000) AS DOUBLE DIM rads(1000) AS DOUBLE DIM a AS LONG xyz(0,2)=3 xyz(1,2)=4 xyz(2,2)=5 a=pythagoras3d(VARPTR(xyz(0,0)),VARPTR(rads(0)),10) MSGBOX STR$( rads(2)) ' dim xx as double ptr =varptr(xyz(0,0)): print xx[5] ' test: arrays in freebasic are major,minor ' whereas in PowerBasic the order id minor,major. ' answer should be 7.071067811865476 'PRINT "end"
' does the equivalent of r=sqr(x^2+y^2+z^2) for arrays of floating point data
' parameter a: the pointer to an x y z array of coordinates in 64 bit floating point. ' parameter r: the pointer to an array for the result r. ' parameter c: the number of points to process
function pythagoras3d( byval a as double ptr, byval r as double ptr, byval c as long) as long asm '---------------------------'-------------------------------------------------------------------- mov eax,[a] ' eax will be our source index mov edx,[r] ' edx will be our destination index mov ecx,[c] ' ecx will be our down-counter repeats: ' DO dec ecx ' if ecx was initially 0 then the loop is completely skipped jl exits ' when ecx is -1 or less fld qword ptr [eax] ' push the x coordinate onto the FPU register stack add eax,8 ' advance eax for the next coordinate (CPU and FPU work in parallel) fmul st(0),st(0) ' square the x coordinate fld qword ptr [eax] ' push the y coordinate onto the fpu stack add eax,8 ' advance the eax pointer again fmul st(0),st(0) ' square the y faddp st(1),st(0) ' add this to the squared x then pop the y to discard it fld qword ptr [eax] ' load the z coordinate onto the fpu stack add eax,8 ' advance the pointer ready for the next xyz point fmul st(0),st(0) ' square the z faddp st(1),st(0) ' add this value to the others and pop to discard it. fsqrt ' take the quare root of this sum of squares x,y,z fstp qword ptr [edx] ' store the result at the r pointer and pop it from the stack add edx,8 ' advance the r pointer edx ready for the next result. jmp repeats ' repeat the loop exits: ' exit point mov [function], ecx ' the value in our down counter should always be -1 end asm ' done '---------------------------'-------------------------------------------------------------------- end function
dim xyz(1000,2) as double dim radius(1000) as double xyz(2,0)=3 xyz(2,1)=4 xyz(2,2)=5 print pythagoras3d(varptr(xyz(0,0)),varptr(radius(0)),10) print radius(2) ' dim xx as double ptr =varptr(xyz(0,0)): print xx[5] ' test: arrays in freebasic are major,minor ' answer should be 7.071067811865476 print "end"