### Author Topic: An ASM Exercise In Number Crunching  (Read 4640 times)

0 Members and 1 Guest are viewing this topic.

#### Charles Pegge ##### An ASM Exercise In Number Crunching
« on: June 13, 2007, 02:00:34 PM »
How To Do Pythagoras In Bulk

You can gain considerable performance by processing arrays in assembler, by taking advantage of being able to do most of the work in the registers. And the annoted may look very bulky but this assembles to very few byte of code.

The Floating point unit (FPU) has a stack of 8 registers and is loaded by pushing stuff onto the top of the stack and popping it off again when no longer required. It is important to leave the FPU clean of any debris.
The pop instruction  is part of many operations such as faddp and fstp.

The most significant differences between FreeBasic and PowerBasic are that FB does its multidimensional  arrays (major,minor) whereas PB does its arrays: (minor,major). Also when passing basic variables to registers, FB requires indirection thus: mov eax,[ a] whereas PB does it directly: mov eax, a.

PowerBasic Version
Code: [Select]
`' PYTHAGORAS 3D' Charles E V Pegge' 13 June 2007' PowerBasic PBWin Version#COMPILE EXE#DIM ALL' PYTHAGORAS 3D' does the equivalent of r=sqr(x^2+y^2+z^2) for arrays of floating point data' parameter a:  the pointer to an x y z array of coordinates in 64 bit floating point.' parameter r:  the pointer to an array for the result r.' parameter c:  the number of points to processFUNCTION pythagoras3d( BYVAL a AS DOUBLE PTR, BYVAL r AS DOUBLE PTR, BYVAL c AS LONG) AS LONG'ASM'----------------------------'--------------------------------------------------------------------! mov eax, a                 ' eax will be our source index! mov edx, r                 ' edx will be our destination index! mov ecx, c                 ' ecx will be our down-counter  repeats:                   ' DO!   dec ecx                  ' if ecx was initially 0 then the loop is completely skipped!   jl exits                 ' when ecx is -1 or less!   fld  QWORD PTR [eax]     ' push the x coordinate onto the FPU register stack!   add eax,8                ' advance eax for the next coordinate (CPU and FPU work in parallel)!   fmul st(0),st(0)         ' square the x coordinate!   fld QWORD PTR [eax]      ' push the y coordinate onto the fpu stack!   add eax,8                ' advance the eax pointer again!   fmul st(0),st(0)         ' square the y!   faddp st(1),st(0)        ' add this to the squared x then pop the y to discard it!   fld QWORD PTR [eax]      ' load the z coordinate onto the fpu stack!   add eax,8                ' advance the pointer ready for the next xyz point!   fmul st(0),st(0)         ' square the z!   faddp st(1),st(0)        ' add this value to the others and pop to discard it.!   fsqrt                    ' take the quare root of this sum of squares x,y,z!   fstp QWORD PTR [edx]     ' store the result at the r pointer and pop it from the stack!   add edx,8                ' advance the r pointer edx ready for the next result.!  jmp repeats               ' repeat the loop exits:                      ' exit point! mov function, ecx          ' the value in our down counter should always be -1'END ASM                     ' done'----------------------------'--------------------------------------------------------------------END FUNCTIONFUNCTION PBMAIN () AS LONGDIM xyz(2,1000) AS DOUBLEDIM rads(1000) AS DOUBLEDIM a AS LONGxyz(0,2)=3xyz(1,2)=4xyz(2,2)=5a=pythagoras3d(VARPTR(xyz(0,0)),VARPTR(rads(0)),10)MSGBOX STR\$( rads(2))' dim xx as double ptr =varptr(xyz(0,0)): print xx ' test: arrays in freebasic are major,minor' whereas in PowerBasic the order id minor,major.' answer should be 7.071067811865476'PRINT "end"'-------------------------------------END FUNCTION`
FreeBasic Version
Code: [Select]
`' PYTHAGORAS 3D' Charles E V Pegge' 13 June 2007'FreeBasic Version' does the equivalent of r=sqr(x^2+y^2+z^2) for arrays of floating point data' parameter a:  the pointer to an x y z array of coordinates in 64 bit floating point.' parameter r:  the pointer to an array for the result r.' parameter c:  the number of points to processfunction pythagoras3d( byval a as double ptr, byval r as double ptr, byval c as long) as longasm'---------------------------'-------------------------------------------------------------------- mov eax,[a]                ' eax will be our source index mov edx,[r]                ' edx will be our destination index mov ecx,[c]                ' ecx will be our down-counter repeats:                   ' DO   dec ecx                  ' if ecx was initially 0 then the loop is completely skipped   jl exits                 ' when ecx is -1 or less   fld  qword ptr [eax]     ' push the x coordinate onto the FPU register stack   add eax,8                ' advance eax for the next coordinate (CPU and FPU work in parallel)   fmul st(0),st(0)         ' square the x coordinate   fld qword ptr [eax]      ' push the y coordinate onto the fpu stack   add eax,8                ' advance the eax pointer again   fmul st(0),st(0)         ' square the y   faddp st(1),st(0)        ' add this to the squared x then pop the y to discard it   fld qword ptr [eax]      ' load the z coordinate onto the fpu stack   add eax,8                ' advance the pointer ready for the next xyz point   fmul st(0),st(0)         ' square the z   faddp st(1),st(0)        ' add this value to the others and pop to discard it.   fsqrt                    ' take the quare root of this sum of squares x,y,z   fstp qword ptr [edx]     ' store the result at the r pointer and pop it from the stack   add edx,8                ' advance the r pointer edx ready for the next result.  jmp repeats               ' repeat the loop exits:                     ' exit point mov [function], ecx        ' the value in our down counter should always be -1end asm                     ' done'---------------------------'--------------------------------------------------------------------end functiondim xyz(1000,2) as doubledim radius(1000) as doublexyz(2,0)=3xyz(2,1)=4xyz(2,2)=5print pythagoras3d(varptr(xyz(0,0)),varptr(radius(0)),10)print radius(2)' dim xx as double ptr =varptr(xyz(0,0)): print xx ' test: arrays in freebasic are major,minor' answer should be 7.071067811865476print "end"'-------------------------------------`
« Last Edit: June 13, 2007, 02:12:33 PM by Charles Pegge »