Dealing with lack of floating point precision in OpenCL particle system
I'm writing an OpenCL-based particle system to speed up visualizations of large-scale networks. In essence, it's a two-phase problem: phase 1 applies negative gravity to each particle (a typical n-body problem) so that the particles repel each other, and phase 2 attracts particles based on the edges (or springs) between them.
During each iteration of the gravity algorithm, each particle's location, represented by a pair of floats, is affected by its distance to every other particle (classical physics model, no drag, keeping it simple).
In the situation where one has an evenly spaced square array of particles, the application of gravity should result in symmetry across both the x and y axes. This is true at the start of the gravity application, but over time the lack of precision inherent in adding up lots of floating point numbers results in small non-uniform deviations. This, in turn, propagates through the entire n-body system and the symmetry is lost.
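To illustrate the rounding effect described above: float addition is not associative, so two particles that should be mirror images of each other can receive the same contributions in a different order and end up with slightly different totals. Below is a minimal sketch in plain C rather than OpenCL C. The function names are made up for illustration, and `volatile` is only there to stop the host compiler from keeping intermediates in higher precision.

```c
/* Sum three floats left to right: (a + b) + c. */
static float sum_left(float a, float b, float c)
{
    volatile float t = a + b;   /* force rounding to single precision */
    return t + c;
}

/* Sum the same three floats right to left: a + (b + c). */
static float sum_right(float a, float b, float c)
{
    volatile float t = b + c;   /* force rounding to single precision */
    return a + t;
}
```

With a = 1e8f, b = -1e8f and c = 1.0f, `sum_left` returns 1.0f while `sum_right` returns 0.0f, because 1.0f is smaller than the spacing between adjacent floats near 1e8 and is rounded away when added to -1e8f first.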
One simple way to avoid this is to use double precision numbers, but the GeForce 9600M GT in my MacBook Pro does not support double precision. So, what's a nice way to deal with such problems in OpenCL? I've thought about truncating the floating point numbers I'm adding by a few decimals to avoid this problem, but that seems a bit hokey.
This is a pretty common problem; even on CPUs, you often want to avoid double precision because of the factor of 2 in memory overhead and bandwidth.
A number of molecular dynamics and n-body codes written for GPUs use "mixed-precision" arithmetic: they store particle positions and velocities in single precision, but use double precision for a few key operations, typically to store the position differences and to accumulate the accelerations. (Googling "mixed precision" with "molecular dynamics" or "n-body" gives tonnes of results.)
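A rough sketch of that mixed-precision pattern, written as plain C rather than as an OpenCL kernel: positions stay in `float`, while the position differences and the running acceleration sums are held in `double`. The `Particle` struct, the function name and the `strength` and `softening` parameters are all hypothetical. The force law is an assumption too: a softened inverse-distance repulsion, chosen only so the sketch needs no square root (an inverse-square law would divide by r^3 instead).

```c
#include <stddef.h>

/* Hypothetical particle layout: positions stored in single precision. */
typedef struct { float x, y; } Particle;

/* Repulsive acceleration on particle i from all other particles.
 * Assumed force law: strength * d / (|d|^2 + softening). */
static void repulsion_mixed(const Particle *p, size_t n, size_t i,
                            float strength, float softening,
                            float *ax_out, float *ay_out)
{
    double ax = 0.0, ay = 0.0;               /* double accumulators */

    for (size_t j = 0; j < n; ++j) {
        if (j == i)
            continue;
        /* Position differences are formed in double precision. */
        double dx = (double)p[i].x - (double)p[j].x;
        double dy = (double)p[i].y - (double)p[j].y;
        double r2 = dx * dx + dy * dy + (double)softening;

        ax += (double)strength * dx / r2;
        ay += (double)strength * dy / r2;
    }

    /* Round once, at the end, back to the storage precision. */
    *ax_out = (float)ax;
    *ay_out = (float)ay;
}
```

Only the inner-loop temporaries are doubles, so the memory footprint of the particle array is unchanged. On a device without native doubles, such as the one in the question, this part would have to be emulated.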
So that reduces the number of double precision calculations, but not to zero. To implement higher precision arithmetic than the hardware natively supports, you can do software emulation, emulating a double with two floats. There is a venerable Fortran library, dsfun90, that implemented this, and in this NVIDIA forum someone implemented something similar in CUDA (based on operations in NVIDIA's Mandelbrot example). I don't know of an OpenCL implementation offhand, but copying it over from CUDA should be pretty straightforward. It's not as fast as native doubles, but if it's only for a few key operations it's not too bad.
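For a feel of what the two-float emulation looks like, here is a minimal sketch in plain C built on the standard error-free "two-sum" transformation. It is not the dsfun90 or CUDA code mentioned above; the `df64` type and the function names are invented for illustration, and only addition is shown. The `volatile` qualifiers are an assumption about what is needed to keep a host compiler from simplifying the error term away. In OpenCL you would presumably also want to avoid build options such as `-cl-fast-relaxed-math`, which allow that kind of reassociation.

```c
/* Hypothetical "double-single" value: represents hi + lo, with lo
 * holding the rounding error that does not fit in hi. */
typedef struct { float hi, lo; } df64;

/* Error-free addition: returns s and e such that s + e == a + b exactly,
 * where s is the rounded float sum. */
static df64 two_sum(float a, float b)
{
    volatile float s  = a + b;
    volatile float bb = s - a;          /* the part of b that made it into s */
    volatile float t1 = s - bb;         /* the part of a that made it into s */
    volatile float ea = a - t1;         /* what was lost from a */
    volatile float eb = b - bb;         /* what was lost from b */
    df64 r;
    r.hi = s;
    r.lo = ea + eb;
    return r;
}

/* Add a plain float to a double-single accumulator. */
static df64 df64_add_float(df64 a, float b)
{
    df64 s = two_sum(a.hi, b);
    volatile float lo = s.lo + a.lo;    /* combine the low-order parts */
    return two_sum(s.hi, lo);           /* renormalize into hi + lo */
}

/* Add two double-single values. */
static df64 df64_add(df64 a, df64 b)
{
    df64 s = two_sum(a.hi, b.hi);
    volatile float lo = a.lo + b.lo;
    lo = lo + s.lo;
    return two_sum(s.hi, lo);
}
```

As a usage example, start an accumulator at {1e8f, 0.0f} and add 1.0f three times with `df64_add_float`: a plain float accumulator would still read 1e8, whereas here `hi` stays at 1e8 and `lo` ends up holding the 3 that would otherwise be rounded away.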