Writing to global memory in CUDA
I would like to ask about the effect of writing to global memory in CUDA. It is well known that global memory reads have a large impact on performance (coalescing, caches, bank conflicts), since a read may take quite a lot of cycles waiting for the incoming data, and that may block execution in the meantime.
But what about writing to memory in CUDA? Does it suffer from the same kind of memory access pattern effects? Is the total cost simply the sum of the writes in the kernel?
Any related references and comments are appreciated.
In general, the answer to your question is "yes": stores are similar to loads. The difference is that stores are "fire and forget", so if there is work that does not depend on the stored addresses, the multiprocessor(s) can keep running it after issuing the stores, and stalls happen only when read-after-write dependencies are encountered.
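To see that the write pattern matters and that the cost is not just the number of stores, here is a minimal sketch that issues the same number of stores twice, once with consecutive threads writing consecutive addresses and once with a stride between them. The kernel names, the `grid_size` helper, the array size and the stride of 33 are all my own choices for illustration, and the timings you get will depend on your device.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: number of blocks needed to cover n elements.
static int grid_size(int n, int threads) {
    return (n + threads - 1) / threads;
}

// Consecutive threads write consecutive addresses (coalesced stores).
__global__ void store_coalesced(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 1.0f;
}

// Same number of stores, but neighbouring threads write `stride` elements apart.
// With n a power of two and stride odd, every element is still written exactly once.
__global__ void store_strided(float *out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int j = (int)(((long long)i * stride) % n);
        out[j] = 1.0f;
    }
}

int main() {
    const int n = 1 << 24;      // assumed size: 16M floats (64 MB)
    const int stride = 33;      // assumed stride, odd so it is coprime with n
    const int threads = 256;
    const int blocks = grid_size(n, threads);

    float *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }

    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);
    float ms_coalesced = 0.0f, ms_strided = 0.0f;

    // Warm-up launch so the first timed kernel does not pay start-up costs.
    store_coalesced<<<blocks, threads>>>(d_out, n);
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    store_coalesced<<<blocks, threads>>>(d_out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms_coalesced, start, stop);

    cudaEventRecord(start);
    store_strided<<<blocks, threads>>>(d_out, n, stride);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);
    cudaEventElapsedTime(&ms_strided, start, stop);

    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel error: %s\n", cudaGetErrorString(err));
        return 1;
    }

    printf("coalesced stores: %.3f ms\n", ms_coalesced);
    printf("strided stores:   %.3f ms\n", ms_strided);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    cudaFree(d_out);
    return 0;
}
```

Both kernels write every element of the array exactly once, so any difference between the two timings comes from the access pattern rather than from the amount of data stored.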
For full details, I suggest reading section 5.3.2 of the latest CUDA Programming Guide.
Also see Appendix F of that document for specific information pertaining to the different architecture families. For example, compute capability 1.x has more performance "cliffs" than compute capability 2.x (Fermi) devices.
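If you are not sure which architecture family your card belongs to, a short sketch like the following prints the compute capability of each device using the runtime API. The `is_cc2_or_newer` helper is my own name, added only to mirror the 1.x versus 2.x distinction above.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: true for compute capability 2.x (Fermi) and later.
static bool is_cc2_or_newer(const cudaDeviceProp &prop) {
    return prop.major >= 2;
}

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }

    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        if (cudaGetDeviceProperties(&prop, dev) != cudaSuccess) continue;
        // major.minor is the compute capability the programming guide refers to.
        printf("Device %d: %s, compute capability %d.%d (%s)\n",
               dev, prop.name, prop.major, prop.minor,
               is_cc2_or_newer(prop) ? "2.x or newer" : "1.x");
    }
    return 0;
}
```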
cuda opencl gpu gpgpu nvidia