opencl - Writing to global memory in CUDA

- March 15, 2014

I would like to ask about the effect of writing to global memory in CUDA. It is known that global memory reads have a great impact on performance (coalescing, caches, bank conflicts), since they may require quite a lot of cycles to wait for the incoming memory, which may block execution at that moment.

But what about writing to memory in CUDA? Does it suffer from the type of memory write pattern? Is the total cost straightforwardly the sum of the writes in the kernel?

Any related references and comments are appreciated.

In general, the answer to your question is "yes": stores are similar to loads. The difference is that stores are "fire and forget". If there is work that does not depend on the stored addresses, it can keep running on the multiprocessor(s) after the stores are issued, and stalls only happen when read-after-write dependencies are encountered.
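To see whether the write pattern matters on a given device, one option is to time two kernels that issue the same number of stores with different address patterns. The sketch below is only an illustration: the kernel names, the helper `time_store_patterns`, and the choice of buffer size and stride are all hypothetical, and the timings depend on the hardware, so measure rather than assume.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: consecutive threads store to consecutive addresses.
__global__ void store_coalesced(float *out, size_t n)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) out[i] = 1.0f;
}

// Hypothetical kernel: consecutive threads store `stride` elements apart.
__global__ void store_strided(float *out, size_t n, size_t stride)
{
    size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
    if (i < n) out[i * stride] = 1.0f;
}

// Times n stores with each pattern using CUDA events.
// Returns 0 on success, -1 if any CUDA call reports an error.
int time_store_patterns(size_t n, size_t stride,
                        float *ms_coalesced, float *ms_strided)
{
    float *buf = nullptr;
    // The buffer is sized for the strided kernel, which touches n * stride elements.
    if (cudaMalloc((void **)&buf, n * stride * sizeof(float)) != cudaSuccess)
        return -1;

    cudaEvent_t t0, t1, t2;
    cudaEventCreate(&t0);
    cudaEventCreate(&t1);
    cudaEventCreate(&t2);

    const int threads = 256;
    const int blocks = (int)((n + threads - 1) / threads);

    // Warm-up launch so one-time startup cost is not included in the timing.
    store_coalesced<<<blocks, threads>>>(buf, n);

    cudaEventRecord(t0);
    store_coalesced<<<blocks, threads>>>(buf, n);
    cudaEventRecord(t1);
    store_strided<<<blocks, threads>>>(buf, n, stride);
    cudaEventRecord(t2);

    // Wait for the last event, then pick up any launch or execution error.
    cudaError_t err = cudaEventSynchronize(t2);
    if (err == cudaSuccess) err = cudaGetLastError();
    if (err == cudaSuccess) {
        cudaEventElapsedTime(ms_coalesced, t0, t1);
        cudaEventElapsedTime(ms_strided, t1, t2);
    }

    cudaEventDestroy(t0);
    cudaEventDestroy(t1);
    cudaEventDestroy(t2);
    cudaFree(buf);
    return err == cudaSuccess ? 0 : -1;
}
```

A call such as `time_store_patterns(1 << 20, 32, &c, &s)` (about 128 MB of device memory) fills in two elapsed times in milliseconds. If the strided time comes out noticeably larger, that suggests store coalescing matters on that device in the same way it does for loads.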

For full details, I suggest reading section 5.3.2 of the latest CUDA Programming Guide.

Also see appendix F of that document for specific information pertaining to the different architecture families. For example, compute capability 1.x devices have more performance "cliffs" than compute capability 2.x (Fermi) devices.
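Since the behaviour differs between architecture families, it helps to know which compute capability a device reports before looking up the matching part of the guide. A minimal sketch using the runtime API call `cudaGetDeviceProperties` follows; the helper name `compute_capability` and its return encoding are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: prints the device name and compute capability.
// Returns major * 10 + minor (e.g. 20 for 2.0), or -1 if the query fails.
int compute_capability(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return -1;
    std::printf("%s: compute capability %d.%d\n",
                prop.name, prop.major, prop.minor);
    return prop.major * 10 + prop.minor;
}
```

Calling `compute_capability(0)` queries the first device in the system; a return value of -1 means no usable device was found or the query failed.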

cuda opencl gpu gpgpu nvidia

