cuda - Why is cudaMalloc giving me an error when I know there is sufficient memory space?
I have a Tesla C2070 which is supposed to have 5636554752 bytes of memory. However, cudaMalloc gives me an error:
    int *buf_d = NULL;
    err = cudaMalloc((void **)&buf_d, 1000000000*sizeof(int));
    if (err != cudaSuccess)
    {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return EXIT_ERROR;
    }

How is this possible? With 4-byte ints, that request is only 4,000,000,000 bytes, well under the card's total. Does it have something to do with the maximum memory pitch? Here are the GPU's specs:
device 0: "tesla c2070" cuda driver version: 3.20 cuda runtime version: 3.20 cuda capability major/minor version number: 2.0 total amount of global memory: 5636554752 bytes multiprocessors x cores/mp = cores: 14 (mp) x 32 (cores/mp) = 448 (cores) total amount of constant memory: 65536 bytes total amount of shared memory per block: 49152 bytes total number of registers available per block: 32768 warp size: 32 maximum number of threads per block: 1024 maximum sizes of each dimension of block: 1024 x 1024 x 64 maximum sizes of each dimension of grid: 65535 x 65535 x 1 maximum memory pitch: 2147483647 bytes as machine i'm running on, has 24 intel® xeon® processor x565, linux distribution rocks 5.4 (maverick).
Any ideas? Thanks!
The basic problem is in your question's title - you don't *know* that you have sufficient memory, you are assuming you do. The runtime API includes the cudaMemGetInfo function, which will return how much free memory there is on the device. When a context is established on a device, the driver must reserve space for device code, local memory for each thread, FIFO buffers for printf support, stack for each thread, and heap for in-kernel malloc/new calls (see this answer for further details). All of this can consume rather a lot of memory, leaving you with much less than the maximum available memory after ECC reservations that you are assuming is available to your code. The API also includes cudaDeviceGetLimit, which you can use to query the amounts of memory the device runtime support is consuming, and a companion call cudaDeviceSetLimit, which lets you change the amount of memory each component of runtime support will reserve.
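As a minimal sketch of those queries (my own illustration, assuming CUDA 4.0 or later, where the cudaDeviceGetLimit/cudaDeviceSetLimit calls are available, and omitting error checking for brevity):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main(void)
    {
        // Establish a context first, so the driver's reservations are
        // already taken out of the free memory figure.
        cudaFree(0);

        size_t free_bytes = 0, total_bytes = 0;
        cudaMemGetInfo(&free_bytes, &total_bytes);
        printf("Free: %zu bytes of %zu bytes total\n", free_bytes, total_bytes);

        // Query the runtime's reservations for the printf FIFO, the
        // per-thread stack, and the in-kernel malloc/new heap.
        size_t fifo = 0, stack = 0, heap = 0;
        cudaDeviceGetLimit(&fifo, cudaLimitPrintfFifoSize);
        cudaDeviceGetLimit(&stack, cudaLimitStackSize);
        cudaDeviceGetLimit(&heap, cudaLimitMallocHeapSize);
        printf("printf FIFO: %zu, stack per thread: %zu, malloc heap: %zu\n",
               fifo, stack, heap);

        // Example: shrink the in-kernel malloc heap if your kernels never
        // call malloc/new, returning that memory to the cudaMalloc pool.
        cudaDeviceSetLimit(cudaLimitMallocHeapSize, 1 << 20);
        return 0;
    }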
Even after you have tuned the runtime memory footprint to your tastes and have the actual free memory value from the driver, there are still page size granularity and fragmentation considerations to contend with. It is rarely possible to allocate every byte of what the API reports as free. Usually, I do something like this when the objective is to try and allocate every available byte on the card:
const size_t mb = 1<<20; // assuming 1mb page size here size_t available, total; cudamemgetinfo(&available, &total); int *buf_d = 0; size_t nwords = total / sizeof(int); size_t words_per_mb = mb / sizeof(int); while(cudamalloc((void**)&buf_d, nwords * sizeof(int)) == cudaerrormemoryallocation) { nwords -= words_per_mb; if( nwords < words_per_mb) { // signal no free memory break; } } // leaves int buf_d[nwords] on device or signals no free memory (note never been near compiler, safe on cuda 3 or later). implicitly assumed none of obvious sources of problems big allocations apply here (32 bit host operating system, wddm windows platform without tcc mode enabled, older known driver issues).
Tags: memory, cuda