debugging - Image convolution kernel does not function according to specified parameters, need help finding the cause -




I am working on OpenCL image convolution. I looked at a kernel code sample on the AMD website and have included the original sample here:

    __kernel void convolve_unroll(const __global float * pinput,
                                  __constant float * pfilter,
                                  __global float * poutput,
                                  const int ninwidth,
                                  const int nfilterwidth)
    {
        const int nwidth = get_global_size(0);

        const int xout = get_global_id(0);
        const int yout = get_global_id(1);

        const int xintopleft = xout;
        const int yintopleft = yout;

        float sum = 0;
        for (int r = 0; r < nfilterwidth; r++)
        {
            const int idxftmp = r * nfilterwidth;

            const int yin = yintopleft + r;
            const int idxintmp = yin * ninwidth + xintopleft;

            int c = 0;
            while (c <= nfilterwidth-4)
            {
                int idxf = idxftmp + c;
                int idxin = idxintmp + c;
                sum += pfilter[idxf]*pinput[idxin];
                idxf++; idxin++;
                sum += pfilter[idxf]*pinput[idxin];
                idxf++; idxin++;
                sum += pfilter[idxf]*pinput[idxin];
                idxf++; idxin++;
                sum += pfilter[idxf]*pinput[idxin];
                c += 4;
            }
            for (int c1 = c; c1 < nfilterwidth; c1++)
            {
                const int idxf = idxftmp + c1;
                const int idxin = idxintmp + c1;
                sum += pfilter[idxf]*pinput[idxin];
            }
        } //for (int r = 0...

        const int idxout = yout * nwidth + xout;
        poutput[idxout] = sum;
    }
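Reading the index arithmetic in the sample, each work-item touches input columns `xout` to `xout + nfilterwidth - 1` and rows `yout` to `yout + nfilterwidth - 1`, so the input buffer apparently has to be larger than the output. The plain C sketch below (not OpenCL) computes the largest flat input index the sample would read. The function names `max_input_index_read` and `padded_size` are hypothetical, and the sketch assumes every size is at least 1.

```c
#include <stddef.h>

/* Hypothetical helper: largest flat index into the input buffer that the
 * sample kernel reads, for an NDRange of out_width x out_height work-items,
 * an input row pitch of in_width elements, and a square filter.
 * Assumes all arguments are >= 1. */
size_t max_input_index_read(size_t out_width, size_t out_height,
                            size_t in_width, size_t filter_width)
{
    /* The last work-item is (out_width - 1, out_height - 1); the last
     * filter tap adds (filter_width - 1) in each direction. */
    size_t last_row = (out_height - 1) + (filter_width - 1);
    size_t last_col = (out_width - 1) + (filter_width - 1);
    return last_row * in_width + last_col;
}

/* Hypothetical helper: input extent needed along one axis so that every
 * read of the sample kernel stays inside the input. */
size_t padded_size(size_t out_size, size_t filter_width)
{
    return out_size + filter_width - 1;
}
```

For a 4x4 NDRange and a 3x3 filter, a 6x6 input (36 elements) gives a largest index of 35, which fits. With an unpadded 4x4 input (16 elements) and a pitch of 4, the largest index would be 25, which is past the end of the buffer.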

To help explain the kind of "work" context I'm making this kernel in, you need to know that third-party software is involved, which acts as the host side. The host side wraps the OpenCL API functions and uses the device side in the same way. The error I'm getting is << [cl_readbuffer] clEnqueueReadBuffer (-30) CL_INVALID_VALUE. I looked it up and found: CL_INVALID_VALUE if the region being read specified by (offset, cb) is out of bounds or if ptr is a NULL value.

This error occurs when I try to read the buffer that contains the output image after applying the filter to the original (3x3 Gaussian):

    1  4  1    (mask / matrix)
    4 12  4
    1  4  1

Frame of reference for my experience: I'm learning C/C++ (inexperienced) and have only ever built a simple kernel that's not far from a one-liner, but it still worked fine. I need help understanding what I'm not doing right, and how to understand/read/interpret OpenCL error messages and problems. What does "if the region being read specified by (offset, cb) is out of bounds" mean?
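As a rough illustration of that wording: `offset` is the byte position in the buffer where the read starts and `cb` is the number of bytes to read, so the region is out of bounds when it extends past the size the buffer was created with. The C sketch below shows that condition only. `read_region_in_bounds` and `image_bytes` are hypothetical helpers, not OpenCL functions, and how the third-party host actually sizes its buffers is unknown here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not part of OpenCL): true when reading cb bytes
 * starting at byte `offset` stays inside a buffer of buffer_size bytes. */
bool read_region_in_bounds(size_t offset, size_t cb, size_t buffer_size)
{
    /* Written without computing offset + cb, so the check cannot overflow. */
    return offset <= buffer_size && cb <= buffer_size - offset;
}

/* Hypothetical helper: byte size of a width x height image stored as
 * one int per pixel. */
size_t image_bytes(size_t width, size_t height)
{
    return width * height * sizeof(int);
}
```

For example, a buffer created for a 4x4 int image can be read in full from offset 0, but asking for the bytes of a 6x6 image from the same buffer, or for the full size starting at a non-zero offset, falls outside the buffer.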

On moving the problem to other hardware: the hardware I'm working on is an ATI 4xxx series card (byte-addressable store not supported! slow results). I have access to higher-end hardware, but I need time to set up tests because I would have to move the third-party software around, which the company I'm working at may not like.

Here's my own code. I changed the names to help show what I think it does; please help me correct what is wrong in my learning/understanding of the AMD OpenCL sample:

    #ifdef cl_khr_fp64
        #pragma opencl extension cl_khr_fp64 : enable
    #else
        #ifdef cl_amd_fp64
            #pragma opencl extension cl_amd_fp64: enable
            #define cl_khr_fp64
        #endif
    #endif

    kernel void defaultconvolution
    (
        const global read_only int* inputbuffer,
        const global read_only int* filter,
        global int* outputimage,
        const int read_only inputimage_height,
        const int read_only filtermask_width,
        const int read_only filtermask_height
    )
    {
        const int inputimage_width = get_global_size(0); // 2d workspace / ndrange (square shaped)

        const unsigned int x_id = get_global_id(0);
        const unsigned int y_id = get_global_id(1);

        const int inputimage_topleft_x = x_id;
        const int inputimage_topleft_y = y_id;

        float sum = 0;

        //for each row
        for(int rowcounter = 0; rowcounter < filtermask_width; rowcounter++)
        {
            //base id double loop counting
            const int filter_id_x_temp = rowcounter * filtermask_width;

            const int inputimage_y = inputimage_topleft_y + rowcounter;
            const int inputimage_id_temp = inputimage_y * inputimage_width + inputimage_topleft_x;

            //for each column
            for(int columncounter = 0; columncounter < filtermask_width; columncounter++)
            {
                const int filter_id_x = filter_id_x_temp + columncounter;
                const int inputimage_id = inputimage_id_temp + columncounter;
                sum += filter[filter_id_x] * inputbuffer[inputimage_id_temp];
            }
        }

        const int outputimage_id_x = y_id * inputimage_width + x_id;
        outputimage[outputimage_id_x] = sum;
    }
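For comparison, here is a plain C sketch (CPU only, not OpenCL) of what the kernel above seems to be aiming for. It differs from the kernel in three places that may or may not be related to the error: the inner loop indexes the input with the per-column index (the kernel computes `inputimage_id` but reads `inputbuffer[inputimage_id_temp]`), the row loop runs to the filter height rather than the filter width, and the output is made smaller than the input so that no read goes past the input. The name `convolve_valid` and its parameters are hypothetical, and the sketch assumes the input is at least as large as the filter.

```c
#include <stddef.h>

/* Hypothetical CPU reference: "valid" convolution of an int image with an
 * int mask. The output has (in_width - filter_width + 1) columns and
 * (in_height - filter_height + 1) rows, so every read stays inside the
 * input. Assumes in_width >= filter_width and in_height >= filter_height. */
void convolve_valid(const int *input, size_t in_width, size_t in_height,
                    const int *filter, size_t filter_width, size_t filter_height,
                    int *output)
{
    const size_t out_width = in_width - filter_width + 1;
    const size_t out_height = in_height - filter_height + 1;

    for (size_t y = 0; y < out_height; y++) {
        for (size_t x = 0; x < out_width; x++) {
            int sum = 0;
            /* One pass per filter row (bounded by the filter height). */
            for (size_t r = 0; r < filter_height; r++) {
                /* One pass per filter column. */
                for (size_t c = 0; c < filter_width; c++) {
                    const size_t filter_id = r * filter_width + c;
                    /* Row pitch is the input width, not the output width. */
                    const size_t input_id = (y + r) * in_width + (x + c);
                    sum += filter[filter_id] * input[input_id];
                }
            }
            output[y * out_width + x] = sum;
        }
    }
}
```

With the 3x3 mask from the question, whose entries add up to 32, a 4x4 input filled with ones produces a 2x2 output filled with 32. The result is not normalized, so dividing by 32 would be needed to keep the original brightness.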

Edit 20-01-2012: I will accept an answer if it can provide a proper means to learn to develop kernels without having an expert readily available (normally compiler errors = the ability to learn from mistakes, but my kernels keep getting locked down and I'm not learning anymore right now).

debugging opencl gpu amd-processor
