Showing posts with label opencl. Show all posts

Thursday, August 10, 2017

Implementing nested loop in OpenCL?


I'm new to OpenCL and have been trying to implement a three-level nested loop in a kernel function. I guess my understanding is not sufficient. Below is the C code for the logic:

void scale(float *output, float *scales, int batch, int n, int size)
{
    int i, j, b;
    for (b = 0; b < batch; ++b) {
        for (i = 0; i < n; ++i) {
            for (j = 0; j < size; ++j) {
                output[(b*n + i)*size + j] *= scales[i];
            }
        }
    }
}

Here output and scales are 1D arrays. For example:

float output[18] = {1,2,3,4,5,6,7,8,9,1,2,3,4,5,6,7,8,9};
float scales[9] = {1,0,1,0,1,0,1,0,1};

int n = 9;
int size = 2;
int batch = 1;

The expected output is:

1.000000  2.000000  0.000000  0.000000  5.000000  6.000000   0.000000  0.000000  9.000000  1.000000  0.000000  0.000000  4.000000  5.000000  0.000000  0.000000  8.000000  9.000000 
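As a cross-check on the indexing, the three nested loops can be collapsed into a single loop over the flat index, which is the same mapping a one-dimensional kernel would use. The following is only a sketch: the name scale_flat is hypothetical and not part of the original code, and it assumes n and size are positive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical flattened equivalent of scale(): one loop over every element.
   For flat index idx = (b*n + i)*size + j, the channel is i = (idx / size) % n. */
void scale_flat(float *output, const float *scales, int batch, int n, int size)
{
    assert(batch >= 0 && n > 0 && size > 0);

    size_t total = (size_t)batch * (size_t)n * (size_t)size;
    for (size_t idx = 0; idx < total; ++idx) {
        size_t i = (idx / (size_t)size) % (size_t)n;  /* which scale applies */
        output[idx] *= scales[i];
    }
}
```

With the example arrays above (n = 9, size = 2, batch = 1), each scale factor covers two consecutive elements, which is why the zeros in the expected output come in pairs.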

Below is my OpenCL kernel:

__kernel void scale_kernel(__global float *output, __global float *biases, int n, int size)
{
    int j = get_global_id(0);
    int i = get_group_id(1);
    int b = get_group_id(2);

    if (j < size) output[(b*n + i)*size + j] *= biases[i];
}

I believe the kernel itself is correct and that the problem is in how I launch the NDRange kernel. My BLOCK size is 16 (I think this is where my understanding is wrong).

size_t global_work_size[3] = {(size-1)/BLOCK + 1, n, batch};
size_t local_work_size[3] = {BLOCK, 1, 1};
cl.error = clEnqueueNDRangeKernel(queue, kernel, 3, 0, global_work_size, local_work_size, 0, 0, NULL);

EDIT 1:

Changing global_work_size as shown below produces the expected output. I've set local_work_size to NULL in this case, which might not give the best performance.

size_t global_work_size[3] = {size, n, batch};
cl.error = clEnqueueNDRangeKernel(queue, kernel, 3, 0, global_work_size, NULL, 0, 0, NULL);

Please let me know how to choose global_work_size and local_work_size.
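One likely cause of the failure in the first launch is that global_work_size counts work-items, not work-groups, so (size-1)/BLOCK + 1 gives the number of blocks rather than the number of work-items. In OpenCL 1.x each global dimension must also be a multiple of the matching local dimension, otherwise clEnqueueNDRangeKernel returns CL_INVALID_WORK_GROUP_SIZE. A minimal sketch of the usual fix is to round the global size up to a multiple of BLOCK and let the kernel's if (j < size) guard skip the padding. The helper names round_up and make_work_sizes are hypothetical, not OpenCL API calls.

```c
#include <assert.h>
#include <stddef.h>

/* Round `count` up to the next multiple of `block` (block must be > 0). */
size_t round_up(size_t count, size_t block)
{
    assert(block > 0);
    return ((count + block - 1) / block) * block;
}

/* Hypothetical helper: fill the sizes for a size x n x batch launch with
   `block` work-items per group along dimension 0. */
void make_work_sizes(size_t size, size_t n, size_t batch, size_t block,
                     size_t global[3], size_t local[3])
{
    global[0] = round_up(size, block);  /* work-items, not work-groups */
    global[1] = n;
    global[2] = batch;

    local[0] = block;
    local[1] = 1;
    local[2] = 1;
}
```

The resulting arrays would be passed to clEnqueueNDRangeKernel in place of the ones in the question. Note also that get_group_id(1) and get_group_id(2) equal the global IDs only while the local size in those dimensions is 1. With local_work_size set to NULL the implementation chooses the local size, so reading i and b with get_global_id(1) and get_global_id(2) is the safer choice.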

0 Answers


Sunday, April 24, 2016

Xcode refuses to build one of my OpenCL projects but builds another one successfully


I've got two projects in Xcode. Both of them use OpenCL and cl.hpp, the OpenCL wrappers for C++.

I'm on Mac OS 10.11.4, using clang-703.0.29 version 7.3.0 and the latest (and pretty bizarre) version of Xcode (Version 7.3 (7D175)).

The first project compiles and builds without problems. The result of the build is a static library (.a file). The second project uses this library (I just copy the library and the headers into this project's directory). I'm also linking OpenCL.framework with this project.

The problem is, the second project doesn't build. It says:

CGLTypes.h - Missing ',' between enumerators

This error is on line 75:

kCGLPFAStereo OPENGL_ENUM_DEPRECATED(10_0, 10_11)        =   6, 

It is the only error I'm getting. This happens when cl.hpp includes OpenCL/opencl.h which includes OpenCL/cl_gl_ext.h with #include <OpenGL/CGLTypes.h> in it.

The Base SDK is set to Latest (OS X 10.11). Exactly the same problem has occurred here, but it has been resolved by an OS update. My Mac OS version is already the latest, so I can't do this.

To sum up, the two projects use the same version of OpenCL and are built on the same machine with the same settings, the same compiler, and so on, but one of them doesn't compile.

Edit: here's a link to the first project: Matrix on GitHub. I'll try to add the second one as soon as possible. It is an ANN that uses Matrix for its matrix operations. All I do is include cl.hpp and all the .hpp files from Matrix. I'm also trying to link with the Matrix .a library, but the build process doesn't even get to that phase.

What should I do to fix that?

1 Answer

Answer 1

Well, the problem was that the second ('broken') project was, for some reason, using an older version of the Xcode tools (CreatedOnToolsVersion = 7.1), while the first one was created with CreatedOnToolsVersion = 7.3.

If I build Matrix without OpenCL support and link the second project with the generated .a library, everything works fine, so the problem was clearly the combination of OpenCL and the differing CreatedOnToolsVersion settings.

The problem was solved by creating a new project and copying the files there.

Special thanks to @Yakk for their suggestion in the comments!
