Assigning OpenCL parameters by name

In this article, I’ll explain a few ways of implementing “smart” parameter assignment for OpenCL kernels. I won’t explain in detail what OpenCL is, just enough to give you the problem context.

OpenCL is a framework for implementing parallel algorithms. Currently, the only language you can implement these algorithms in is “OpenCL-C”, which is C99 with some additions (such as vector types) and restrictions (recursion is not allowed). So you’ve got your “host-side code” that runs on the CPU and is written in C, C++, Java or whatever, and your OpenCL-C programs that do the computing work. The OpenCL programs can run on the CPU, the GPU or some other parallel system.

In your OpenCL-C code, you define special functions called kernels that are callable from the outside. Kernels have no return value, but they otherwise act just like normal C functions. They take parameters and can call other normal C functions.

Let’s say we’ve got the following OpenCL kernel (global and kernel are new keywords):

kernel void add_buffers(
  global float const *source1,
  global float const *source2,
  global float *destination)
{
  destination[get_global_id(0)] =
    source1[get_global_id(0)] +
    source2[get_global_id(0)];
}

The kernel takes two float buffers, adds them element by element and stores the result in a third buffer. We save this program to a .cl file.

In the C++ code, we do something like this to start an OpenCL calculation (pseudocode):

// Create the three buffers. You would, of course, fill them
// with meaningful values. I'll skip that step, however.
cl::buffer first_buffer = cl::create_buffer(256 * sizeof(float));
cl::buffer second_buffer = cl::create_buffer(256 * sizeof(float));
cl::buffer third_buffer = cl::create_buffer(256 * sizeof(float));

// Load program from file and compile it
cl::program p = load_program_from_file("");

// Load a specific kernel from the program (kernels are separate objects!)
// You'll get an error if there is no kernel called "add_buffers"
cl::kernel foo = p.create_kernel("add_buffers");

// Set the kernel arguments to our pre-allocated buffers

// Create 256 instances of the program (one for each array element)

I hope the pseudocode is readable enough. The actual OpenCL API is too verbose to be written down here.

The main problem with the code is the parameter passing to the kernel:


As you can see, you have to specify the arguments’ positions, not their names, which is extremely error-prone. Because the parameter-passing code is often copied and pasted, you might end up with:


This compiles and runs just fine; it’ll just do The Wrong Thing.
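To make the positional problem concrete, here is a minimal sketch. It does not use the real OpenCL API: `mock_kernel` and `set_argument` are hypothetical stand-ins that only record which buffer name was bound to which position, so the snippet builds without an OpenCL installation.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Hypothetical stand-in for a kernel object (not a real OpenCL type).
// It only records which buffer was bound to which argument position.
struct mock_kernel {
  std::map<std::size_t, std::string> bound;

  void set_argument(std::size_t index, std::string const &buffer_name) {
    bound[index] = buffer_name;
  }
};

// Intended binding for add_buffers:
// source1 = first_buffer, source2 = second_buffer, destination = third_buffer.
inline mock_kernel bind_correctly() {
  mock_kernel k;
  k.set_argument(0, "first_buffer");
  k.set_argument(1, "second_buffer");
  k.set_argument(2, "third_buffer");
  return k;
}

// Copy-and-paste slip: the last two indices are swapped.
// Every position is still set and all three parameters are float buffers,
// so nothing complains; the kernel would simply write into second_buffer.
inline mock_kernel bind_with_swapped_indices() {
  mock_kernel k;
  k.set_argument(0, "first_buffer");
  k.set_argument(2, "second_buffer");  // should have been index 1
  k.set_argument(1, "third_buffer");   // should have been index 2
  return k;
}
```

Both functions look equally plausible at a glance, which is exactly why positional binding is easy to get wrong.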

Another thing to worry about is type safety. If you accidentally assign a parameter of the wrong type, you either get a slightly cryptic error message (which is fine) or the code, again, compiles and does something strange.

To mitigate the first problem – the positional parameters – a few solutions exist:

  1. OpenCL 1.2 includes a function that returns the name of a parameter given its index: clGetKernelArgInfo with CL_KERNEL_ARG_NAME (the program has to be built with the -cl-kernel-arg-info option). However, virtually no vendor ships with 1.2 yet, so that’s not a real solution.
  2. Use an index to assign parameters:

    int param = 0;

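    You could even put all of this inside a macro so you don’t forget to increment the index. Many people do it that way. The drawback is that the order is still implicit: if you add or reorder kernel parameters, the host code silently assigns the wrong arguments.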

  3. Parse the OpenCL-C code yourself and extract the parameters. OpenCL-C is “just” C99, so it’s not as hard to parse as, say, C++. But it’s still a huge amount of work, just to parse the function headers.
  4. Decorate the OpenCL-C code with easily-parseable macros.
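The running-index idea from option 2 can be sketched as a small helper instead of a macro. Everything here is hypothetical: `argument_counter` is not part of OpenCL, and the setter it wraps stands in for whatever "set argument i" call your wrapper provides. Values are plain strings to keep the sketch self-contained.

```cpp
#include <cstddef>
#include <functional>
#include <string>
#include <utility>

// Wraps any "set argument i to value" callable and supplies the index itself.
class argument_counter {
public:
  using setter = std::function<void(std::size_t, std::string const &)>;

  explicit argument_counter(setter s) : setter_(std::move(s)), next_(0) {}

  // Binds value to the next free position and advances the counter.
  argument_counter &operator()(std::string const &value) {
    setter_(next_++, value);
    return *this;
  }

  // Number of arguments bound so far.
  std::size_t count() const { return next_; }

private:
  setter setter_;
  std::size_t next_;
};
```

A call chain such as `bind("first_buffer")("second_buffer")("third_buffer")` then fills positions 0, 1 and 2. The indices can no longer be mistyped, but the call order still has to match the kernel signature.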

I have implemented the last solution and am quite happy with it so far. The trick is to define macros in your OpenCL-C code (yes, OpenCL-C is preprocessed just like C99), like this:

#define KERNEL_NAME(name) name
#define KERNEL_ARGUMENT(name) name

Then, you write your kernel like this:

#define KERNEL_NAME(name) name
#define KERNEL_ARGUMENT(name) name

kernel void KERNEL_NAME(add_buffers)(
  global float const *KERNEL_ARGUMENT(source1),
  global float const *KERNEL_ARGUMENT(source2),
  global float *KERNEL_ARGUMENT(destination))
{
  destination[get_global_id(0)] =
    source1[get_global_id(0)] +
    source2[get_global_id(0)];
}

In your C++ code, you load and build your program as we saw above; nothing changes here. Each macro expands to the argument it is given, so the OpenCL compiler sees exactly the same kernel as before.

Then, we read in the .cl file and do some parsing. We extract the first kernel name by searching for KERNEL_NAME in the program and create an empty vector of strings for its parameters. We then search for occurrences of KERNEL_ARGUMENT until we find the next KERNEL_NAME, pushing every argument we find to the end of the vector. For the next kernel name, we do the same.
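The scan described above might look roughly like this. It is a sketch under a few assumptions of mine, not the original implementation: the names `kernel_arguments` and `parse_kernel_arguments` are made up, lines starting with `#` are skipped so the `#define` lines themselves are not matched, and comments or macro uses that span several lines are not handled.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Kernel name -> argument names in declaration order (index = position).
using kernel_arguments = std::map<std::string, std::vector<std::string>>;

// Returns s without leading and trailing spaces or tabs.
inline std::string trim(std::string const &s) {
  std::size_t const first = s.find_first_not_of(" \t");
  if (first == std::string::npos) return "";
  std::size_t const last = s.find_last_not_of(" \t");
  return s.substr(first, last - first + 1);
}

inline kernel_arguments parse_kernel_arguments(std::string const &source) {
  std::string const name_macro = "KERNEL_NAME(";
  std::string const argument_macro = "KERNEL_ARGUMENT(";
  kernel_arguments result;
  std::string current_kernel;  // empty until the first KERNEL_NAME is seen

  std::istringstream lines(source);
  std::string line;
  while (std::getline(lines, line)) {
    // Assumption: skip preprocessor lines so the macro definitions
    // ("#define KERNEL_NAME(name) name") are not mistaken for uses.
    std::size_t const first = line.find_first_not_of(" \t");
    if (first == std::string::npos || line[first] == '#') continue;

    std::size_t pos = 0;
    while (pos < line.size()) {
      std::size_t const name_at = line.find(name_macro, pos);
      std::size_t const argument_at = line.find(argument_macro, pos);
      if (name_at == std::string::npos && argument_at == std::string::npos)
        break;

      // npos is the largest size_t, so a missing match never wins here.
      bool const is_name = name_at < argument_at;
      std::size_t const open = is_name ? name_at + name_macro.size()
                                       : argument_at + argument_macro.size();
      std::size_t const close = line.find(')', open);
      if (close == std::string::npos) break;  // malformed use, skip the line

      std::string const operand = trim(line.substr(open, close - open));
      if (is_name) {
        current_kernel = operand;
        result[current_kernel];  // register kernels without arguments, too
      } else if (!current_kernel.empty()) {
        result[current_kernel].push_back(operand);
      }
      pos = close + 1;
    }
  }
  return result;
}
```

Feeding it the decorated `add_buffers` source should yield a single entry whose vector holds `source1`, `source2` and `destination` in that order, so the position in the vector is the argument index.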

Using this algorithm, you get a mapping from the kernel names to their arguments, and the arguments automatically get indices. We can now write:


Inside the kernel::argument function, we check that the given argument exists and retrieve its index. Voilà. 🙂
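The lookup step could be sketched as follows. The free function `argument_index` and the `kernel_arguments` alias are hypothetical names of mine; the article only says that kernel::argument checks the name and retrieves the index, so the error handling via exceptions is an assumption.

```cpp
#include <algorithm>
#include <cstddef>
#include <iterator>
#include <map>
#include <stdexcept>
#include <string>
#include <vector>

// Kernel name -> argument names in declaration order (index = position).
using kernel_arguments = std::map<std::string, std::vector<std::string>>;

// Returns the position of `argument` in the signature of `kernel`.
// Throws std::invalid_argument if the kernel or the argument is unknown.
inline std::size_t argument_index(kernel_arguments const &table,
                                  std::string const &kernel,
                                  std::string const &argument) {
  kernel_arguments::const_iterator const entry = table.find(kernel);
  if (entry == table.end())
    throw std::invalid_argument("unknown kernel: " + kernel);

  std::vector<std::string> const &names = entry->second;
  std::vector<std::string>::const_iterator const it =
      std::find(names.begin(), names.end(), argument);
  if (it == names.end())
    throw std::invalid_argument("kernel " + kernel +
                                " has no argument " + argument);

  return static_cast<std::size_t>(std::distance(names.begin(), it));
}
```

The returned index is what you would pass on to the positional API. A misspelled name now fails loudly at the call site instead of silently binding the wrong slot.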

Furthermore, we could extend the KERNEL_ARGUMENT macro to include not only the name, but the type:

#define KERNEL_ARGUMENT(type,name) type name

// Example usage
KERNEL_ARGUMENT(global float2 *,foo)

And then we could do type checking, too. This is, of course, a lot more complicated.
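As a first step towards that, the host-side parser would have to split the macro operand into type and name. A minimal sketch, assuming the operand has already been extracted as the text between the parentheses and that the name is whatever follows the last comma (`typed_argument` and `split_typed_operand` are hypothetical names):

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// One parsed KERNEL_ARGUMENT(type,name) operand.
struct typed_argument {
  std::string type;
  std::string name;
};

// Returns s without leading and trailing spaces or tabs.
inline std::string trimmed(std::string const &s) {
  std::size_t const first = s.find_first_not_of(" \t");
  if (first == std::string::npos) return "";
  std::size_t const last = s.find_last_not_of(" \t");
  return s.substr(first, last - first + 1);
}

// Splits "global float2 *,foo" at the last comma into type and name.
// Throws std::invalid_argument if there is no comma at all.
inline typed_argument split_typed_operand(std::string const &operand) {
  std::size_t const comma = operand.rfind(',');
  if (comma == std::string::npos)
    throw std::invalid_argument("expected \"type,name\": " + operand);

  typed_argument result;
  result.type = trimmed(operand.substr(0, comma));
  result.name = trimmed(operand.substr(comma + 1));
  return result;
}
```

This only recovers the type as a string. Comparing it against the C++ type of the value being passed is the complicated part and is left open here.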
