reqd_work_group_size
Description
When OpenCL kernels are submitted for execution on an OpenCL device, they execute within an index space, called an ND range, which can have 1, 2, or 3 dimensions. The size of the ND range is called the global size in the OpenCL API. The work-group size defines the amount of the ND range that can be processed by a single invocation of a kernel compute unit, and is also called the local size in the OpenCL API. The OpenCL compiler can determine the work-group size based on the properties of the kernel and the selected device. Once the work-group size (local size) has been determined, the ND range (global size) is divided automatically into work-groups, and the work-groups are scheduled for execution on the device.
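The division of the ND range into work-groups described above comes down to simple arithmetic. The following is a minimal sketch in plain C, not part of the OpenCL API: the function names `num_work_groups` and `total_work_groups` are hypothetical, and the sketch assumes that the global size is an exact multiple of the local size in every dimension.

```c
#include <assert.h>
#include <stddef.h>

/* Number of work-groups along one dimension of the ND range.
 * Assumption for this sketch: global_size is an exact multiple
 * of local_size. */
size_t num_work_groups(size_t global_size, size_t local_size)
{
    assert(local_size > 0 && global_size % local_size == 0);
    return global_size / local_size;
}

/* Total number of work-groups across a 1-, 2-, or 3-dimensional
 * ND range. Only the first work_dim entries of each array are read. */
size_t total_work_groups(unsigned work_dim,
                         const size_t global[3],
                         const size_t local[3])
{
    size_t total = 1;
    for (unsigned d = 0; d < work_dim; ++d)
        total *= num_work_groups(global[d], local[d]);
    return total;
}
```

For example, a one-dimensional global size of 1024 with a local size of 16 yields 64 work-groups, and a 64x32 global size with an 8x8 local size yields 32.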
Although the OpenCL compiler can define the work-group size, specifying the reqd_work_group_size attribute on the kernel is highly recommended for FPGA implementations, because the compiler can use the fixed work-group size for performance optimization when it generates the custom logic for the kernel. See OpenCL Execution Model for more information.
OpenCL kernel functions are executed exactly one time for each point in the ND range index space. The unit of work for each point in the ND range is called a work-item. Work-items are organized into work-groups, which are the unit of work scheduled onto compute units. The optional reqd_work_group_size attribute defines the work-group size of a compute unit that must be used as the local_work_size argument to clEnqueueNDRangeKernel. This allows the compiler to optimize the generated code appropriately for this kernel.
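Because the attribute fixes the local size, the host has to pass a matching local_work_size when it enqueues the kernel; in the OpenCL runtime, a mismatch causes clEnqueueNDRangeKernel to fail with CL_INVALID_WORK_GROUP_SIZE. The sketch below shows one way a host program could check this before enqueueing. The helper `check_local_work_size` and its status codes are hypothetical and are not OpenCL API names; the sketch also treats a NULL local size as a mismatch.

```c
#include <stddef.h>

/* Hypothetical status codes for this sketch; not OpenCL constants. */
enum { WG_OK = 0, WG_MISMATCH = -1 };

/* Compare the local_work_size the host intends to pass to
 * clEnqueueNDRangeKernel against the X, Y, Z values given in the
 * kernel's reqd_work_group_size attribute.
 * Dimensions at or beyond work_dim are treated as size 1. */
int check_local_work_size(unsigned work_dim,
                          const size_t *local_work_size,
                          const size_t required[3])
{
    if (local_work_size == NULL)
        return WG_MISMATCH; /* assumption: a required size must be passed explicitly */

    for (unsigned d = 0; d < 3; ++d) {
        size_t local = (d < work_dim) ? local_work_size[d] : 1;
        if (local != required[d])
            return WG_MISMATCH;
    }
    return WG_OK;
}
```

With a kernel declared as reqd_work_group_size(16, 1, 1), a one-dimensional local size of 16 passes the check, while a local size of 32 does not.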
Syntax
Place this attribute before the kernel definition, or before the primary function specified for the kernel:
    __attribute__((reqd_work_group_size(X,Y,Z)))
- X,Y,Z: Specifies the size of the work-group in each dimension of the ND range. Together, the three values give the dimensions of a three-dimensional matrix that defines the work-group for the kernel.
Examples
The following OpenCL C kernel code shows a vector addition design where two arrays of data are summed into a third array. The required size of the work-group is 16x1x1, so the kernel body executes 16 times per work-group, once for each work-item, to produce a valid result.
    #include <clc.h>
    // For VHLS OpenCL C kernels, the full work group is synthesized
    __attribute__ ((reqd_work_group_size(16, 1, 1)))
    __kernel void
    vadd(__global int* a,
         __global int* b,
         __global int* c)
    {
        // Each work-item adds one pair of elements
        int idx = get_global_id(0);
        c[idx] = a[idx] + b[idx];
    }
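One way to picture what it means for the full work-group to be handled as a unit is to emulate the kernel in plain C, with a loop over the 16 work-items of each work-group. This is only a conceptual sketch and not what the compiler generates: the functions `vadd_work_group` and `vadd_nd_range` are hypothetical, the global offset is assumed to be zero, and the global size is assumed to be a multiple of 16.

```c
#include <stddef.h>

#define WG_SIZE 16 /* matches reqd_work_group_size(16, 1, 1) */

/* Emulate one work-group of the vadd kernel: the kernel body runs
 * once per work-item, and the loop bound is a compile-time constant. */
void vadd_work_group(size_t group_id, const int *a, const int *b, int *c)
{
    for (size_t local_id = 0; local_id < WG_SIZE; ++local_id) {
        /* Stand-in for get_global_id(0), assuming a zero global offset */
        size_t idx = group_id * WG_SIZE + local_id;
        c[idx] = a[idx] + b[idx];
    }
}

/* Emulate the whole one-dimensional ND range, one work-group at a time.
 * Assumption: global_size is a multiple of WG_SIZE. */
void vadd_nd_range(size_t global_size, const int *a, const int *b, int *c)
{
    for (size_t group_id = 0; group_id < global_size / WG_SIZE; ++group_id)
        vadd_work_group(group_id, a, b, c);
}
```

Calling vadd_nd_range with a global size of 32 processes two work-groups and fills all 32 elements of c. Because the inner loop bound is fixed at 16, it is known before the code runs, which is the kind of information a fixed work-group size makes available.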
See Also
- SDAccel Environment Optimization Guide (UG1207)
- https://www.khronos.org/
- The OpenCL C Specification