xcl_pipeline_workitems

Description

Pipelines the work items of a kernel to improve latency and throughput. Work-item pipelining extends loop pipelining to the kernel work group, and it is necessary for maximizing kernel throughput and performance.
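The benefit of pipelining can be estimated with the usual idealized cycle-count model for a pipeline. The sketch below is not taken from the SDAccel tools. The function names, the latency parameter (cycles for one work item), and the ii parameter (initiation interval, cycles between the starts of consecutive work items) are assumptions introduced here for illustration.

```c
#include <assert.h>

/* Idealized model, not tool output.
 * Without pipelining, each work item finishes before the next one starts. */
unsigned long unpipelined_cycles(unsigned long n, unsigned long latency)
{
    return n * latency;
}

/* With pipelining, a new work item starts every `ii` cycles, so only the
 * first work item pays the full latency. */
unsigned long pipelined_cycles(unsigned long n, unsigned long latency,
                               unsigned long ii)
{
    assert(ii >= 1);  /* an initiation interval below 1 is not meaningful here */
    if (n == 0) {
        return 0;
    }
    return latency + (n - 1) * ii;
}
```

Under this model, 3 work items with a hypothetical latency of 3 cycles take 9 cycles unpipelined and 5 cycles pipelined with an initiation interval of 1. For 64 work items the figures are 192 and 66 cycles.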

Syntax

Place the attribute in the OpenCL source before the elements to pipeline:

__attribute__((xcl_pipeline_workitems))

Example 1

To support the reqd_work_group_size attribute in the following example, SDAccel automatically inserts a loop nest that handles the three-dimensional characteristics of the ND range (3,1,1). Because of the added loop nest, the execution profile of this kernel is like that of an unpipelined loop. Adding the xcl_pipeline_workitems attribute adds concurrency and improves the throughput of the code.

kernel __attribute__ ((reqd_work_group_size(3,1,1)))
void foo(...)
{
    ...
    __attribute__((xcl_pipeline_workitems)) {
        int tid = get_global_id(0);
        op_Read(tid);
        op_Compute(tid);
        op_Write(tid);
    }
    ...
}
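The loop nest described for Example 1 can be pictured with a plain C model. This is a conceptual sketch only, not the code SDAccel generates. The helper functions op_read, op_compute, and op_write are hypothetical stand-ins for the kernel's stages, and the model assumes a single work group at global offset 0 so that the loop index plays the role of get_global_id(0).

```c
#include <assert.h>
#include <stddef.h>

/* Work-group dimensions taken from reqd_work_group_size(3,1,1). */
enum { WG_X = 3, WG_Y = 1, WG_Z = 1 };

/* Hypothetical stand-ins for the op_Read, op_Compute, and op_Write stages. */
static int op_read(const int *src, size_t tid) { return src[tid]; }
static int op_compute(int value) { return value * 2; }
static void op_write(int *dst, size_t tid, int value) { dst[tid] = value; }

/* Conceptual model: one loop iteration per work item in the group.
 * Assumes a single work group at global offset 0, so tid equals x. */
void foo_work_group(const int *src, int *dst)
{
    assert(src != NULL && dst != NULL);
    for (size_t z = 0; z < WG_Z; ++z) {
        for (size_t y = 0; y < WG_Y; ++y) {
            for (size_t x = 0; x < WG_X; ++x) {
                size_t tid = x;  /* plays the role of get_global_id(0) */
                int value = op_read(src, tid);
                value = op_compute(value);
                op_write(dst, tid, value);
            }
        }
    }
}
```

Executed as written, each iteration completes its read, compute, and write before the next iteration begins, which is the unpipelined profile. Pipelining the work items corresponds to overlapping these iterations so that one work item can be read while the previous one is being computed.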

Example 2

The following example adds the work-item pipeline to the appropriate elements of the kernel:

__kernel __attribute__ ((reqd_work_group_size(8, 8, 1)))
void madd(__global int* a, __global int* b, __global int* output)
{
    int rank = get_local_size(0);
    __local unsigned int bufa[64];
    __local unsigned int bufb[64];

    __attribute__((xcl_pipeline_workitems)) {
        int x = get_local_id(0);
        int y = get_local_id(1);
        bufa[x*rank + y] = a[x*rank + y];
        bufb[x*rank + y] = b[x*rank + y];
    }

    barrier(CLK_LOCAL_MEM_FENCE);

    __attribute__((xcl_pipeline_workitems)) {
        int index = get_local_id(1)*rank + get_local_id(0);
        output[index] = bufa[index] + bufb[index];
    }
}
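One way to read Example 2 is that the barrier splits the kernel into two regions, each of which behaves like its own loop nest over the 8 x 8 work group. The C model below sketches that reading under stated assumptions. It is not the code SDAccel generates, the function name madd_work_group is hypothetical, and it models a single work group only.

```c
#include <assert.h>
#include <stddef.h>

/* Local sizes taken from reqd_work_group_size(8, 8, 1). */
enum { LOCAL_X = 8, LOCAL_Y = 8 };

/* Conceptual model of one work group of the madd kernel. */
void madd_work_group(const int *a, const int *b, int *output)
{
    assert(a != NULL && b != NULL && output != NULL);
    const size_t rank = LOCAL_X;  /* plays the role of get_local_size(0) */
    unsigned int bufa[LOCAL_X * LOCAL_Y];
    unsigned int bufb[LOCAL_X * LOCAL_Y];

    /* Region 1: each work item copies one element of a and b into the
     * local buffers. */
    for (size_t y = 0; y < LOCAL_Y; ++y) {
        for (size_t x = 0; x < LOCAL_X; ++x) {
            bufa[x * rank + y] = (unsigned int)a[x * rank + y];
            bufb[x * rank + y] = (unsigned int)b[x * rank + y];
        }
    }

    /* The barrier sits here: region 1 completes for every work item
     * before any work item enters region 2. */

    /* Region 2: each work item adds one pair of buffered elements. */
    for (size_t y = 0; y < LOCAL_Y; ++y) {
        for (size_t x = 0; x < LOCAL_X; ++x) {
            size_t index = y * rank + x;
            output[index] = (int)(bufa[index] + bufb[index]);
        }
    }
}
```

In this reading, each xcl_pipeline_workitems attribute applies to one of the two loop nests, which is why the attribute appears once before and once after the barrier.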

See Also