Lab 5: Task-Level Pipelining
This lab demonstrates how to modify your code to optimize the hardware-software system generated by the SDx IDE using task-level pipelining. You can observe the impact of pipelining on performance.
#pragma HLS array_partition which sets block factor=16; instead, set block factor=8).
Task Pipelining
If your application calls an accelerator multiple times, you can structure it so that these calls are pipelined, overlapping setup and data transfer with accelerator computation. In the matrix multiply application, the following events take place:
- Matrices A and B are transferred from the main memory to accelerator local memories.
- The accelerator executes.
- The result, C, is transferred back from the accelerator to the main memory.
The following figure shows the matrix multiply design on the left and, on the right, a timing chart of these events for two successive calls executing sequentially.
Figure: Sequential Execution of Matrix Multiply Calls
The following figure shows the two calls executing in a pipelined fashion. The data transfer for the second call starts as soon as the data transfer for the first call finishes, and it overlaps with the execution of the first call. To enable this pipelining, however, we need extra local memory to store the second set of arguments while the accelerator computes with the first set. The SDSoC environment generates these memories, called multi-buffers, under the guidance of the user.
Figure: Pipelined Execution of Matrix Multiply Calls
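To make the overlap concrete, here is a small C++ timing model. It is a sketch under stated assumptions, not a description of how the SDSoC environment actually schedules transfers: it treats input transfer, accelerator execution, and output transfer as three independent resources, and it lets buffers stand for the multi-buffer depth (an input buffer set can be refilled only after the call that last used it has finished computing). The names StageTimes, sequential_time, and pipelined_time are hypothetical.

```cpp
#include <algorithm>
#include <vector>

// Per-call durations in arbitrary time units (hypothetical inputs).
struct StageTimes {
    double transfer_in;   // move A and B into accelerator local memory
    double compute;       // accelerator execution
    double transfer_out;  // move C back to main memory
};

// Sequential execution: each call completes all three events before the
// next call begins.
double sequential_time(const StageTimes& t, int calls) {
    return calls * (t.transfer_in + t.compute + t.transfer_out);
}

// Pipelined execution, assuming input transfer, compute, and output
// transfer use independent resources and process calls in order.
// buffers is the assumed number of input buffer sets (multi-buffer depth).
double pipelined_time(const StageTimes& t, int calls, int buffers) {
    if (calls <= 0) return 0.0;
    buffers = std::max(buffers, 1);
    std::vector<double> in_end(calls), comp_end(calls), out_end(calls);
    for (int i = 0; i < calls; ++i) {
        // Input transfer starts when the previous input transfer is done...
        double in_start = (i > 0) ? in_end[i - 1] : 0.0;
        // ...and, with limited buffers, only once the call that last used
        // this buffer set has finished computing.
        if (i >= buffers) in_start = std::max(in_start, comp_end[i - buffers]);
        in_end[i] = in_start + t.transfer_in;

        // The accelerator needs this call's inputs and must be idle.
        double comp_start = std::max(in_end[i], (i > 0) ? comp_end[i - 1] : 0.0);
        comp_end[i] = comp_start + t.compute;

        // Output transfer needs the result and a free output channel.
        double out_start = std::max(comp_end[i], (i > 0) ? out_end[i - 1] : 0.0);
        out_end[i] = out_start + t.transfer_out;
    }
    return out_end[calls - 1];
}
```

With transfer_in = 2, compute = 5, and transfer_out = 2, two calls take 18 time units sequentially and 14 when pipelined with two buffer sets, because the second input transfer is hidden behind the first computation. In this model, once the pipeline fills, the slowest stage (here the accelerator) sets the time per call.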
Specifying task-level pipelining requires rewriting the calling code using the pragmas async(id) and wait(id). The SDSoC environment includes an example that demonstrates the use of async pragmas, Matrix Multiply Pipelined, and this tutorial uses that example.
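The calling-code pattern can be sketched in C++ as follows. This is not the source of the Matrix Multiply Pipelined example: mmult_accel, run_pipelined, N, and MAT are hypothetical names, and the sketch assumes that each #pragma SDS wait(1) waits for the oldest outstanding #pragma SDS async(1) call. A standard C++ compiler ignores the SDS pragmas (at most with an unknown-pragma warning), so on a host the calls simply run in order.

```cpp
// Hypothetical matrix dimension for this sketch.
constexpr int N = 4;
constexpr int MAT = N * N;  // elements per matrix

// Software stand-in for the hardware function (hypothetical name).
// Computes C = A * B for N x N row-major matrices.
void mmult_accel(const float A[MAT], const float B[MAT], float C[MAT]) {
    for (int i = 0; i < N; ++i) {
        for (int j = 0; j < N; ++j) {
            float sum = 0.0f;
            for (int k = 0; k < N; ++k) {
                sum += A[i * N + k] * B[k * N + j];
            }
            C[i * N + j] = sum;
        }
    }
}

// Runs num_calls multiplications; call c uses the c-th matrix stored
// contiguously in A, B, and C. Each call is issued before waiting on the
// previous one, so at most two calls are outstanding at any time.
void run_pipelined(const float* A, const float* B, float* C, int num_calls) {
    if (num_calls <= 0) return;
    // Start call 0 without waiting for it.
#pragma SDS async(1)
    mmult_accel(A, B, C);
    for (int c = 1; c < num_calls; ++c) {
        // Start call c, then wait for the oldest outstanding call (c - 1).
#pragma SDS async(1)
        mmult_accel(A + c * MAT, B + c * MAT, C + c * MAT);
#pragma SDS wait(1)
    }
    // Wait for the last call, so every async(1) has a matching wait(1).
#pragma SDS wait(1)
}
```

Issuing call c before waiting on call c - 1 is what gives the environment room to overlap the data transfer for one call with the computation of the previous one; the second set of in-flight arguments is what the multi-buffers hold.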
Learning Objectives
- Use the SDx IDE to optimize your application to reduce runtime by performing task-level pipelining.
- Observe the performance impact of pipelining calls to an accelerator, which overlaps accelerator computation with input and output communication.