Topological Optimization
This chapter focuses on topological optimization: the attributes related to the rough layout and implementation of multiple compute units, and their impact on performance.
Multiple Compute Units
Depending on available resources on the FPGA, multiple compute units of the same kernel (or different kernels) can be created to run in parallel, which improves the system processing time and throughput.
Different kernels are provided as separate .xo files on the xocc link line. Multiple kernel compute units can be added by using the --nk option:
xocc -l --nk
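As a rough illustration of the --nk option, the sketch below assembles a link line that requests two compute units of a single kernel. The kernel name vadd, the count of 2, and the file names are hypothetical, and the kernel:count form of the --nk value is an assumption that should be checked against the SDx Command and Utility Reference Guide. Other options that a real link step needs, such as the target platform, are left out, and the script only prints the command rather than running it.

```shell
#!/bin/sh
# Hypothetical kernel name and compute unit count (not from the original text).
KERNEL=vadd
NUM_CU=2

# Assumed value format for --nk: <kernel_name>:<number_of_compute_units>.
NK_OPT="--nk ${KERNEL}:${NUM_CU}"

# Assemble the link line; platform and target options are omitted for brevity.
LINK_CMD="xocc -l ${NK_OPT} -o ${KERNEL}.xclbin ${KERNEL}.xo"

# Print the command for inspection instead of invoking xocc.
echo "${LINK_CMD}"
```

Keeping the option in a variable makes it easy to change the compute unit count when resource usage on the FPGA turns out to be the limiting factor.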
Using Multiple DDR Banks
In the SDAccel™ environment, supported acceleration cards provide one, two, or four DDR banks and up to 80 GB/s raw DDR bandwidth.
In addition to DDR banks, the host application can access PLRAM to transfer data directly to a kernel. This feature is enabled using the xocc --sp option with compatible platforms.
To take advantage of multiple DDR banks or PLRAMs, use the --sp option to map the individual arguments of the accelerator to the desired DDR banks or PLRAM in the xclbin. This mapping is picked up automatically by the host executable.
The following block diagram shows the Global Memory Two Banks Example from the "kernel_to_gmem" category of the SDAccel Getting Started Examples on GitHub, which connects the input pointer to DDR bank 0 and the output pointer to DDR bank 1.
Figure: Global Memory Two Banks
Connecting Kernel Ports to Memory Banks
Creating Multiple AXI Interfaces
OpenCL™ kernels, C/C++ kernels, and RTL kernels have different methods for assigning function parameters to AXI interfaces.
- For OpenCL kernels, the --max_memory_ports option is required to generate one AXI4 interface for each global pointer on the kernel argument list. The AXI4 interface name is based on the order of the global pointers on the argument list. The following code is taken from the example gmem_2banks_ocl in the kernel_to_gmem category of the SDAccel Getting Started Examples on GitHub:
  __kernel __attribute__ ((reqd_work_group_size(1, 1, 1)))
  void apply_watermark(__global const TYPE * __restrict input,
                       __global TYPE * __restrict output,
                       int width, int height) {
      ...
  }
  In this example, the first global pointer input is assigned the AXI4 name m_axi_gmem0, and the second global pointer output is assigned the name m_axi_gmem1.
- For C/C++ kernels, multiple AXI4 interfaces are generated by specifying different "bundle" names in the HLS INTERFACE pragma for different global pointers. For more information, refer to the SDAccel Environment Programmers Guide.
  The following is a code snippet from the gmem_2banks_c example that assigns the input pointer to the bundle gmem0 and the output pointer to the bundle gmem1. The bundle name can be any valid C string, and the generated AXI4 interface name will be m_axi_<bundle_name>. For this example, the input pointer will have the AXI4 interface name m_axi_gmem0, and the output pointer will have m_axi_gmem1.
  #pragma HLS INTERFACE m_axi port=input offset=slave bundle=gmem0
  #pragma HLS INTERFACE m_axi port=output offset=slave bundle=gmem1
- For RTL kernels, the port names are generated during the import process by the RTL kernel wizard. The default names proposed by the RTL kernel wizard are m00_axi and m01_axi. If not changed, these names must be used when assigning a DDR bank through the --sp option.
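For an RTL kernel that keeps the wizard's default port names, the --sp options might be assembled as in the sketch below. The compute unit name rtl_kernel_1 is hypothetical, the DDR[n] bank names follow the form used in the command line examples later in this chapter, and mapping the first port to bank 0 and the second to bank 1 is only an illustrative choice. The script prints the options rather than running xocc.

```shell
#!/bin/sh
# Hypothetical compute unit name for an imported RTL kernel.
CU=rtl_kernel_1

# Build one --sp option per default wizard port:
# m00_axi -> DDR[0], m01_axi -> DDR[1] (illustrative mapping only).
SP_OPTS=""
BANK=0
for PORT in m00_axi m01_axi; do
    SP_OPTS="${SP_OPTS} --sp ${CU}.${PORT}:DDR[${BANK}]"
    BANK=$((BANK + 1))
done
SP_OPTS="${SP_OPTS# }"   # drop the leading space left by the first iteration

# Print the options for inspection.
echo "${SP_OPTS}"
```

If the port names were changed in the RTL kernel wizard, the list in the for loop would need to change to match.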
Assigning AXI Interfaces to Global Memory
Map each kernel interface to a memory bank with the --sp option, and specify in which SLR the kernel is placed. Refer to the XOCC command in the SDx Command and Utility Reference Guide for details of the --sp command option, and the SDAccel Environment User Guide for details on SLR placement.
AXI4 interfaces are connected to DDR banks using the --sp option. The --sp option value is in the format of <kernel_instance_name>.<interface_name>:<DDR_bank_name>.
The complete list of DDR memories or alternative communication memories, such as PLRAM and HBM, can be found through the platforminfo command.
The following command line example connects the input pointer (m_axi_gmem0) to DDR bank 0 and the output pointer (m_axi_gmem1) to DDR bank 1:
xocc --max_memory_ports apply_watermark --sp apply_watermark_1.m_axi_gmem0:DDR[0] --sp apply_watermark_1.m_axi_gmem1:DDR[1]
You can use the Device Hardware Transaction view to observe the actual DDR Bank communication, and to analyze DDR usage.
Figure: Device Hardware Transaction View Transactions on DDR Bank
Assigning Kernels to SLR regions
Assigning ports to DDR banks means that the kernel must be physically routed on the FPGA to connect to the assigned DDR bank. Large FPGAs currently use stacked silicon devices with several Super Logic Regions (SLRs). By default, the SDAccel environment places the compute units in the same SLR as the shell. This might not always be desirable, especially when the kernel uses DDR banks that are located in another SLR. As a result, Xilinx recommends using the --slr option to map kernels close to the DDR memory they use. For example, the apply_watermark_1 kernel can be mapped to SLR 1 by applying the following link option:
xocc -l --slr apply_watermark_1:SLR1
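Putting the two mappings together, a link line for the watermark example might combine the --sp and --slr options shown above, as in this sketch. It reuses only the compute unit name, interface names, and SLR from the examples in this chapter. Whether DDR banks 0 and 1 are actually close to SLR1 depends on the platform, so that pairing is an assumption to verify with platforminfo. The script prints the command rather than running it, and omits other link options such as the platform.

```shell
#!/bin/sh
# Compute unit name taken from the examples above.
CU=apply_watermark_1

# Interface-to-bank mapping: input on DDR bank 0, output on DDR bank 1.
SP_OPTS="--sp ${CU}.m_axi_gmem0:DDR[0] --sp ${CU}.m_axi_gmem1:DDR[1]"

# Placement: keep the compute unit in SLR1 (platform specific; an assumption here).
SLR_OPT="--slr ${CU}:SLR1"

# Assemble and print the link line instead of invoking xocc.
LINK_CMD="xocc -l ${SP_OPTS} ${SLR_OPT}"
echo "${LINK_CMD}"
```

Keeping the bank mapping and the placement in separate variables makes it easier to change them together when a compute unit is moved to a different SLR.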
To better understand the platform attributes, such as the number of DDR banks and SLRs, use platforminfo. For more information, refer to the SDx Command and Utility Reference Guide (UG1279).