Topological Optimization

This section covers topological optimization. It looks at the attributes related to the high-level layout of the design and the implementation of multiple compute units, and at their impact on performance.

Multiple Compute Units

Depending on available resources on the target device, multiple compute units of the same kernel (or different kernels) can be created to run in parallel, which improves the system processing time and throughput.

Different kernels are provided as separate .xo files on the xocc link line. Multiple compute units of a kernel can be added by using the --nk option:

xocc -l --nk <kernel_name>:<number_of_compute_units>
Note: Each individual kernel instance must be driven separately by the host code.
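
Because each kernel instance is driven separately, the host usually has to decide which part of the workload goes to which compute unit before enqueuing anything. The following is a minimal sketch of that partitioning step only, under the assumption that the work is a contiguous range of elements. The names WorkSlice and split_work are hypothetical and not part of the SDAccel or OpenCL APIs, and the actual kernel launch calls are omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of the slice of work handed to one compute unit.
struct WorkSlice {
    std::size_t offset;  // index of the first element in the slice
    std::size_t count;   // number of elements in the slice
};

// Split `total` elements into `cu_count` contiguous slices whose sizes
// differ by at most one element.
std::vector<WorkSlice> split_work(std::size_t total, std::size_t cu_count) {
    assert(cu_count > 0 && "at least one compute unit is required");
    std::vector<WorkSlice> slices;
    slices.reserve(cu_count);
    const std::size_t base = total / cu_count;
    const std::size_t extra = total % cu_count;  // the first `extra` slices get one more element
    std::size_t offset = 0;
    for (std::size_t cu = 0; cu < cu_count; ++cu) {
        const std::size_t count = base + (cu < extra ? 1 : 0);
        slices.push_back({offset, count});
        offset += count;
    }
    return slices;
}
```

The host code would then set the kernel arguments and enqueue one launch per slice, one for each compute unit.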

Using Multiple DDR Banks

Acceleration cards supported in the SDAccel™ environment provide one, two, or four DDR banks, and up to 80 GB/s raw DDR bandwidth. For kernels moving large amounts of data between the FPGA and the DDR, Xilinx® recommends that you direct the SDAccel compiler and runtime library to use multiple DDR banks.

In addition to DDR banks, the host application can access PLRAM to transfer data directly to a kernel. This feature is enabled using the xocc --sp option with compatible platforms.

To take advantage of multiple DDR banks, you need to assign CL memory buffers to different banks in the host code, and configure the xclbin file to match the bank assignment on the xocc command line.

The following block diagram shows the Global Memory Two Banks Example in the “kernel_to_gmem” category of the SDAccel Getting Started Examples on GitHub, which connects the input pointer to DDR bank 0 and the output pointer to DDR bank 1.

Figure: Global Memory Two Banks Example

Assigning DDR Bank in Host Code

Bank assignment in host code is supported by a Xilinx vendor extension. The following code snippet shows the header file required, as well as the assignment of the input and output buffers to DDR bank 0 and bank 1, respectively:

#include <CL/cl_ext.h>
…
int main(int argc, char** argv)
{
    …
    cl_mem_ext_ptr_t inExt, outExt;       // Declaring two extensions for both buffers
    inExt.flags  = 0 | XCL_MEM_TOPOLOGY;  // Specify Bank0 memory for the input buffer
    outExt.flags = 1 | XCL_MEM_TOPOLOGY;  // Specify Bank1 memory for the output buffer
    inExt.obj   = 0; outExt.obj   = 0;    // Setting obj and param to zero
    inExt.param = 0; outExt.param = 0;

    int err;
    // Allocate buffer in Bank0 of global memory for the input image using the Xilinx extension
    cl_mem buffer_inImage = clCreateBuffer(world.context, CL_MEM_READ_ONLY | CL_MEM_EXT_PTR_XILINX,
                                           image_size_bytes, &inExt, &err);
    if (err != CL_SUCCESS) {
        std::cout << "Error: Failed to allocate device Memory" << std::endl;
        return EXIT_FAILURE;
    }
    // Allocate buffer in Bank1 of global memory for the output image using the Xilinx extension
    // (pass &err so that the check below tests this allocation, not the previous one)
    cl_mem buffer_outImage = clCreateBuffer(world.context, CL_MEM_WRITE_ONLY | CL_MEM_EXT_PTR_XILINX,
                                            image_size_bytes, &outExt, &err);
    if (err != CL_SUCCESS) {
        std::cout << "Error: Failed to allocate device Memory" << std::endl;
        return EXIT_FAILURE;
    }
    …
}

cl_mem_ext_ptr_t is a struct as defined below:

typedef struct {
    unsigned flags;
    void *obj;
    void *param;
} cl_mem_ext_ptr_t;

  • Valid values for flags are:
    • XCL_MEM_DDR_BANK0
    • XCL_MEM_DDR_BANK1
    • XCL_MEM_DDR_BANK2
    • XCL_MEM_DDR_BANK3
    • <id> | XCL_MEM_TOPOLOGY
      Note: The <id> is determined by looking at the Memory Configuration section in the xxx.xclbin.info file generated next to the xxx.xclbin file. In the xxx.xclbin.info file, the global memory (DDR, PLRAM, etc.) is listed with an index representing the <id>.
  • obj is the pointer to the associated host memory allocated for the CL memory buffer. Set it only if the CL_MEM_USE_HOST_PTR flag is passed to the clCreateBuffer API; otherwise, set it to NULL.
  • param is reserved for future use. Always assign it 0 or NULL.
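
The field rules above can be collected into a small helper. The following is a sketch, not Xilinx code: MemExtPtr is a local stand-in that mirrors the cl_mem_ext_ptr_t layout shown above so that the example compiles without the vendor header, kTopologyFlag is an assumed placeholder for XCL_MEM_TOPOLOGY (the real value must come from the Xilinx header), and make_ext_ptr is a hypothetical name.

```cpp
#include <cassert>

// Local stand-in mirroring the cl_mem_ext_ptr_t layout shown above.
struct MemExtPtr {
    unsigned flags;
    void* obj;
    void* param;
};

// ASSUMPTION: placeholder for XCL_MEM_TOPOLOGY. In real host code, use the
// macro from the Xilinx extension header instead of this constant.
constexpr unsigned kTopologyFlag = 1u << 31;

// Build an extension struct that selects a memory by its index, as listed
// in the Memory Configuration section of the xclbin.info file.
MemExtPtr make_ext_ptr(unsigned memory_index) {
    // The index must not collide with the topology flag bit.
    assert((memory_index & kTopologyFlag) == 0);
    MemExtPtr ext{};
    ext.flags = memory_index | kTopologyFlag;
    ext.obj = nullptr;    // no CL_MEM_USE_HOST_PTR buffer in this sketch
    ext.param = nullptr;  // reserved, always zero
    return ext;
}
```

With the real header, the same pattern would fill a cl_mem_ext_ptr_t and pass its address as the host pointer argument of clCreateBuffer together with CL_MEM_EXT_PTR_XILINX, as in the snippet earlier in this section.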

Assigning Global Memory for Kernel Code

Creating Multiple AXI Interfaces

OpenCL™ kernels, C/C++ kernels, and RTL kernels have different methods for assigning function parameters to AXI interfaces.

  • For OpenCL kernels, the --max_memory_ports option is required to generate one AXI4 interface for each global pointer in the kernel argument list. The AXI4 interface name is based on the order of the global pointers in the argument list.

    The following code is taken from the example gmem_2banks_ocl in the kernel_to_gmem category of the SDAccel Getting Started Examples on GitHub:

    __kernel __attribute__ ((reqd_work_group_size(1, 1, 1)))
    void apply_watermark(__global const TYPE * __restrict input,
                         __global TYPE * __restrict output,
                         int width, int height)
    {
        ...
    }

    In this example, the first global pointer input is assigned the AXI4 name M_AXI_GMEM0, and the second global pointer output is assigned the name M_AXI_GMEM1.

  • For C/C++ kernels, multiple AXI4 interfaces are generated by specifying different “bundle” names in the HLS INTERFACE pragma for different global pointers. Refer to the SDAccel Environment Programmers Guide for more information.

    The following is a code snippet from the gmem_2banks_c example that assigns the input pointer to the bundle gmem0 and the output pointer to the bundle gmem1. The bundle name can be any valid C string, and the generated AXI4 interface name will be M_AXI_<bundle_name>. For this example, the input pointer will have the AXI4 interface name M_AXI_gmem0, and the output pointer will have M_AXI_gmem1.

    #pragma HLS INTERFACE m_axi port=input offset=slave bundle=gmem0
    #pragma HLS INTERFACE m_axi port=output offset=slave bundle=gmem1

  • For RTL kernels, the port names are generated during the import process by the RTL kernel wizard. The default names proposed by the RTL kernel wizard are m00_axi and m01_axi. If not changed, these names must be used when assigning a DDR bank through the --sp option.

Assigning AXI Interfaces to DDR Banks

IMPORTANT: When using more than one DDR interface, Xilinx requires you to specify the DDR memory bank for each kernel/CU using the --sp option, and to specify the SLR in which the kernel is placed. Refer to the XOCC command in the SDx Command and Utility Reference Guide for details of the --sp command option, and to the SDAccel Environment User Guide for details on SLR placement.

AXI4 interfaces are connected to DDR banks using the --sp option. The --sp option value is in the format <compute_unit_name>.<interface_name>:<bank_name>. Valid DDR bank names for the --sp option are:

  • DDR[0]
  • DDR[1]
  • DDR[2]
  • DDR[3]

The following is the command line example that connects the input pointer (M_AXI_GMEM0) to DDR bank 0 and the output pointer (M_AXI_GMEM1) to DDR bank 1:

xocc --max_memory_ports apply_watermark --sp apply_watermark_1.m_axi_gmem0:DDR[0] --sp apply_watermark_1.m_axi_gmem1:DDR[1]
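
When a build script has to emit many such mappings, the option strings can be generated rather than typed by hand. The following is a minimal sketch that assembles one --sp argument in the form used in the command above. The function name sp_option is hypothetical, and the check against four banks simply reflects the DDR[0] to DDR[3] names listed in this section.

```cpp
#include <cassert>
#include <string>

// Assemble one "--sp" argument that maps an AXI interface of a compute unit
// to a DDR bank, e.g. "--sp apply_watermark_1.m_axi_gmem0:DDR[0]".
std::string sp_option(const std::string& compute_unit,
                      const std::string& interface_name,
                      unsigned ddr_bank) {
    // Only DDR[0] through DDR[3] are listed as valid bank names above.
    assert(ddr_bank < 4 && "DDR bank index out of range");
    return "--sp " + compute_unit + "." + interface_name +
           ":DDR[" + std::to_string(ddr_bank) + "]";
}
```

For the example above, sp_option("apply_watermark_1", "m_axi_gmem0", 0) and sp_option("apply_watermark_1", "m_axi_gmem1", 1) yield the two --sp arguments shown on the xocc command line.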

You can use the Device Hardware Transaction view to observe the actual DDR Bank communication, and to analyze DDR usage.

Figure: Device Hardware Transaction View Transactions on DDR Bank

Assigning AXI Interfaces to PLRAM

Some platforms support PLRAMs. In these cases, use the same --sp option as described in Assigning AXI Interfaces to DDR Banks, but use the name PLRAM[id]. Valid names supported by specific platforms can be found in the Memory Configuration section of the xclbin.info file generated alongside the xclbin file.

Assigning Kernels to SLR regions

Assigning ports to DDR banks requires that the kernel be physically routed on the FPGA to connect to the assigned DDR bank. Currently, large FPGAs use stacked silicon devices with several Super Logic Regions (SLRs). By default, the SDAccel environment places the compute units in the same SLR as the shell. This might not always be desirable, especially when the DDR banks in use are located in another SLR. As a result, Xilinx recommends using the --slr option to map kernels close to the DDR memory they use. For example, the apply_watermark_1 kernel above can be mapped to SLR 1 by applying the following link option:

xocc -l --slr apply_watermark_1:SLR1

To better understand the platform attributes, such as the number of DDR banks and SLRs, use platforminfo. For more information, refer to the SDx Command and Utility Reference Guide (UG1279).