Introduction to Debugging in SDAccel

This document introduces the debugging capabilities of the SDAccel™ environment. The goal is to provide detailed instructions on how to analyze any failure encountered within the SDAccel flow. If no tool problem is encountered and the behavior of the design is deemed functionally correct, refer to the SDAccel Environment Profiling and Optimization Guide to determine whether the performance of the design can be further improved.

Execution Model of an SDAccel Application

The SDAccel environment is designed to provide a simplified development experience for FPGA-based software acceleration platforms. The general structure of the acceleration platform is shown in the following figure.

Figure: Architecture of an SDAccel Application



The custom application runs on the host x86 server and is written in C/C++, using OpenCL API calls to interact with the FPGA accelerators. The custom kernels run within a Xilinx FPGA, and the Xilinx runtime (XRT) manages the interactions between the host application and the accelerator. Communication between the host x86 machine and the accelerator board occurs across the PCIe bus.

The SDAccel hardware platform contains global memory banks. Data transfer between the host machine and the kernels, in either direction, occurs through these global memory banks. The kernels running on the FPGA can have one or more memory interfaces. The connection from the memory banks to those memory interfaces is programmable and is determined by linking options of the compiler.

The SDAccel execution model follows these steps:

  1. The host application writes the data needed by a kernel into the global memory of the attached device through the PCIe interface.
  2. The host application programs the kernel with its input parameters.
  3. The host application triggers the execution of the kernel function on the FPGA.
  4. The kernel performs the required computation while reading and writing data from global memory, as necessary.
  5. The kernel writes data back to the memory banks and notifies the host that it has completed its task.
  6. The host application reads data back from global memory into the host memory space, and continues processing as needed.
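The six steps above can be sketched as a small in-process C++ model. This is only an illustration of the ordering of the steps, not SDAccel or XRT code: `GlobalMemory`, `add_scalar_kernel`, and `run_on_model` are hypothetical names, and the kernel is an ordinary function rather than FPGA logic. In a real host program the corresponding work is done through OpenCL API calls such as `clEnqueueWriteBuffer`, `clSetKernelArg`, `clEnqueueTask`, and `clEnqueueReadBuffer`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for device global memory: one input and one output buffer.
struct GlobalMemory {
    std::vector<int> in;
    std::vector<int> out;
};

// Hypothetical kernel: reads from global memory, adds a scalar to each
// element, and writes the results back to global memory (steps 4 and 5).
void add_scalar_kernel(GlobalMemory& mem, int scalar) {
    mem.out.resize(mem.in.size());
    for (std::size_t i = 0; i < mem.in.size(); ++i) {
        mem.out[i] = mem.in[i] + scalar;
    }
}

// Host-side view of the flow, one comment per step of the execution model.
std::vector<int> run_on_model(const std::vector<int>& host_in, int scalar) {
    GlobalMemory mem;
    mem.in = host_in;                    // Step 1: write input data to global memory
    const int kernel_arg = scalar;       // Step 2: program the kernel's input parameters
    add_scalar_kernel(mem, kernel_arg);  // Steps 3 to 5: trigger the kernel and let it finish
    assert(mem.out.size() == host_in.size());
    return mem.out;                      // Step 6: read results back into host memory
}
```

Calling `run_on_model({1, 2, 3}, 10)` returns a vector holding 11, 12, and 13. Keeping the host-side sequence in one function like this makes it easier to see which step a failure belongs to when the same sequence is later expressed with OpenCL calls.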

The FPGA can accommodate multiple kernel instances at one time; these can be instances of different kernels or multiple instances of the same kernel. The XRT transparently orchestrates the communication between the host application and the kernels in the accelerator. The number of instances of a kernel is determined by compilation options.

SDAccel Build Process

The SDAccel environment offers all of the features of a standard software development environment:

  • Optimized compiler for host applications
  • Cross-compilers for the FPGA
  • Robust debugging environment to help identify and resolve issues in the code
  • Performance profilers to identify bottlenecks and optimize the code

Within this environment, the build process uses a standard compilation and linking process for both the software elements and the hardware elements of the project. As shown in the following figure, the host application is built through one process using the standard GCC compiler, and the FPGA binary is built through a separate process using the Xilinx xocc compiler.

Figure: Software/Hardware Build Process



  1. Host application build process using GCC:
    • Each host application source file is compiled to an object file (.o).
    • The object files (.o) are linked with the Xilinx SDAccel runtime shared library to create the executable (.exe).
  2. FPGA build process is highlighted in the following figure:
    • Each kernel is independently compiled to a Xilinx object (.xo) file.
      • C/C++ and OpenCL C kernels are compiled for implementation on an FPGA using the xocc compiler. This step leverages the Vivado® HLS compiler. Pragmas and attributes supported by Vivado HLS can be used in C/C++ and OpenCL C kernel source code to specify the desired kernel micro-architecture and control the result of the compilation process.
      • RTL kernels are compiled using the package_xo utility. The RTL kernel wizard in the SDAccel environment can be used to simplify this process.
    • The kernel .xo files are linked with the hardware platform (shell) to create the FPGA binary (.xclbin). Important architectural aspects are determined during the link step. In particular, this is where connections from kernel ports to global memory banks are established and where the number of instances for each kernel is specified.
      • When the build target is software or hardware emulation, as described below, xocc generates simulation models of the device contents.
      • When the build target is the system (actual hardware), xocc generates the FPGA binary for the device, leveraging the Vivado Design Suite to run synthesis and implementation.

Figure: FPGA Build Process



Note: The xocc compiler automatically uses the Vivado HLS and Vivado Design Suite tools to build the kernels to run on the FPGA platform. It uses these tools with predefined settings which have proven to provide good quality of results. Using the SDAccel environment and the xocc compiler does not require knowledge of these tools; however, hardware-savvy developers can fully leverage these tools and use all their available features to implement kernels.
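The two-stage FPGA build (compile each kernel to a .xo file, then link the .xo files with the platform into a .xclbin) can be sketched as a pair of command lines. The C++ helpers below only assemble the command strings; they do not run xocc. The helper names, the platform name `my_platform`, and the kernel name `vadd` are hypothetical, and the option spellings (`-c`, `-l`, `-t`, `--platform`, `-k`, `--nk`, `-o`) are assumed from common xocc usage, so confirm them against the xocc documentation for the installed release.

```cpp
#include <cassert>
#include <string>

// Returns true for the three build targets named in this document.
bool is_known_target(const std::string& target) {
    return target == "sw_emu" || target == "hw_emu" || target == "hw";
}

// Compile stage (assumed options): one kernel source file becomes <kernel>.xo.
std::string xocc_compile_cmd(const std::string& target, const std::string& platform,
                             const std::string& kernel, const std::string& source) {
    assert(is_known_target(target));
    return "xocc -c -t " + target + " --platform " + platform +
           " -k " + kernel + " -o " + kernel + ".xo " + source;
}

// Link stage (assumed options): <kernel>.xo is linked with the platform into
// the FPGA binary; the number of kernel instances is chosen here.
std::string xocc_link_cmd(const std::string& target, const std::string& platform,
                          const std::string& kernel, int instances,
                          const std::string& output) {
    assert(is_known_target(target));
    assert(instances >= 1);
    return "xocc -l -t " + target + " --platform " + platform +
           " --nk " + kernel + ":" + std::to_string(instances) +
           " -o " + output + " " + kernel + ".xo";
}
```

For example, `xocc_link_cmd("sw_emu", "my_platform", "vadd", 2, "vadd.xclbin")` yields `xocc -l -t sw_emu --platform my_platform --nk vadd:2 -o vadd.xclbin vadd.xo`. Note that the same target has to be used for both stages.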

Build Targets

The SDAccel tool build process generates the host application executable (.exe) and the FPGA binary (.xclbin). The SDAccel build target defines the nature of the FPGA binary generated by the build process.

The SDAccel tool provides three different build targets: two emulation targets used for debug and validation purposes, and the default hardware target used to generate the actual FPGA binary:

Software Emulation (sw_emu)
Both the host application code and the kernel code are compiled to run on the x86 processor. This allows iterative algorithm refinement through fast build-and-run loops. This target is useful for identifying syntax errors, performing source-level debugging of the kernel code running together with the application, and verifying the behavior of the system.
Hardware Emulation (hw_emu)
The kernel code is compiled into a hardware model (RTL) which is run in a dedicated simulator. This build-and-run loop takes longer but provides a detailed, cycle-accurate view of kernel activity. This target is useful for testing the functionality of the logic that will go in the FPGA and for getting initial performance estimates.
System (hw)
The kernel code is compiled into a hardware model (RTL) and is then implemented on the FPGA device, resulting in a binary that will run on the actual FPGA.
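The three targets can be captured in a small C++ helper, which is handy when a build or run script has to treat emulation and hardware differently. This is a sketch under one assumption that is not stated in this document: that emulation runs are selected by setting the `XCL_EMULATION_MODE` environment variable to the target name and that hardware runs leave it unset. The type and function names are hypothetical.

```cpp
#include <cassert>
#include <string>

// The three build targets described above.
enum class BuildTarget { SwEmu, HwEmu, Hw };

// Target name as passed to the compiler.
std::string target_name(BuildTarget t) {
    switch (t) {
        case BuildTarget::SwEmu: return "sw_emu";
        case BuildTarget::HwEmu: return "hw_emu";
        case BuildTarget::Hw:    return "hw";
    }
    return "";  // unreachable for valid enum values
}

// Only the system target produces a binary that runs on the actual FPGA.
bool runs_on_fpga(BuildTarget t) {
    return t == BuildTarget::Hw;
}

// Value to export as XCL_EMULATION_MODE before running the host program
// (assumed convention); an empty string means "leave the variable unset".
std::string emulation_mode_value(BuildTarget t) {
    return runs_on_fpga(t) ? std::string() : target_name(t);
}
```

With this helper, `emulation_mode_value(BuildTarget::HwEmu)` gives `hw_emu`, while `emulation_mode_value(BuildTarget::Hw)` gives an empty string.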

SDAccel Debug Flow Overview

This section presents the general debug flow of the SDAccel environment by detailing the steps of a proven development process. This process allows you to focus rapidly on potential errors in the design. It also gives developers a baseline for where to start when an error occurs in their own development steps.

The debug flow described here assumes that an SDAccel platform board is installed and the initial setup checks have passed. It is possible to configure the SDAccel environment to work with custom hardware platforms that require a platform shell which defines the foundational components of the board.

The SDAccel environment provides application-level debug features which allow the host code, the kernel code, and the interactions between them to be efficiently debugged. The recommended application-level debugging flow consists of three levels of debugging: software emulation, hardware emulation, and hardware execution.

This three-tiered approach allows debugging of the host and kernel code and their interactions at different levels of abstraction. Each of the execution models described below is supported through the SDAccel IDE as well as through a batch flow using basic compile time and runtime setup options.

Software Emulation

Purpose
Algorithm verification
Execution Model
During software emulation, all processes are running pure C/C++ models. OpenCL kernel models are transformed to execute concurrently.

Figure: Software Emulation

Verify that both the host and kernel code are functionally correct by running software emulation. Because software emulation compiles and executes quickly, spend time here to iterate through the code until the host and kernel code function correctly. Both hardware emulation and hardware execution take more time to compile and execute.

Hardware Emulation

Purpose
RTL debugging, finding protocol violations.
Execution Model
During hardware emulation, the host code is executed concurrently with a simulation of the RTL model of the kernel, which is either directly imported or created through Vivado HLS from the C/C++/OpenCL kernel code.

Figure: Hardware Emulation

Verify that the host code and the kernel hardware implementation are correct by running hardware emulation on a data set. Hardware emulation performs detailed verification using an accurate model of the hardware (RTL) together with the host code C/OpenCL model. The hardware emulation flow invokes the hardware simulator in the SDAccel environment to test the functionality of the logic that is to be executed on the FPGA compute fabric. The interface between the models is represented by a transaction-level model (TLM) to limit the impact of the interface model on the overall execution time. The execution time for hardware emulation is longer than for software emulation.

TIP: Xilinx recommends that you use small data sets for debug and validation.

During the hardware emulation stage you can optionally modify the kernel code to improve performance. Iterate in hardware emulation until the functionality is correct and the estimated kernel performance is sufficient. See the SDAccel Environment Profiling and Optimization Guide for more information.
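One way to follow the small-data-set recommendation without maintaining separate host programs is to let the host code shrink its workload when it detects an emulation run. The sketch below assumes that emulation runs are identified by the `XCL_EMULATION_MODE` environment variable being set to `sw_emu` or `hw_emu`; the function names and the element limits (1024 and 65536) are arbitrary illustrative choices, not values taken from this document.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Picks the number of elements to process for a given emulation mode.
// 'mode' is the value of XCL_EMULATION_MODE, or nullptr when it is unset.
std::size_t element_count_for_mode(const char* mode, std::size_t full_count) {
    assert(full_count > 0);
    if (mode == nullptr) {
        return full_count;  // hardware run: use the full data set
    }
    if (std::strcmp(mode, "hw_emu") == 0) {
        // RTL simulation is slow, so keep the data set very small.
        return std::min<std::size_t>(full_count, 1024);
    }
    if (std::strcmp(mode, "sw_emu") == 0) {
        // Software emulation is faster, so a larger cap is affordable.
        return std::min<std::size_t>(full_count, 65536);
    }
    return full_count;  // unknown value: do not shrink the data set
}

// Convenience wrapper that reads the mode from the environment.
std::size_t element_count(std::size_t full_count) {
    return element_count_for_mode(std::getenv("XCL_EMULATION_MODE"), full_count);
}
```

Separating the environment lookup from the sizing logic keeps the sizing rule easy to unit test, and the caps only ever reduce the data set, so a small input passes through unchanged.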

Hardware Execution

Purpose
Final verification of the complete system, finding protocol violations (hardware hangs), and debugging system performance.
Execution Model
During hardware execution, the actual hardware platform is used to execute the kernels. The difference between this debug configuration and the final compilation of the kernel code is the inclusion of special hardware logic in the platform, such as ILA and VIO debug cores, and AXI performance monitors for debug purposes.

Figure: Hardware Execution

At this stage, a system image (xclbin) is compiled and executed on the actual hardware platform. Refer to the SDAccel Environment User Guide for more information on generating the xclbin file. Once the kernels are confirmed to be executing correctly on the actual FPGA hardware, your focus can shift from debugging to performance tuning. See the SDAccel Environment Profiling and Optimization Guide.

Nevertheless, the hardware execution model might not be functional due to protocol issues or issues with the hardware configuration. To help localize these critical hardware issues, the SDAccel environment provides specific hardware debug capabilities. These include ChipScope debug cores (such as System ILAs), which can be viewed in the Vivado hardware manager, together with waveform analysis, kernel activity reports, and memory access analysis.

IMPORTANT: Debugging the kernel on the platform hardware requires additional logic to be incorporated into the overall hardware model. This means that if hardware debugging is enabled, there is some impact on resource use of the FPGA, as well as some impact on the kernel performance.