Debug Techniques

This section closely examines different styles of debugging techniques. It classifies the different approaches into software-based debugging techniques and hardware-oriented techniques. In the software-based approaches, you are not required to fully understand the ultimate mapping of the kernel code onto the FPGA. However, this concept can only be extended to a certain amount of detail, at which point the more detailed hardware-based analysis is required.

The section is structured along the different debug stages in the SDAccel™ environment. It starts with functional verification during software emulation (a purely software-based approach). Next is hardware emulation, where the kernel code is converted into an actual hardware representation, providing more details of the final implementation. Both hardware debugging and software debugging concepts can be applied in the hardware emulation stage. The last stage is system verification, where the actual hardware is executed. In this stage, software debugging concepts can be applied only to the host, while the kernel must be debugged with hardware debugging concepts.

Functional Verification (Software Emulation)

Functional verification is the process during which the software representing the system is verified towards the ultimate implementation goal by ensuring that the software behaves as intended on the given data. This is a very common task during software development and many different concepts are available.

If your software does not perform as intended, you can use the debugger to identify the root cause of the issue or, if necessary, dump data points during software execution. This section introduces these concepts as applied to an SDx™ environment project.

Using printf() to Debug Kernels

The simplest approach to debugging algorithms is to verify key data values throughout the execution of the program. For application developers, printing checkpoint values in the code is a tried and trusted way of identifying problems within the execution of a program. Because part of the algorithm is now running on an FPGA, even this debugging technique requires additional support.

The SDAccel development environment supports the OpenCL™ printf() built-in function within the kernels in all development flows: software emulation, hardware emulation, and running the kernel in actual hardware. The following is an example of using printf() in the kernel, and the output when the kernel is executed with a global size of 8:

    __kernel __attribute__ ((reqd_work_group_size(1, 1, 1)))
    void hello_world(__global int *a)
    {
        int idx = get_global_id(0);

        printf("Hello world from work item %d\n", idx);
        a[idx] = idx;
    }

The output is as follows:

    Hello world from work item 0
    Hello world from work item 1
    Hello world from work item 2
    Hello world from work item 3
    Hello world from work item 4
    Hello world from work item 5
    Hello world from work item 6
    Hello world from work item 7

IMPORTANT: printf() messages are buffered in the global memory and unloaded when kernel execution is completed. If printf() is used in multiple kernels, the order in which the messages from each kernel are displayed on the host terminal is not guaranteed. Note that, especially when running in hardware emulation and hardware, the hardware buffer size might limit how much printf output is captured.

Note: Support in all development flows applies only to OpenCL kernels.

For C/C++ kernel models, printf() is only supported during software emulation and should be excluded from the Vivado® HLS synthesis step. In this case, any printf() statement should be surrounded by the following compiler macros:

    #ifndef __SYNTHESIS__
        printf("text");
    #endif
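As a minimal sketch of how this guard might sit inside a C/C++ kernel model, the following function adds two arrays and prints each result only when the code is not being synthesized. The function name vadd_model and its arguments are hypothetical and are not taken from any SDAccel example.

```cpp
#include <cassert>
#include <cstdio>

// Hypothetical C++ kernel model; the name and signature are illustrative only.
void vadd_model(const int *a, const int *b, int *out, int n) {
#ifndef __SYNTHESIS__
    // Sanity check kept out of synthesis together with the debug output.
    assert(a != nullptr && b != nullptr && out != nullptr && n >= 0);
#endif
    for (int i = 0; i < n; ++i) {
        out[i] = a[i] + b[i];
#ifndef __SYNTHESIS__
        // Checkpoint value, compiled only when __SYNTHESIS__ is not defined.
        std::printf("vadd_model: out[%d] = %d\n", i, out[i]);
#endif
    }
}
```

Because the guard removes the statement at preprocessing time, the arithmetic is unchanged whether or not the debug output is compiled in.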

GDB-Based Debugging

This section shows how host and kernel debugging can be performed with the help of GDB. Because this flow should be familiar to software developers, this section focuses on the extensions of host code debugging capabilities specifically for FPGAs, and the current status of kernel-based hardware emulation support.

Host Code Debugging

Except for the method of launching the debugging environment described in the previous chapter, there is no difference between the SDAccel host code debugging and the commonly used GDB application debugging flow and features.

After gdb is launched, you can step through the host code in GDB and examine the C/C++/OpenCL objects to verify that their contents are as expected at any point in the code.

However, as stated in the introduction, especially in the case of hardware emulation, it is common to look for issues regarding protocol synchronization between the host and the kernel. The SDAccel environment provides special GDB extensions to examine the content of the OpenCL runtime environment from the application host. These commands are described in more detail in the next section.

Xilinx OpenCL Runtime GDB Extensions

The Xilinx OpenCL runtime Debug Environment introduces new GDB commands that give visibility from the host application into the OpenCL runtime library.

Note: If you run GDB outside of the SDAccel environment, these commands need to be enabled as described in Launching GDB Host Code Debug.

There are two kinds of commands which can be called from the gdb command line:

Commands that give visibility into the OpenCL runtime data structures (cl_command_queue, cl_event, and cl_mem). The arguments to xprint queue and xprint mem are optional. The application debug environment keeps track of all the OpenCL objects and automatically prints all valid queues and cl_mem objects if the argument is not specified. In addition, the commands validate any supplied command queue, event, and cl_mem arguments.
```
xprint queue []
xprint event
xprint mem []
xprint kernel
xprint all
```
Commands that give visibility into the IP on the SDAccel platform. This functionality is only available in the system flow (hardware execution) and not in any of the emulation flows.
```
xstatus all
xstatus --
```

You can get help information about the commands by using the GDB help command.

A typical use for these commands is diagnosing a host application hang. In this case, the host application is likely to be waiting for the command queue to finish or waiting on an event list. Printing the command queue using the xprint command can tell you which events are unfinished, letting you analyze the dependencies between the events.

The output of both of these commands is automatically tracked when debugging with the SDAccel IDE. In this case, three tabs are provided next to the common tabs for Variables, Breakpoints, and Registers in the upper left corner of the debug perspective. These are labeled Command Queue, Memory Buffers, and Platform Debug, showing the output of xprint queue, xprint mem, and xstatus respectively.

Note: The information presented in these views is only visible to the application developer while actually debugging the host code. This is the reason why this debug technique is also applicable when actual system execution (hardware) is performed.

GDB Kernel-Based Debugging

GDB kernel debugging is supported for the software emulation and hardware emulation flows. When the GDB executable is connected to the kernel in the IDE or command line flows, you can set breakpoints and query the content of variables in the kernel, similar to normal host code debugging. This is fully supported in the software emulation flow because the kernel GDB processes attach to the spawned software processes.

However, during hardware emulation, the kernel source code is transformed into RTL, created by Vivado HLS, and executed. As the RTL model is simulated, all transformations for performance optimization and concurrent hardware execution are applied. For that reason, not all C/C++/OpenCL lines can be uniquely mapped to the RTL code; only limited breakpoints are supported, and only specific variables can be queried. Today, the GDB tool therefore breaks on the next possible line based on the requested breakpoint statements and clearly states if variables cannot be queried because of the RTL transformations.

Debugging in Hardware Emulation

During hardware emulation, it is possible to deep dive into the implementation of the kernels. The SDAccel environment allows you to perform typical hardware-like debugging in this mode as well as some software-like GDB-based analysis on the hardware implementation.

GDB-Based Debugging

Debugging using a software-based GDB flow is fully supported during hardware emulation. Except for the execution of the actual RTL code representing the kernel code, there is no difference to the user because the GDB flow maps the RTL back into the source code description. This limits breakpoints and the observability of variables in some cases, because during the RTL generation (HLS), variables and loops might have been dissolved.

For a detailed description of the debug feature itself, see the description in the SDAccel Debug Features chapter, and the extensions to GDB as presented in the GDB-Based Debugging section.

Waveform-Based Kernel Debugging

The C/C++ and OpenCL kernel code is synthesized using Vivado High Level Synthesis (HLS) to transform it into a Hardware Description Language (HDL) and later implement it onto the FPGA (xclbin).

Another debugging approach is based on simulation waveforms. Hardware-centric algorithm programmers are likely to be familiar with this approach. This waveform-based HDL debugging is best supported by the SDAccel environment through the IDE flow during hardware emulation.

TIP: For most debugging, the HDL model does not need to be analyzed. Waveform debugging is considered an advanced debugging capability.

Run the Waveform-Based Kernel Debugging Flow

Start the SDx environment, and perform the regular setup.
Select Run > Debug Configurations to open the Debug Configurations window.
On the Debug Configurations window, select the current launch configuration from the OpenCL list, as shown in the following figure.
On the Main tab, two kernel debug options are displayed. Select both Use RTL waveform for kernel debugging and Launch live waveform, and close the configuration window. A debug session starts automatically. Selecting the Use RTL waveform for kernel debugging option ensures that a simulation waveform database is generated, while the Launch live waveform option spawns the Waveform viewer during the actual simulation, allowing you full control of the simulation engines and waveform display.
If the live waveform viewer is activated, the waveform viewer automatically opens when running the executable. By default, the waveform viewer shows all interface signals and the following debug hierarchy:
- Memory Data Transfers: Shows data transfers from all compute units that funnel through these interfaces.
  TIP: These interfaces could be a different bit width from the compute units. If so, then the burst lengths would be different. For example, a burst of sixteen 32-bit words at a compute unit would be a burst of one 512-bit word at the OCL master.
- Kernel
  - Compute Unit
    - CU Stalls (%): This section shows a summary of stalls for the entire compute unit (CU). A bus of all lowest-level stall signals is created, and the bus is represented in the waveform as a percentage (%) of those signals that are active at any point in time.
    - Data Transfers: This section shows the data transfers for all AXI masters on the CU.
    - User Functions: This section lists all of the functions within the hierarchy of the CU.
      - Function:
        - Dataflow/Pipeline Activity: This section shows the function-level loop dataflow/pipeline signals for a CU.
        - Function Stalls: This section lists the three stall signals within this function.
        - Function I/O: This section lists the I/O for the function. These I/O are of protocol m_axi, ap_fifo, ap_memory, or ap_none.
TIP: As with any waveform debugger, additional debug data of internal signals can be added by selecting the instance of interest from the scope menu and the signals of interest from the object menu. Similarly, debug controls such as HDL breakpoints, as well as HDL code lookup and waveform markers are supported. Refer to the Vivado Design Suite User Guide: Logic Simulation (UG900) for more information on working with the waveform viewer.
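The bit-width TIP under Memory Data Transfers comes down to simple arithmetic, which can help when comparing burst lengths seen at a compute unit with those seen at the wider memory interface. The sketch below assumes densely packed, aligned data; the function name beats_on_bus is hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Number of beats a bus of bus_bits needs to carry `words` words of
// word_bits each. Assumes the data is densely packed and aligned.
std::uint64_t beats_on_bus(std::uint64_t words,
                           std::uint64_t word_bits,
                           std::uint64_t bus_bits) {
    assert(word_bits > 0 && bus_bits > 0);
    const std::uint64_t total_bits = words * word_bits;
    return (total_bits + bus_bits - 1) / bus_bits;  // round up to whole beats
}
```

With these assumptions, beats_on_bus(16, 32, 512) evaluates to 1, matching the example of sixteen 32-bit words becoming one 512-bit word.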

Enable Waveform Debugging through the XOCC Command Line

The waveform debugging process can also be enabled through the XOCC command line. Use the following instructions to enable it:

Turn on debug code generation during kernel compilation.
```
xocc -g ...
```

Create an sdaccel.ini file in the same directory as the host executable with the contents below:

    [Emulation]
    launch_waveform=batch

    [Debug]
    profile=true
    timeline_trace=true
    data_transfer_trace=fine

Execute hardware emulation. The hardware transaction data is collected in a .wdb file. This file can be opened directly through the SDAccel IDE.
TIP: If the launch_waveform option is set to gui in the emulation section ([Emulation] launch_waveform=gui), a live waveform viewer is spawned during the execution of the hardware emulation.

System Verification and Hardware Debug

Application Hangs

This section discusses debugging issues related to the interaction of the host code and the accelerated kernels. Problems with these interactions manifest as issues such as machine hangs or application hangs. Although the GDB debug environment might help with isolating the errors in some cases (xprint), such as hangs associated with specific kernels, these issues are best debugged using the dmesg and xbutil commands as shown here.

If this process does not resolve the problem, it is necessary to perform hardware debugging using the ChipScope™ feature.

AXI Firewall Trips

The AXI firewall should prevent host hangs. This is why Xilinx recommends the AXI Protocol Firewall IP to be included in SDAccel environment platforms. When the firewall trips, one of the first checks you perform should be to see if the host code and kernels are set up to use the same memory banks. The following steps detail one of the simplest methods to perform this check.

Use xbutil to program the FPGA:
```
xbutil program -p 
```
Run the xbutil query option to check memory topology:
```
xbutil query
```
In the following example, there is no memory bank associated with the kernels:
If the host code expects any DDR banks/PLRAMs to be used, this report should indicate an issue. In this case, it is necessary to check kernel and host code expectations. If the host code is using the Xilinx OpenCL extensions, it is necessary to check which DDR banks should be used by the kernel. These should match the xocc --sp arguments provided.

Kernel Hangs due to AXI Violations

It is possible for the kernels to hang due to bad AXI transactions between the kernels and the memory controller. To debug these issues, it is required to instrument the kernels.

The SDAccel environment provides two options for instrumentation to be applied during XOCC linking (-l). Both of these add hardware to your implementation, and based on utilization it might be necessary to limit instrumentation.
1. Add Lightweight AXI Protocol Checkers (lapc). These protocol checkers are added using the --dk option. The following syntax is used:
```
--dk <[protocol|list_ports]<:compute_unit_name><:interface_name>>
```
  In general, the interface name is optional. If not specified, all ports are expected to be analyzed. The protocol option is used to define the protocol checkers to be inserted. This option can accept a special keyword, all, for the compute unit name and/or the interface name. The list_ports option generates a list of valid compute units and port combinations in the current design.
  Note: Multiple --dk option switches can be specified in a single command line to additively add interface monitoring capability.
2. Add SDx environment Performance Monitors (spm). These enable the listing of detailed communication statistics (counters). Although this is most useful for performance analysis, it provides insight during debugging on pending port activities. The Performance Monitors are added using the --profile_kernel option. The basic syntax for the --profile_kernel option is:
```
--profile_kernel data::::
```
  Three fields are required to determine the precise interface to which the performance monitor is applied. However, if resource use is not an issue, the keyword all enables you to apply the monitoring to all existing kernels, compute units, and interfaces with a single option. Otherwise, you can specify the kernel_name, cu_name, and interface_name explicitly to limit instrumentation.
  The last field allows you to restrict the information gathering to just counters for large designs, while all (default) includes the collection of actual trace information.
  Note: Multiple --profile_kernel option switches can be specified in a single command line to additively add performance monitoring capability.
```
--profile_kernel data:kernel1:cu1:m_axi_gmem0 --profile_kernel data:kernel1:cu1:m_axi_gmem1 --profile_kernel data:kernel2:cu2:m_axi_gmem
```
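When several interfaces of one compute unit need monitoring, a build script can assemble the repeated switches instead of typing them by hand. The following is a minimal sketch of such a helper, following the data:kernel:cu:interface pattern of the example above; the function name profile_kernel_options is hypothetical and is not part of any Xilinx tool.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Builds one "--profile_kernel data:<kernel>:<cu>:<interface>" switch per
// interface and joins them with spaces, mirroring the example command line.
std::string profile_kernel_options(const std::string &kernel,
                                   const std::string &cu,
                                   const std::vector<std::string> &interfaces) {
    assert(!kernel.empty() && !cu.empty());
    std::string options;
    for (const std::string &iface : interfaces) {
        if (!options.empty()) {
            options += ' ';
        }
        options += "--profile_kernel data:" + kernel + ":" + cu + ":" + iface;
    }
    return options;
}
```

Listing interfaces explicitly in this way keeps the added monitor hardware limited to the ports under investigation.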
When the application is rebuilt, rerun the host application using the xclbin with the added SPM IP and LAPC IP.
When the application hangs, you can use xbutil status to check for any errors or anomalies.
Check the SPM output:
- Run xbutil status --spm a couple of times to check if any counters are moving. If they are moving, then the kernels are active.
  TIP: Testing SPM output is also supported through GDB debugging using the command extension xstatus spm.
- If the counters are stagnant, outstanding counts greater than zero might mean that some AXI transactions are hung.
Check the LAPC output:
- Run xbutil status --lapc to check if there are any AXI violations.
  TIP: Testing LAPC output is also supported through GDB debugging using the command extension xstatus lapc.
- If there are any AXI violations, it implies that there are problems in the kernel implementation.
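The SPM check described above amounts to comparing two readings of the same port taken a moment apart. The sketch below captures that reasoning. The PortSnapshot structure and its field names are hypothetical and do not mirror the actual xbutil output format, and the counters are assumed not to wrap between the two readings.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical snapshot of one monitored port (illustrative fields only).
struct PortSnapshot {
    std::uint64_t transactions;  // completed transactions so far
    std::uint64_t outstanding;   // transactions issued but not yet completed
};

enum class PortState { Active, PossiblyHung, Idle };

// Compares an earlier and a later reading of the same port.
PortState classify_port(const PortSnapshot &earlier, const PortSnapshot &later) {
    // Assumption: counters only increase between the two readings.
    assert(later.transactions >= earlier.transactions);
    if (later.transactions != earlier.transactions) {
        return PortState::Active;        // counters are moving
    }
    if (later.outstanding > 0) {
        return PortState::PossiblyHung;  // stagnant with pending transactions
    }
    return PortState::Idle;              // stagnant and nothing pending
}
```

A PossiblyHung result is only a hint, in the same sense as the text above: it indicates that some AXI transactions might be hung and that the LAPC output is worth checking next.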

Host Application Hangs when Accessing Memory

Application hangs can also be caused by incomplete DMA transfers initiated from the host code. This does not necessarily mean that the host code is wrong; it might also be that the kernels have issued illegal transactions and locked up the AXI.

If the platform has an AXI firewall, such as in the SDAccel platforms, it is likely to trip. The driver issues a SIGBUS error, kills the application, and resets the device. You can check this by running xbutil query. The following figure shows such an error in the firewall status:

TIP: If the firewall has not tripped, the Linux tool dmesg can provide additional insight.
When you know that the firewall has tripped, it is important to determine the cause of the DMA timeout. The issue could be an illegal DMA transfer, or kernel misbehavior. However, a side effect of the AXI firewall tripping is that the health check functionality in the driver resets the board after killing the application; any information on the device that might help with debugging the root cause is lost. To debug this problem, you can disable the health check thread in the xclmgmt kernel module to capture the error. This uses common Unix kernel tools in the following sequence:
1. sudo modinfo xclmgmt: This command lists the current configuration of the module and indicates if the health_check parameter is on or off. It also returns the path to the xclmgmt module.
2. sudo rmmod xclmgmt: This removes and therefore disables the xclmgmt kernel module.
3. sudo insmod /xclmgmt.ko health_check=0: This reinstalls the xclmgmt kernel module with the health check disabled.
  TIP: The path to this module is reported in the output of the call to modinfo.
With the health check disabled, rerun the application. You can use the kernel instrumentation to isolate this issue as previously described.

Typical Errors Leading to Application Hangs

The user errors that typically create application hangs are listed below:

Read-before-write in 5.0+ shells causes an MIG ECC (Memory Interface Generator error correction code) error. This is typically a user error. For example, this error might occur when a kernel is expected to write 4KB of data in DDR, but it produces only 1KB of data, and you then try to transfer the full 4KB of data to the host. It can also happen if you supply a 1KB buffer to a kernel, but the kernel tries to read 4KB of data.
An ECC read-before-write error also occurs if no data has been written to a memory location since the last bitstream download which results in MIG initialization, but a read request is made for that same memory location. ECC errors stall the affected MIG because kernels are usually not able to handle this error. This can manifest in two different ways:
1. The CU might hang or stall because it cannot handle this error while reading or writing to or from the affected MIG. The xbutil query shows that the CU is stuck in a BUSY state and is not making progress.
2. The AXI Firewall might trip if a PCIe® DMA request is made to the affected MIG, because the DMA engine is unable to complete the request. AXI Firewall trips result in the Linux kernel driver killing all processes which have opened the device node with the SIGBUS signal. The xbutil query shows if an AXI Firewall has indeed tripped, and includes a timestamp.
If the above hang does not occur, the host code might not read back the correct data. This incorrect data is typically 0s, and is located in the last part of the data. It is important to review the host code carefully. One common example is compression, where the size of the compressed data is not known up front, and an application might try to migrate more data to the host than was produced by the kernel.
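One way to guard against the over-read described above is to clamp the size of every read-back to what the kernel actually produced. The following is a minimal sketch, assuming the kernel reports its produced byte count to the host by some means (for example, a small result buffer); the function name safe_readback_bytes is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Bytes that are safe to migrate back to the host: never more than was
// requested, than the kernel produced, or than the device buffer holds.
std::size_t safe_readback_bytes(std::size_t requested_bytes,
                                std::size_t produced_bytes,
                                std::size_t buffer_bytes) {
    assert(buffer_bytes > 0);  // a zero-sized buffer points to a setup error
    return std::min({requested_bytes, produced_bytes, buffer_bytes});
}
```

In the compression example, a host that requests the full 4KB while the kernel produced only 1KB would transfer just the 1KB that was written.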

Debugging with ChipScope

You can use the ChipScope debugging environment and the Vivado hardware manager to help you debug your host application and kernels quickly and more effectively. In order to do this, at least one of the following must be true:

Your SDAccel application project has been instrumented with debug cores, using the --dk compiler switch (as described in Hardware Debugging Using ChipScope).
The RTL kernels used in your project have been instantiated with debug cores (as described in Adding Debug IP to RTL Kernels).

These tools enable a wide range of capabilities from logic to system level debug while your kernel is running in hardware.

Note: Debugging on the kernel platform requires additional logic to be incorporated into the overall hardware model, which might have an impact on resource use and kernel performance.

Running XVC and HW Servers

The following steps are required to run the XVC (Xilinx Virtual Cable) and HW servers, run the host application, and finally trigger and arm the debug cores in the Vivado hardware manager.

Add debug IP to the kernel.
Instrument the host application to pause at the appropriate point in the host execution where you want to debug. See Debugging through the Host Application.
Set up the environment for hardware debug. You can do this manually, or by using a script that automates this for you. The following steps are described in Manual Setup for Hardware Debug and Automated Setup for Hardware Debug:
1. Run the required XVC and HW servers.
2. Execute the host application and pause at the appropriate point in the host execution to enable setup of ILA triggers.
3. Open the Vivado hardware manager and connect to the XVC server.
4. Set up ILA trigger conditions for the design.
5. Continue with the host application.
6. Inspect results in the Vivado hardware manager.
7. Rerun iteratively from step 2 (above) as required.

Adding Debug IP to RTL Kernels

IMPORTANT: This debug technique requires familiarity with the Vivado Design Suite, and RTL design.

You need to instantiate debug cores like the Integrated Logic Analyzer (ILA) and Virtual Input/Output (VIO) in your RTL kernel code to debug the kernel logic. From within the Vivado Design Suite, edit the RTL kernel to instantiate an ILA IP customization, or a VIO IP, into the RTL code, similar to using any other IP in the Vivado IDE. Refer to the Vivado Design Suite User Guide: Programming and Debugging (UG908) to learn more about using the ILA or other debug cores in the RTL Insertion flow and to learn about using the HDL generate statement technique to enable/disable debug core generation.

TIP: The best time to add debug cores to your RTL kernel is when you create it. Refer to the Debugging section in the UltraFast Design Methodology Guide for the Vivado Design Suite (UG949) for more information.

You can also add the ILA debug core using a Tcl script from within an open Vivado project as shown in the following code example:

    create_ip -name ila -vendor xilinx.com -library ip -version 6.2 -module_name ila_0
    set_property -dict [list CONFIG.C_PROBE6_WIDTH {32} CONFIG.C_PROBE3_WIDTH {64} \
      CONFIG.C_NUM_OF_PROBES {7} CONFIG.C_EN_STRG_QUAL {1} CONFIG.C_INPUT_PIPE_STAGES {2} \
      CONFIG.C_ADV_TRIGGER {true} CONFIG.ALL_PROBE_SAME_MU_CNT {4} CONFIG.C_PROBE6_MU_CNT {4} \
      CONFIG.C_PROBE5_MU_CNT {4} CONFIG.C_PROBE4_MU_CNT {4} CONFIG.C_PROBE3_MU_CNT {4} \
      CONFIG.C_PROBE2_MU_CNT {4} CONFIG.C_PROBE1_MU_CNT {4} CONFIG.C_PROBE0_MU_CNT {4}] [get_ips ila_0]

The following is an example of an ILA debug core instantiated into the RTL kernel source file of the RTL Kernel Debug example design on GitHub. The ILA monitors the output of the combinatorial adder as specified in the src/hdl/krnl_vadd_rtl_int.sv file.

    // ILA monitoring combinatorial adder
    ila_0 i_ila_0 (
        .clk(ap_clk),              // input wire        clk
        .probe0(areset),           // input wire [0:0]  probe0
        .probe1(rd_fifo_tvalid_n), // input wire [0:0]  probe1
        .probe2(rd_fifo_tready),   // input wire [0:0]  probe2
        .probe3(rd_fifo_tdata),    // input wire [63:0] probe3
        .probe4(adder_tvalid),     // input wire [0:0]  probe4
        .probe5(adder_tready_n),   // input wire [0:0]  probe5
        .probe6(adder_tdata)       // input wire [31:0] probe6
    );

After the RTL kernel has been instrumented for debug with the appropriate debug cores, you can analyze the hardware using the Vivado hardware manager features as described in the previous topic.

Debugging through the Host Application

To debug the host application working with the kernel code running on the SDAccel platform, the application host code must be modified to ensure that you can set up the ILA trigger conditions after the kernel has been programmed into the device, but before starting the kernel.

Pausing a C++ Host Application

The following code example is from the src/host.cpp code from the RTL Kernel example on GitHub:

    ....
    std::string binaryFile = xcl::find_binary_file(device_name,"vadd");
    cl::Program::Binaries bins = xcl::import_binary_file(binaryFile);
    devices.resize(1);
    cl::Program program(context, devices, bins);
    cl::Kernel krnl_vadd(program,"krnl_vadd_rtl");

    wait_for_enter("\nPress ENTER to continue after setting up ILA trigger...");

    //Allocate Buffer in Global Memory
    std::vector<cl::Memory> inBufVec, outBufVec;
    cl::Buffer buffer_r1(context,CL_MEM_USE_HOST_PTR | CL_MEM_READ_ONLY,
                         vector_size_bytes, source_input1.data());
    ...

    //Copy input data to device global memory
    q.enqueueMigrateMemObjects(inBufVec,0/* 0 means from host*/);

    //Set the Kernel Arguments
    ...

    //Launch the Kernel
    q.enqueueTask(krnl_vadd);

The addition of the conditional if (interactive) test and the use of the wait_for_enter function pause the host application, giving you time to set up the required ILA triggers and prepare to capture data from the kernel. After the Vivado hardware manager is set up and configured properly, you can press Enter to continue running the host application.

Pausing the Host Application Using GDB

Instead of making changes to the host application to pause before a kernel execution, you can run a GDB session from the SDx IDE. You can then set a breakpoint prior to the kernel execution in the host application. When the breakpoint is reached, you can set up the debug ILA triggers in the Vivado hardware manager, arm the trigger, and then resume the kernel execution in GDB.

Automated Setup for Hardware Debug

Note: A full SDx environment install is required to complete the following task. See the SDx Environments Release Notes, Installation, and Licensing Guide for more information about installation.

Set up your SDx environment by sourcing the appropriate settings64.sh/.csh file found in your SDx install area.
Start the xvc_pcie and hw_server apps using the sdx_debug_hw script.
```
sdx_debug_hw --xvc_pcie /dev/xvc_pub.m1025 --hw_server
launching xvc_pcie...
xvc_pcie -d /dev/xvc_pub.m1025 -s TCP::10200
launching hw_server...
hw_server -sTCP::3121
```
Note: The /dev/xvc_* character device will differ depending on the platform. In this example, the character device is /dev/xvc_pub.m1025, though on your system it is likely to differ.
In the SDx IDE, modify the host code to include a pause statement after the kernel has been created/downloaded and before the kernel execution is started, then recompile the host program.
- For C++ host code, add a pause after the creation of the cl::Kernel object. The following snippet is from the Vector Add template design C++ host code:
- For C-language host code, add a pause after the clCreateKernel() function call:
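As an illustration only, and not the snippet from the Vector Add template design, a pause of this kind could be sketched as follows. The helper name wait_for_enter_sketch, the interactive flag, and the stream parameters are all hypothetical; the streams are passed in so that the helper can be exercised without a terminal.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical pause helper. Prints a prompt and blocks until a full line
// (the ENTER key) is read. Returns false if the input stream ends first.
bool wait_for_enter_sketch(bool interactive, const char *prompt,
                           std::istream &in, std::ostream &out) {
    assert(prompt != nullptr);
    if (!interactive) {
        return true;  // batch runs skip the pause entirely
    }
    out << prompt << std::endl;
    std::string line;
    return static_cast<bool>(std::getline(in, line));
}
```

In a host program, a call such as wait_for_enter_sketch(true, "Pausing to allow you to arm ILA trigger. Hit enter here to resume host program...", std::cin, std::cout) would be placed after the cl::Kernel object is created (or after clCreateKernel() returns) and before the kernel is enqueued.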

Run your modified host program.

    vadd_test.exe ./binary_container_1.xclbin
    Loading: './binary_container_1.xclbin'
    Pausing to allow you to arm ILA trigger. Hit enter here to resume host program...

Launch Vivado Design Suite using the sdx_debug_hw script located in your SDAccel installation directory.

> sdx_debug_hw --vivado --host xcoltlab40 --ltx_file ../workspace/vadd_test/System/pfm_top_wrapper.ltx

The command window displays the following:

    launching vivado... ['vivado', '-source', 'sdx_hw_debug.tcl', '-tclargs', '/tmp/sdx_tmp/project_1/project_1.xpr', 'workspace/vadd_test/System/pfm_top_wrapper.ltx', 'xcoltlab40', '10200', '3121']
    ****** Vivado v2018.2 (64-bit)
      **** SW Build 2245749 on Wed May 30 12:36:19 MDT 2018
      **** IP Build 2245576 on Wed May 30 15:12:50 MDT 2018
        ** Copyright 1986-2018 Xilinx, Inc. All Rights Reserved.
    start_gui

In Vivado Design Suite, run the ILA trigger.

Press Enter to un-pause the host program.

    vadd_test.exe ./binary_container_1.xclbin
    Loading: './binary_container_1.xclbin'
    Pausing to allow you to arm ILA trigger. Hit enter here to resume host program...
    TEST PASSED

In the Vivado Design Suite, see the interface transactions on the kernel compute unit slave control interface in the Waveform view.

Manual Setup for Hardware Debug

Manually Starting Debug Servers

Note: The following steps are also applicable when using Nimbix and other cloud platforms.

There are two steps required to start the debug servers prior to debugging the design in the Vivado hardware manager.

Source the SDx environment setup script, settings64.csh or settings64.sh, and launch the xvc_pcie server. The filename passed to xvc_pcie must match the character driver file installed with the kernel device driver.
```
>xvc_pcie -d /dev/xvc_pub.m1025
```
Note: The xvc_pcie server has many useful command line options. You can issue xvc_pcie -help to obtain the full list of available options.

Start the XVC server on port 10201 and the hw_server on port 3121.

>hw_server "set auto-open-servers xilinx-xvc:localhost:10201" -e "set always-open-jtag 1"

Starting Debug Servers on an Amazon F1 Instance

Instructions to start the debug servers on an Amazon F1 instance can be found here: https://github.com/aws/aws-fpga/blob/master/hdk/docs/Virtual_JTAG_XVC.md

Debugging Designs using Vivado Hardware Manager

Traditionally, a physical JTAG connection is used to debug FPGAs. TheSDAccelplatforms have leveraged XVC for a debug flow that enables debug in the cloud. To take advantage of this capability,SDAccelenables running the XVC server. The XVC server is an implementation ofXilinxVirtual Cable (XVC) protocol, which allows theVivado Design Suiteto connect to a local or remote target FPGA for debug, using standardXilinxdebug cores like the Integrated Logic Analyzer IP (ILA), or the Virtual Input/Output IP (VIO), and others.

TheVivadohardware manager (Vivado Design SuiteorVivadoLab Edition) can be running on the target instance or it can be running remotely on a different host. The TCP port on which the XVC server is listening must be accessible to the host runningVivadohardware manager. To connect theVivadohardware manager to XVC server on the target, the following steps should be followed on the machine hosting theVivadotools:

Launch the Vivado Lab Edition, or the full Vivado Design Suite.
Select Open Hardware Manager from the Tasks menu, as shown in the following figure.
Connect to the Vivado tools hw_server, specifying a local or remote connection, and the Host name and Port, as shown below.
Connect to the target instance Virtual JTAG XVC server.
Select the debug bridge instance from the Hardware window of the Vivado hardware manager.
In the Hardware Device Properties window, select the appropriate probes file for your design by clicking the icon next to the Probes file entry, selecting the file, and clicking OK. This refreshes the hardware device, and it should now show the debug cores present in your design.
TIP: The probes file (.ltx) is written out during the implementation of the kernel by the Vivado tool, if the kernel has debug cores as specified in Hardware Debugging Using ChipScope.
The Vivado hardware manager can now be used to debug the kernels running on the SDAccel platform. Refer to the Vivado Design Suite User Guide: Programming and Debugging (UG908) for more information on working with the Vivado hardware manager.
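Because the hardware manager can only connect when the server ports are reachable from the machine hosting the Vivado tools, it can help to check them first. The following Python sketch is one way to do that, not part of the SDAccel tools. The function name port_is_reachable and the target host value are hypothetical; the port numbers are the ones used in the manual setup steps.

```python
import socket


def port_is_reachable(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name resolution errors.
        return False


if __name__ == "__main__":
    # Hypothetical target host; replace with the machine running the servers.
    target = "localhost"
    for name, port in (("hw_server", 3121), ("XVC server", 10201)):
        state = "reachable" if port_is_reachable(target, port) else "not reachable"
        print(f"{name} on {target}:{port} is {state}")
```

A port reported as not reachable usually points to a server that is not running or to a firewall rule between the two hosts.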

Debugging a MicroBlaze Processor (RTL Kernels Only)

Note: This technique requires familiarity with the Vivado Design Suite, RTL design, the MicroBlaze™ processor, and standard MicroBlaze debugging techniques.

In RTL kernel block designs, a MicroBlaze processor is included under the control hierarchy. To debug the software applications running on the MicroBlaze processor, a MicroBlaze Debug Module (MDM) can optionally be included in the RTL kernel block design, allowing standard MicroBlaze debugging techniques to take place over XVC. To enable MicroBlaze debugging, both of the following must be true:

The SDAccel environment platform must support MicroBlaze debugging over XVC.
The RTL kernel must contain a MicroBlaze processor and a MicroBlaze Debug Module (MDM).

The following platforms support hardware debug of a MicroBlaze processor:

xilinx_u200_xdma_201830_1
xilinx_u250_xdma_201830_1
xilinx_vcu1525_xdma_201830_1

MicroBlaze debugging can optionally be enabled in the RTL Kernel Wizard user interface. When generating the RTL kernel, if the platform supports MicroBlaze debug, a checkbox appears in the wizard allowing the feature to be enabled. When this box is checked, the optional MicroBlaze Debug Module (MDM) is included in the control block of the RTL kernel. The following steps detail how to enable MicroBlaze debug on your RTL kernel during the generation of the kernel.

Launch the RTL Kernel Wizard by clicking Xilinx > RTL Kernel Wizard. When the RTL Kernel Wizard launches, click Next.
On the General Settings page, select Block Design as the kernel type, and check the box to Enable MicroBlaze Debug, as seen in the following figure:

Connecting to a MicroBlaze Processor in an RTL Kernel over XVC

When the RTL kernel has been generated with MicroBlaze debug and an .xclbin binary has been created, you can connect to the MicroBlaze processor embedded in the kernel while it is running in hardware to view hardware registers and perform standard MicroBlaze debugging techniques.

Set up your environment by sourcing the appropriate settings64.sh/.csh file found in your install area.
Start the xvc_pcie and hw_server apps using the sdx_debug_hw script, as shown in the following example:
```
sdx_debug_hw --xvc_pcie /dev/xvc_pub.m1025 --hw_server
launching xvc_pcie...
xvc_pcie -d /dev/xvc_pub.m1025 -s TCP::10200
launching hw_server...
hw_server -sTCP::3121
```
Note: The /dev/xvc_* character device differs depending on the platform. In this example, the character device is /dev/xvc_pub.m1025, though on your system it is likely to differ.

Launch the Xilinx Software Command Line Tool (XSCT):
```
$ xsct

****** Xilinx Software Commandline Tool (XSCT) v2018.3
  **** SW Build 2373407 on Thu Oct 25 21:12:35 MDT 2018
    ** Copyright 1986-2018 Xilinx, Inc. All Rights Reserved.

xsct%
```
Connect to the hardware server and XVC server to list the available targets:
```
xsct% connect -url tcp:localhost:3121 -xvc-url tcp:localhost:10200
tcfchan#0
xsct% targets
  1  debug_bridge
  2  00000000
  3  Legacy Debug Hub
  4  MicroBlaze Debug Module at USER1.1.2.2
  5  MicroBlaze #0 (Running)
xsct%
```
Note: While this example uses both a local hardware server and a local XVC server, this is not a requirement. To use XSCT on a remote machine, replace localhost in the above example with the IP address or host name of the host on which sdx_debug_hw is running.
As shown, the MicroBlaze processor is listed as target number 5. Select it by issuing the targets -set command. Listing the targets again shows that the MicroBlaze processor is now the active target:
```
xsct% targets -set 5
xsct% targets
  1  debug_bridge
  2  00000000
  3  Legacy Debug Hub
  4  MicroBlaze Debug Module at USER1.1.2.2
  5* MicroBlaze #0 (Running)
```
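The target number can differ between designs, so a script that drives XSCT may want to look it up rather than hard-code 5. The following Python sketch parses a captured targets listing, assuming one target per line and the active target marked with an asterisk as in the example above. The names parse_targets and find_target_id are hypothetical and not part of XSCT.

```python
import re

# A target line: numeric ID, optional "*" for the active target, then the name.
TARGET_LINE = re.compile(r"^\s*(\d+)(\*?)\s+(.*\S)\s*$")


def parse_targets(listing):
    """Parse a targets listing into (id, name, is_active) tuples."""
    targets = []
    for line in listing.splitlines():
        match = TARGET_LINE.match(line)
        if match:
            target_id, star, name = match.groups()
            targets.append((int(target_id), name, star == "*"))
    return targets


def find_target_id(listing, name_prefix="MicroBlaze #"):
    """Return the ID of the first target whose name starts with name_prefix."""
    for target_id, name, _active in parse_targets(listing):
        if name.startswith(name_prefix):
            return target_id
    return None


# Sample listing copied from the example in this section.
SAMPLE = """\
1 debug_bridge
2 00000000
3 Legacy Debug Hub
4 MicroBlaze Debug Module at USER1.1.2.2
5* MicroBlaze #0 (Running)
"""

if __name__ == "__main__":
    print(f"targets -set {find_target_id(SAMPLE)}")  # prints "targets -set 5"
```

Matching on the prefix "MicroBlaze #" skips the MicroBlaze Debug Module entry, which is a different target from the processor itself.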

At this point, standard MicroBlaze debugging techniques can be applied as described in the MicroBlaze Processor Reference Guide (UG984). For example, to list the contents of the MicroBlaze registers, rrd can be issued:
```
xsct% rrd
  r0: 0000000000000000   r1: 00000000000115e8   r2: 0000000000010960   r3: 0000000000000006
  r4: 0000000000000006   r5: 0000000000000000   r6: 0000000000000000   r7: 0000000000000000
  r8: 0000000000000000   r9: 0000000000000000  r10: 0000000000000000  r11: 0000000000000000
 r12: 0000000000000000  r13: 0000000000010a60  r14: 0000000000000000  r15: 0000000000010348
 r16: 0000000000000000  r17: 0000000000000000  r18: 00000000ffffffff  r19: 00000000000115e8
 r20: 0000000000000000  r21: 0000000000000000  r22: 0000000000000000  r23: 0000000000000000
 r24: 0000000000000000  r25: 0000000000000000  r26: 0000000000000000  r27: 0000000000000000
 r28: 0000000000000000  r29: 0000000000000000  r30: 0000000000000000  r31: 0000000000000000
  pc: 00000000000106bc  msr: 00000010          ear: 0000000000000010  esr: 00000010
 btr: 0000000000000010  edr: 00000010          dcr: 00000009          dsr: 21010000
xsct%
```
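When register dumps are captured to a log, it can be convenient to turn them into values that a script can compare. The following Python sketch is a minimal illustration, assuming the dump consists of name: hexadecimal-value pairs as in the rrd output above. The function name parse_rrd is hypothetical, and the sample is a subset of the registers shown in this section.

```python
import re

# A register entry: a name starting with a letter, a colon, then a hex value.
REGISTER = re.compile(r"([A-Za-z]\w*):\s*([0-9A-Fa-f]+)")


def parse_rrd(dump):
    """Map each register name in an rrd dump to its integer value."""
    return {name: int(value, 16) for name, value in REGISTER.findall(dump)}


# Subset of the register dump shown in this section.
SAMPLE = (
    "r1: 00000000000115e8 r2: 0000000000010960 "
    "pc: 00000000000106bc msr: 00000010"
)

if __name__ == "__main__":
    registers = parse_rrd(SAMPLE)
    print(f"pc = {registers['pc']:#x}")  # prints "pc = 0x106bc"
```

Because the pattern does not depend on line breaks, it works on both column-formatted output and a dump that has been collapsed onto a single line.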