Building the System

Building the system requires building both the hardware (kernels) and the software (host code) sides of the system. The Project Editor view, shown below, gives a top-level view of the build configuration. It provides general information about the active build configuration, including the project name, current platform, and selected system configuration (OS and runtime). It also displays several build options, including the selected build target and options for enabling host and kernel debugging. For more details on build targets, see Build Targets, while Debugging Applications and Kernels gives details on using the debug options.

Figure: Project Editor View

The bottom portion of the Editor view lists the current kernels used in the project. The kernels are listed under the binary container. In the above example, the kernel krnl_vadd has been added to binary_container_1. To add a binary container, left-click the add icon. You can rename the binary container by clicking the default name and entering a new name.

To add a kernel to the binary container, left-click the add icon located in the Hardware Functions window. This displays a list of kernels defined in the project. Select the kernel from the Add Hardware Functions dialog box, as shown in the following figure.

Figure: Adding Hardware Functions to a Binary Container

In the Compute Units column, next to the kernel, enter a value to instantiate multiple instances of the kernel (called compute units), as described in Creating Multiple Instances of a Kernel.

With the various options of the active build configuration specified, you can start the build process by clicking the Build command.

The SDAccel™ build process generates the host application executable (.exe) and the FPGA binary (.xclbin). The SDAccel environment manages two separate, independent build flows:

  • Host code (software) build
  • Kernel code (hardware) build

SDAccel uses a standard compilation and linking process for both the software and hardware elements of the project. The steps to build both the host and kernel code for the selected build target are described in the following sections.

Building the Host Application

The host application, written in C/C++ using OpenCL™ API calls, is built using the Xilinx® C++ compiler (xcpp), which is based on the GNU Compiler Collection (GCC). Each source file is compiled to an object file (.o) and linked with the Xilinx SDAccel runtime shared library to create the executable (.exe), which executes on the host CPU.

TIP: xcpp is based on GCC, and therefore supports many standard GCC options which are not documented here. For more information, refer to the GCC Option Index.

Compiling the Host Application

Each host application source file is compiled using the -c option, which generates an object file (.o).

xcpp ... -c <source_file> ...
The name of the output object file can optionally be specified with the -o option.
xcpp ... -o <object_file.o>
You can produce debugging information using the -g option.
xcpp ... -g

Linking the Host Application

The generated object files (.o) are linked with the Xilinx SDAccel runtime shared library to create the executable (.exe). Linking is performed using the -l option.
xcpp ... -l <library_name> ...
Note: Host compilation and linking can be integrated into one step. The -c and -l options are not required; only the source input files are needed.
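As a sketch of the one-step flow, a single xcpp invocation might look like the following. The source file name, output name, and include/library paths are illustrative assumptions, not values from this guide, and the runtime paths shown assume an XRT-based installation:

```shell
# Hypothetical one-step host build: compile host.cpp and link it against
# the SDAccel runtime library in a single xcpp call (no separate -c and
# link steps). File names and paths are placeholders.
xcpp -g -I$XILINX_XRT/include host.cpp -o host.exe \
    -L$XILINX_XRT/lib -lxilinxopencl -pthread
```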

In the GUI flow, the host code and the kernel code are compiled and linked by clicking the Build command.

Building the Hardware

The kernel code is written in C, C++, OpenCL C, or RTL, and is built by the xocc compiler, a command-line utility modeled after GCC. The final output of xocc is the FPGA binary (.xclbin), which links the kernel .xo files and the hardware platform (.dsa). Generation of the .xclbin is a two-step build process requiring kernel compilation and linking.

xocc can be used standalone (ideally in scripts or a build system like make), and is also fully supported by the SDx™ IDE.
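The two-step compile and link flow can be sketched as a pair of commands; the platform, kernel, and file names below are placeholders, not values from this guide:

```shell
# Step 1 (compile): build each kernel source into a Xilinx object (.xo) file.
xocc -c --target hw_emu --platform <platform_name> \
     --kernel krnl_vadd -o krnl_vadd.xo vadd.cpp

# Step 2 (link): combine the .xo files with the platform to produce the
# FPGA binary (.xclbin) loaded by the host application.
xocc -l --target hw_emu --platform <platform_name> \
     -o vadd.xclbin krnl_vadd.xo
```

In practice, these commands are usually captured in a Makefile or script so that the emulation and hardware targets can be selected with a single variable.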

Build Target

The compilation is dependent on the selected build target, which is discussed in greater detail in Build Targets. You can specify the build target using the xocc --target option, as shown below.

xocc --target sw_emu|hw_emu|hw ...
  • For software emulation (sw_emu), the kernel source code is used during emulation.
  • For hardware emulation (hw_emu), the synthesized RTL code is used for simulation in the hardware emulation flow.
  • For system build (hw), xocc generates the FPGA binary, and the system can be run on hardware.

Compiling the Kernels

During compilation, xocc compiles kernel accelerator functions (written in C/C++ or the OpenCL language) into Xilinx object (.xo) files. Each kernel is compiled into a separate .xo file. This is the -c/--compile mode of xocc.

Kernels written in RTL are compiled using the package_xo command line utility. This utility, similar to xocc -c, also generates .xo files, which are subsequently used in the linking stage. See RTL Kernels for more information.

Linking the Kernels

As discussed above, the kernel compilation process results in a Xilinx object file (.xo), whether the kernel is described in OpenCL C, C, C++, or RTL. During the linking stage, .xo files from different kernels are linked with the shell to create the FPGA binary container file (.xclbin), which is needed by the host code.

The xocc command to link files is:
$ xocc -l <kernel_object_file>.xo -o <binary_platform_file>.xclbin

where one or more kernel_object_file inputs are given, and binary_platform_file is the name of the .xclbin output file.

Creating Multiple Instances of a Kernel

During the linking stage, you can specify the number of instances of a kernel, referred to as compute units, through the xocc --nk switch. This allows the same kernel function to run in parallel at application runtime to improve the performance of the host application, using different device resources on the FPGA.

Note: For additional information on the --nk option, see the SDAccel Environment Programmers Guide (UG1277) and the SDx Command and Utility Reference Guide (UG1279).
In the command-line flow, the xocc --nk option specifies the number of instances of a given kernel to instantiate into the .xclbin file. The syntax of the command is as follows:
$ xocc --nk <kernel_name>:<number_of_compute_units>:<compute_unit_name1>.<compute_unit_name2>...
For example, the kernel foo is instantiated three times, with compute unit names fooA, fooB, and fooC:
$ xocc --nk foo:3:fooA.fooB.fooC
TIP: While the kernel instance name is optional, it is highly recommended to specify one, as it is required for options like --sp.
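To show where --nk fits into a complete link command, the following hedged sketch creates three named compute units of krnl_vadd while linking; the platform and file names are placeholders, not values from this guide:

```shell
# Hypothetical link command: instantiate three compute units of krnl_vadd
# and link them into the FPGA binary. Platform/file names are placeholders.
xocc -l --target hw --platform <platform_name> \
     --nk krnl_vadd:3:krnl_vadd_1.krnl_vadd_2.krnl_vadd_3 \
     -o vadd.xclbin krnl_vadd.xo
```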

In the GUI flow, the number of compute units can be specified by right-clicking the top-level kernel within the Assistant view, and selecting Settings.

From within the Project Settings dialog box, select the desired kernel to instantiate and update the Compute units value. In the following figure, the kernel krnl_vadd will be instantiated three times (that is, three CUs).

Figure: Instantiate Multiple Compute Units

In the figure above, three compute units of the krnl_vadd kernel will be linked into the FPGA binary (.xclbin), addressable as krnl_vadd_1, krnl_vadd_2, and krnl_vadd_3.

To access the various instances of the kernel, use the OpenCL API clCreateSubDevices in the host code to divide the device into multiple sub-devices containing one kernel instance per sub-device. For specific details, see the "Sub-devices" section in the SDAccel Environment Programmers Guide (UG1277).

Mapping Kernel Interfaces to Memory Resources

The link phase is when the memory ports of the kernels are connected to memory resources, which include PLRAM and DDR. If not specified, connections to these resources are completed automatically during xocc linking. However, Xilinx recommends specifying these connections for optimal performance. For additional information, see the SDAccel Environment Programmers Guide (UG1277) and the SDx Command and Utility Reference Guide (UG1279).

SDAccel platforms can have access to various memory resources. For instance, by mapping the input and output ports of a compute unit to different memory resources, you can improve overall performance by enabling simultaneous access to input and output data.

Use the xocc --sp option during linking to map the interface of a compute unit to a memory resource.

Details of coding the host application can be found in the "Memory Data Transfer to/from the FPGA Device" section in the SDAccel Environment Programmers Guide.

The directive to assign a compute unit's memory interface to a memory resource is:

--sp <compute_unit>.<mem_interface>:<memory>

where:

  • compute_unit is the name of the compute unit (CU)
  • mem_interface is the name of one of the compute unit's memory interfaces or function arguments
  • memory is the memory resource

A separate --sp directive is necessary for each memory interface connection.

TIP: To obtain kernel information, including kernel, port, and argument names, use the kernelinfo command line tool if you have the .xo file, or the platforminfo tool if you have the .xclbin file. For more information on these tools, see the SDx Command and Utility Reference Guide (UG1279).
The following example assigns the memory interface m_axi_gmem of a CU named vadd_1 to DDR[3] memory:
xocc … --sp vadd_1.m_axi_gmem:DDR[3]
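Multiple --sp switches can be combined in one link command to place different interfaces in different banks, enabling the simultaneous input and output access mentioned earlier. The interface and bank names in this sketch are illustrative assumptions:

```shell
# Hypothetical mapping: read interface in DDR[0], write interface in DDR[1],
# so input and output transfers can proceed in parallel.
xocc -l ... --sp vadd_1.m_axi_gmem0:DDR[0] \
            --sp vadd_1.m_axi_gmem1:DDR[1]
```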

In the SDx GUI, the --sp switch can be added through a process similar to that outlined in Creating Multiple Instances of a Kernel. Right-click the top-level kernel in the Assistant view, and select Settings. From within the Project Settings dialog box, enter the --sp option in the XOCC Linker Options field.

To add directives to the xocc compilation through the GUI, from within the Assistant view, right-click the desired kernel under System and select Settings.

This displays the hardware function settings dialog window, where you can change the memory interface mapping under the Compute Unit Settings area. To change the memory resource mapping of a CU for a particular argument, click the Memory setting of the respective argument and change it to the desired memory resource. The following figure shows the a argument being selected.

Figure: Compute Unit Memory Setting

To select the identical memory resource for all CU arguments, click the memory resource for the CU (that is, krnl_vadd_1 in the example above) and select the desired memory resource.

IMPORTANT: When using the --sp option to assign kernel interfaces to memory banks, you must specify the --sp option for all interfaces of the kernel. Refer to "Customization of DDR Bank to Kernel Connection" in the SDAccel Environment Programmers Guide for more information.

Kernel to Kernel Streaming Connection

Kernel to kernel (K2K) streaming provides direct streams between kernels. It is necessary to specify the stream connections between the source and destination kernel stream interfaces. This is done during xocc linking through the --sc option, as shown below:

xocc -l --sc <source_instance>.<output_port>:<destination_instance>.<input_port>

For example, to connect the two streaming ports for the following two kernels:

  1. Instance name CU_A with an output streaming port called data_out.
  2. Instance name CU_B with an input streaming port called data_in.
Use the following:
xocc -l --sc CU_A.data_out:CU_B.data_in

Allocating Compute Units to SLRs

A compute unit (CU) is allocated to a super logic region (SLR) during xocc linking using the --slr directive. The syntax of the command line directive is:

--slr <compute_unit>:<SLR_NUM>

where compute_unit is the name of the CU and SLR_NUM is the SLR number to which the CU is assigned.

For example, xocc … --slr vadd_1:SLR2 assigns the CU named vadd_1 to SLR2.

The --slr directive must be applied separately for each CU in the design. For instance, in the following example, three invocations of the --slr directive are used to assign all three CUs to SLRs; krnl_vadd_1 and krnl_vadd_2 are assigned to SLR1, while krnl_vadd_3 is assigned to SLR2.

--slr krnl_vadd_1:SLR1 --slr krnl_vadd_2:SLR1 --slr krnl_vadd_3:SLR2

In the absence of an --slr directive for a CU, the tools are free to place the CU in any SLR. See Kernel SLR and DDR Memory Assignments for CU SLR mapping recommendations.

In the SDx GUI, to allocate a CU to an SLR, right-click the desired kernel under the System or Emulation-HW configuration and select Settings, as shown in the following figure.

Figure: xocc Link Settings

This displays the hardware function settings dialog window. Under the Compute Unit Settings area, you can change the SLR where the CU is allocated by clicking the SLR setting of the respective CU and selecting the desired SLR from the menu as shown. Selecting Auto allows the tools the freedom to place the CU in any SLR.

Figure: Compute Unit SLR Setting

Controlling Implementation Results

When compiling or linking for hardware emulation or system builds, fine-grained control over the hardware generated by SDAccel can be exercised using the --xp switch.

The --xp switch is paired with parameters to configure the Vivado® Design Suite. For instance, the --xp switch can configure the optimization, placement, and timing results of the hardware implementation.

The --xp switch can also be used to set up emulation and compile options. Specific examples of these parameters include setting the clock margin, specifying the depth of FIFOs used in the kernel dataflow region, and specifying the number of outstanding writes and reads to buffer on the kernel AXI interface. A full list of parameters and valid values can be found in the SDx Command and Utility Reference Guide.

TIP: Familiarity with the Vivado Design Suite User Guide: High-Level Synthesis (UG902) and the tool suite is necessary to make the most of these parameters. See the Vivado Design Suite User Guide: Implementation (UG904) for more information.
In the command line flow, parameters are specified as param:<param_name>=<value>, where:
  • param: Required keyword.
  • param_name: Name of the parameter to apply.
  • value: Appropriate value for the parameter.
IMPORTANT: The xocc linker does not check the validity of the parameter or value. Be careful to apply valid values, or the downstream tools might not work properly.

You must repeat the --xp switch for each param used in the xocc command. For example:

$ xocc --xp param:compiler.enableDSAIntegrityCheck=true --xp param:prop:kernel.foo.kernel_flags="-std=c++0x"

You can specify param values in an xocc.ini file, with each option specified on a separate line (without the --xp switch).

An xocc.ini file is an initialization file that contains --xp settings. Locate the file in the same directory as the build configuration.
param:compiler.enableDSAIntegrityCheck=true
param:prop:kernel.foo.kernel_flags="-std=c++0x"

In the GUI flow, if no xocc.ini file is present, the application uses the GUI build settings. In a Makefile flow, if no xocc.ini file is present, the configurations within the Makefile are used.

In the SDx GUI, the --xp switch can be added through a process similar to that outlined in Creating Multiple Instances of a Kernel. Right-click the top-level kernel in the Assistant view, and select Settings. From within the Project Settings dialog box, enter the --xp option in the XOCC Linker Options field.

You can also add xocc compiler options and --xp parameters to kernels by right-clicking the kernel in the Assistant view. The following image demonstrates the --xp setting for the krnl_vadd kernel.

Figure: Assistant XOCC Compile Settings

Controlling Report Generation

The xocc -R switch controls the level of report generation during the link stage for hardware emulation and system targets. Builds that generate fewer reports typically run more quickly.

The command line option is as follows:

$ xocc -R <report_level>

where <report_level> is one of the following options:

  • -R0: Minimal reports and no intermediate design checkpoints (DCP)
  • -R1: Includes R0 reports plus:
    • Identifies design characteristics to review for each kernel (report_failfast)
    • Identifies design characteristics to review for full design post-opt (report_failfast)
    • Saves post-opt DCP
  • -R2: Includes R1 reports plus:
    • The Vivado default reporting, including a DCP after each implementation step
    • Design characteristics to review for each SLR after placement (report_failfast)
TIP: report_failfast is a utility that highlights potential device utilization challenges, clock constraint problems, and potentially unreachable target frequencies (MHz).

The -R switch can also be added through the SDx GUI, as described in Creating Multiple Instances of a Kernel:

  • Right-click the top-level kernel in the Assistant view and select Settings.
  • From within the Project Settings dialog box, enter the -R option in the XOCC Linker Options field.

Build Targets

The SDAccel build target defines the nature of the FPGA binary generated by the build process. There are three different build targets: two emulation targets (software and hardware emulation) used for debug and validation purposes, and the default hardware target used to generate the actual FPGA binary.

Software Emulation

The main goal of software emulation is to ensure functional correctness and to partition the application into kernels. For software emulation, both the host code and the kernel code are compiled to run on the host x86 processor. The programmer model of iterative algorithm refinement through fast compile and run loops is preserved. Software emulation has compile and execution times comparable to a native CPU application. Refer to the SDAccel Environment Debugging Guide for more information on running software emulation.

In the context of the SDAccel development environment, software emulation on a CPU is the same as the iterative development process that is typical of CPU/GPU programming. In this type of development style, a programmer continuously compiles and runs an application as it is being developed.

For RTL kernels, software emulation can be supported if a C model is associated with the kernel. The RTL kernel wizard packaging step provides an option to associate C model files with the RTL kernel for support of software emulation flows.

Hardware Emulation

While the software emulation flow is a good measure of functional correctness, it does not guarantee correctness on the FPGA execution target. The hardware emulation flow enables the programmer to check the correctness of the logic generated for the custom compute units before deployment on hardware, where a compute unit is an instantiation of a kernel.

The SDAccel environment generates at least one custom compute unit for each kernel in an application. Each kernel is compiled to a hardware model (RTL). During emulation, kernels are executed with a hardware simulator, but the rest of the system still uses a C simulator. This allows the SDAccel environment to test the functionality of the logic that will be executed on the FPGA compute fabric.

In addition, hardware emulation provides performance and resource estimation, allowing the programmer to gain insight into the design.

In hardware emulation, compile and execution times are longer than in software emulation; thus, Xilinx recommends that you use small data sets for debug and validation.

IMPORTANT: The DDR memory model and the memory interface generator (MIG) model used in hardware emulation are high-level simulation models. These models are good for simulation performance; however, they approximate latency values and are not cycle-accurate like the kernels. Consequently, any performance numbers shown in the profile summary report are approximate, and must be used only as general guidance and for comparing relative performance between different kernel implementations.

System

When the build target is system, xocc generates the FPGA binary for the device by running synthesis and implementation on the design. The binary includes custom logic for every compute unit in the binary container. Therefore, it is normal for this build step to run for a longer period of time than the other steps in the SDAccel build flow. However, because the kernels will be running on actual hardware, their execution times will be extremely fast.

The generation of custom compute units uses the Vivado High-Level Synthesis (HLS) tool, which is the compute unit generator in the application compilation flow. Automatic optimization of a compute unit for maximum performance is not possible for all coding styles without additional user input to the compiler. The SDAccel Environment Profiling and Optimization Guide discusses the additional user input that can be provided to the SDAccel environment to optimize the implementation of kernel operations into a custom compute unit.

After all compute units have been generated, these units are connected to the infrastructure elements provided by the target device in the solution. The infrastructure elements in a device are all of the memory, control, and I/O data planes which the device developer has defined to support an OpenCL application. The SDAccel environment combines the custom compute units and the base device infrastructure to generate an FPGA binary which is used to program the Xilinx device during application execution.

IMPORTANT: The SDAccel environment always generates a valid FPGA hardware design and performs default connections from the kernel to global memory. Xilinx recommends explicitly defining optimal connections. See Kernel SLR and DDR Memory Assignments for details.

Specifying a Target

You can specify the build target from the command line with the following command:

xocc --target sw_emu|hw_emu|hw ...

Similarly, from within the GUI, the build target can be specified by selecting the Active build configuration pull-down tab in the Project Editor window. This provides three choices (see the following figure):

  • Emulation-SW
  • Emulation-HW
  • System

Figure: Active Build Configuration

TIP: You can also assign the compilation target from the Build command, or from the Project > Build Configurations > Set Active menu command.

After setting the active build configuration, build the system from the Project > Build Project menu command.

The recommended build flow is detailed in Debugging Flows.