Building the System
Building the system requires building both the hardware (kernels) and the software (host code) sides of the system. The Project Editor view, shown below, gives a top-level view of the build configuration. It provides general information about the active build configuration, including the project name, current platform, and selected system configuration (OS and runtime). It also displays several build options, including the selected build target and options for enabling host and kernel debugging. For more details on build targets, see Build Targets, while Debugging Applications and Kernels gives details on using the debug options.
The bottom portion of the Editor view lists the current kernels used in the project. The kernels are listed under the binary container. In the above example, the kernel krnl_vadd has been added to binary_container_1. To add a binary container, left-click the icon. You can rename the binary container by clicking the default name and entering a new name.
To add a kernel to the binary container, left-click the icon located in the Hardware Functions window. It displays a list of kernels defined in the project. Select the kernel from the Add Hardware Functions dialog box as shown in the following figure.
In the Compute Units column, next to the kernel, enter a value to instantiate multiple instances of the kernel (called compute units), as described in Creating Multiple Instances of a Kernel.
With the various options of the active build configuration specified, you can start the build process by clicking the Build command.
The SDAccel™ build process generates the host application executable (.exe) and the FPGA binary (.xclbin). The SDAccel environment manages two separate, independent build flows:
- Host code (software) build
- Kernel code (hardware) build
SDAccel uses a standard compilation and linking process for both the software and hardware elements of the project. The steps to build both the host and kernel code for the selected build target are described in the following sections.
Building the Host Application
The host application, written in C/C++ using OpenCL™ API calls, is built using the Xilinx® C++ compiler (xcpp), which is based on the GNU Compiler Collection (GCC). Each source file is compiled to an object file (.o) and linked with the Xilinx SDAccel runtime shared library to create the executable (.exe), which runs on the host CPU.
Because xcpp is based on GCC, it supports many standard GCC options, which are not documented here. For more information, refer to the GCC Option Index.
Compiling the Host Application
Each host application source file is compiled using the -c option, which generates an object file (.o).
xcpp ... -c ...
The name of the output object file can be specified with the -o option.
xcpp ... -o <object_name>.o
You can produce debugging information using the -g option.
xcpp ... -g
Linking the Host Application
The generated object files (.o) are linked with the runtime shared library to create the executable (.exe) using the -l option.
xcpp ... -l ...
Compiling and linking can also be performed in a single step, in which case the -c and -l options are not required; only the source input files are needed.
In the GUI flow, the host code and the kernel code are compiled and linked by clicking the Build command.
Building the Hardware
The kernel code is written in C, C++, OpenCL C, or RTL, and is built by the xocc compiler, a command-line utility modeled after GCC. The final output of xocc is the FPGA binary (.xclbin), which links the kernel .xo files with the hardware platform (.dsa). Generation of the .xclbin is a two-step build process requiring kernel compilation and linking.
The xocc compiler can be used standalone (or, ideally, in scripts or a build system such as make), and is also fully supported by the SDx™ IDE.
Build Target
The compilation is dependent on the selected build target, which is discussed in greater detail in Build Targets. You can specify the build target using the xocc --target option as shown below.
xocc --target sw_emu|hw_emu|hw ...
- For software emulation (sw_emu), the kernel source code is used during emulation.
- For hardware emulation (hw_emu), the synthesized RTL code is used for simulation in the hardware emulation flow.
- For system build (hw), xocc generates the FPGA binary, and the system can be run on hardware.
Compiling the Kernels
During compilation, xocc compiles kernel accelerator functions (written in C/C++ or the OpenCL language) into Xilinx object (.xo) files. Each kernel is compiled into a separate .xo file. This is the -c/--compile mode of xocc.
Kernels written in RTL are compiled using the package_xo command-line utility. This utility, similar to xocc -c, also generates .xo files, which are subsequently used in the linking stage. See RTL Kernels for more information.
Linking the Kernels
As discussed above, the kernel compilation process results in a Xilinx object file (.xo), whether the kernel is described in OpenCL C, C, C++, or RTL. During the linking stage, .xo files from different kernels are linked with the shell to create the FPGA binary container file (.xclbin), which is needed by the host code.
The xocc command to link files is:
$ xocc -l <kernel_object_file>.xo -o <binary_platform_file>.xclbin
where one or more input <kernel_object_file> arguments are given, and <binary_platform_file> is the name of the .xclbin output file.
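Putting the two stages together, a compile-then-link sequence might look like the following sketch. The kernel, source, and container names are hypothetical, and the -k flag (which names the kernel function during compilation) is assumed from the xocc compile mode; the commands are only composed into variables and printed, since running them requires the SDx tools:

```shell
#!/bin/sh
# Two-step kernel build sketch: compile each kernel to a .xo, then link the
# .xo files into the .xclbin. Only the command strings are built and printed
# here; xocc itself is not invoked.
TARGET=hw_emu   # sw_emu | hw_emu | hw
COMPILE_CMD="xocc --target $TARGET -c -k krnl_vadd -o krnl_vadd.xo vadd.cl"
LINK_CMD="xocc --target $TARGET -l -o binary_container_1.xclbin krnl_vadd.xo"
printf '%s\n%s\n' "$COMPILE_CMD" "$LINK_CMD"
```

Keeping both steps in one script (or a make rule) ensures the compile and link stages always use the same build target.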
Creating Multiple Instances of a Kernel
During the linking stage, you can specify the number of instances of a kernel, referred to as compute units, through the xocc --nk switch. This allows the same kernel function to run in parallel at application runtime, using different device resources on the FPGA, to improve the performance of the host application.
For more information on the --nk option, see the SDAccel Environment Programmers Guide (UG1277) and the SDx Command and Utility Reference Guide (UG1279).
The xocc --nk option specifies the number of instances of a given kernel to instantiate into the .xclbin file. The syntax of the command is as follows:
$ xocc --nk <kernel_name>:<no_of_CUs>:<cu_name1>.<cu_name2>…<cu_nameN>
For example, the following command instantiates the kernel foo three times, with compute unit names fooA, fooB, and fooC:
$ xocc --nk foo:3:fooA.fooB.fooC
The compute unit names defined here can be referenced by other linker options, such as --sp.
In the GUI flow, the number of compute units can be specified by right-clicking the top-level kernel within theAssistantview, and selectingSettings.
From within the Project Settings dialog box, select the desired kernel to instantiate and update the Compute units value. In the following figure, the kernel,krnl_vadd
, will be instantiated three times (that is, three CUs).
In the figure above, three compute units of thekrnl_vadd
kernel will be linked into the FPGA binary (.xclbin), addressable askrnl_vadd_1
,krnl_vadd_2
, andkrnl_vadd_3
.
To access the various instances of the kernel, use the OpenCL API clCreateSubDevices in the host code to divide the device into multiple sub-devices containing one kernel instance per sub-device. For specific details, see the "Sub-devices" section in the SDAccel Environment Programmers Guide (UG1277).
Mapping Kernel Interfaces to Memory Resources
The link phase is when the memory ports of the kernels are connected to memory resources, which include PLRAM and DDR. If not specified, connections to these resources are completed automatically during xocc linking. However, Xilinx recommends specifying these connections for optimal performance. For additional information, see the SDAccel Environment Programmers Guide (UG1277) and the SDx Command and Utility Reference Guide (UG1279).
SDAccel platforms can have access to various memory resources. By mapping the input and output ports of a compute unit to different memory resources, for instance, you can improve overall performance by enabling simultaneous access to input and output data.
Use the xocc --sp option during linking to map the interface of a compute unit to a memory resource.
Details of coding the host application can be found in the "Memory Data Transfer to/from the FPGA Device" section in the SDAccel Environment Programmers Guide.
The directive to assign a compute unit's memory interface to a memory resource is:
--sp <compute_unit>.<mem_interface>:<memory>
Where:
- compute_unit is the name of the compute unit (CU)
- mem_interface is the name of one of the compute unit's memory interfaces or function arguments
- memory is the memory resource
A separate --sp directive is necessary for each memory interface connection.
The names of the memory interfaces or function arguments can be found using the kernelinfo utility if you have the .xo file, or the platforminfo utility if you have the .xclbin file. For more information on these tools, see the SDx Command and Utility Reference Guide (UG1279).
For example, the following maps the memory interface m_axi_gmem from a CU named vadd_1 to DDR[3] memory:
xocc … --sp vadd_1.m_axi_gmem:DDR[3]
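Since each memory interface needs its own --sp directive, a CU with separate input and output arguments repeats the switch once per interface. The argument names in0 and out0 below are hypothetical, and the command is only composed and printed, not run:

```shell
#!/bin/sh
# One --sp directive per memory interface: the (hypothetical) input argument
# goes to DDR[0] and the output argument to DDR[1], so input and output data
# can be accessed simultaneously. The command string is printed, not executed.
SP_OPTS="--sp vadd_1.in0:DDR[0] --sp vadd_1.out0:DDR[1]"
LINK_CMD="xocc -l $SP_OPTS -o binary_container_1.xclbin krnl_vadd.xo"
echo "$LINK_CMD"
```

Spreading the input and output interfaces across different banks is what enables the simultaneous-access benefit described above.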
In the SDx GUI, the --sp switch can be added through a process similar to that outlined in Creating Multiple Instances of a Kernel. Right-click the top-level kernel in the Assistant view and select Settings. From within the Project Settings dialog box, enter the --sp option in the XOCC Linker Options field.
To add directives to the xocc compilation through the GUI, from within the Assistant, right-click the desired kernel under System and select Settings.
This displays the hardware function settings dialog window, where you can change the memory interface mapping under the Compute Unit Settings area. To change the memory resource mapping of a CU for a particular argument, click the Memory setting of the respective argument and change it to the desired memory resource. The following figure shows the a argument being selected.
To select the identical memory resource for all CU arguments, click the memory resource for the CU (that is, krnl_vadd_1 in the example above) and select the desired memory resource.
When using the --sp option to assign kernel interfaces to memory banks, you must specify the --sp option for all interfaces of the kernel. Refer to "Customization of DDR Bank to Kernel Connection" in the SDAccel Environment Programmers Guide for more information.
Kernel to Kernel Streaming Connection
Kernel-to-kernel (K2K) streaming provides direct streams between kernels. It is necessary to specify the stream connections between the source and destination kernel stream interfaces. This is done during xocc linking through the --sc option, as shown below:
xocc -l --sc <source_instance>.<output_port>:<destination_instance>.<input_port>
For example, to connect the two streaming ports for the following two kernels:
- Instance name CU_A with an output streaming port called data_out.
- Instance name CU_B with an input streaming port called data_in.
xocc -l --sc CU_A.data_out:CU_B.data_in
Allocating Compute Units to SLRs
A compute unit (CU) is allocated to a super logic region (SLR) during xocc linking using the --slr directive. The syntax of the command-line directive is:
--slr <compute_unit>:<SLR_NUM>
where compute_unit is the name of the CU, and SLR_NUM is the SLR number to which the CU is assigned.
For example, xocc … --slr vadd_1:SLR2 assigns the CU named vadd_1 to SLR2.
The --slr directive must be applied separately for each CU in the design. For instance, in the following example, three invocations of the --slr directive are used to assign all three CUs to SLRs; krnl_vadd_1 and krnl_vadd_2 are assigned to SLR1, while krnl_vadd_3 is assigned to SLR2.
--slr krnl_vadd_1:SLR1 --slr krnl_vadd_2:SLR1 --slr krnl_vadd_3:SLR2
In the absence of an --slr directive for a CU, the tools are free to place the CU in any SLR. See Kernel SLR and DDR Memory Assignments for CU SLR mapping recommendations.
In the SDx GUI, to allocate a CU to an SLR, right-click the desired kernel under the System or Emulation-HW configurations and select Settings as shown in the following figure.
This displays the hardware function settings dialog window. Under the Compute Unit Settings area, you can change the SLR where the CU is allocated by clicking the SLR setting of the respective CU and selecting the desired SLR from the menu as shown. Selecting Auto allows the tools the freedom to place the CU in any SLR.
Controlling Implementation Results
When compiling or linking, fine-grained control over the hardware generated by SDAccel for hardware emulation and system builds can be specified using the --xp switch.
The --xp switch is paired with parameters to configure the Vivado® Design Suite. For instance, the --xp switch can configure the optimization, placement, and timing results of the hardware implementation.
The --xp switch can also be used to set up emulation and compile options. Specific examples of these parameters include setting the clock margin, specifying the depth of FIFOs used in the kernel dataflow region, and specifying the number of outstanding writes and reads to buffer on the kernel AXI interface. A full list of parameters and valid values can be found in the SDx Command and Utility Reference Guide.
The --xp switch takes the form param:<param_name>=<value>, where:
- param: Required keyword.
- param_name: Name of a parameter to apply.
- value: Appropriate value for the parameter.
Note: The xocc linker does not check the validity of the parameter or value. Be careful to apply valid values, or the downstream tools might not work properly.
For example:
$ xocc --xp param:compiler.enableDSAIntegrityCheck=true --xp param:prop:kernel.foo.kernel_flags="-std=c++0x"
You must repeat the --xp switch for each param used in the xocc command, as shown below:
$ xocc --xp param:compiler.enableDSAIntegrityCheck=true --xp param:prop:kernel.foo.kernel_flags="-std=c++0x"
You can specify param values in an xocc.ini file, with each option specified on a separate line (without the --xp switch).
The following is an example xocc.ini file with --xp settings. Locate the file in the same directory as the build configuration.
param:compiler.enableDSAIntegrityCheck=true
param:prop:kernel.foo.kernel_flags="-std=c++0x"
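As a sketch, such a file can be generated by a build script. The two parameters are those from the example above; the file is written to the current directory, which is assumed here to be the build configuration directory:

```shell
#!/bin/sh
# Write the example --xp parameters to xocc.ini, one per line and without the
# --xp prefix, so xocc can pick them up from the build configuration directory.
cat > xocc.ini <<'EOF'
param:compiler.enableDSAIntegrityCheck=true
param:prop:kernel.foo.kernel_flags="-std=c++0x"
EOF
```

Keeping the parameters in a checked-in xocc.ini file avoids repeating long --xp switches on every command line.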
Under the GUI flow, if no xocc.ini file is present, the application uses the GUI build settings. Under a Makefile flow, if no xocc.ini file is present, the configurations within the Makefile are used.
In the SDx GUI, the --xp switch can be added through a process similar to that outlined in Creating Multiple Instances of a Kernel. Right-click the top-level kernel in the Assistant view and select Settings. From within the Project Settings dialog box, enter the --xp option in the XOCC Linker Options field.
You can also add xocc compiler options and --xp parameters to kernels by right-clicking the kernel in the Assistant view. The following image demonstrates the --xp setting for the krnl_vadd kernel.
Controlling Report Generation
The xocc -R switch controls the level of report generation during the link stage for hardware emulation and system targets. Builds that generate fewer reports typically run more quickly.
The command line option is as follows:
$ xocc -R<report_level>
Where <report_level> is one of the following options:
- -R0: Minimal reports and no intermediate design checkpoints (DCP)
- -R1: Includes R0 reports plus:
  - Identifies design characteristics to review for each kernel (report_failfast)
  - Identifies design characteristics to review for the full design post-opt (report_failfast)
  - Saves the post-opt DCP
- -R2: Includes R1 reports plus:
  - The Vivado default reporting, including DCP after each implementation step
  - Design characteristics to review for each SLR after placement (report_failfast)
report_failfast is a utility that highlights potential device utilization challenges, clock constraint problems, and potentially unreachable target frequencies.
The -R switch can also be added through the SDx GUI as described in Creating Multiple Instances of a Kernel:
- Right-click the top-level kernel in the Assistant view and select Settings.
- From within the Project Settings dialog box, enter the -R option in the XOCC Linker Options field.
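A build script might choose the report level from a simple switch, using verbose reports while investigating a build and minimal reports for fast turnaround. This is a hypothetical wrapper; the kernel and container names are placeholders, and the composed command is only printed, not executed:

```shell
#!/bin/sh
# Pick a report level: -R2 (full Vivado reporting) while debugging the build,
# -R0 (minimal reports, no intermediate DCPs) for the fastest link. The xocc
# command line is printed only; running it requires the SDx tools.
RPT_VERBOSE=${RPT_VERBOSE:-1}
if [ "$RPT_VERBOSE" -eq 1 ]; then RPT="-R2"; else RPT="-R0"; fi
echo "xocc -l $RPT -o binary_container_1.xclbin krnl_vadd.xo"
```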
Build Targets
The SDAccel build target defines the nature of the FPGA binary generated by the build process. There are three different build targets: two emulation targets (software and hardware emulation) used for debug and validation purposes, and the default hardware target used to generate the actual FPGA binary.
Software Emulation
The main goal of software emulation is to ensure functional correctness and to partition the application into kernels. For software emulation, both the host code and the kernel code are compiled to run on the host x86 processor. The programmer model of iterative algorithm refinement through fast compile-and-run loops is preserved. Software emulation has compile and execution times comparable to those of a CPU application. Refer to the SDAccel Environment Debugging Guide for more information on running software emulation.
In the context of the SDAccel development environment, software emulation on a CPU is the same as the iterative development process that is typical of CPU/GPU programming. In this type of development style, a programmer continuously compiles and runs an application as it is being developed.
For RTL kernels, software emulation can be supported if a C model is associated with the kernel. The RTL kernel wizard packaging step provides an option to associate C model files with the RTL kernel for support of software emulation flows.
Hardware Emulation
While the software emulation flow is a good measure of functional correctness, it does not guarantee correctness on the FPGA execution target. The hardware emulation flow enables the programmer to check the correctness of the logic generated for the custom compute units before deployment on hardware, where a compute unit is an instantiation of a kernel.
The SDAccel environment generates at least one custom compute unit for each kernel in an application. Each kernel is compiled to a hardware model (RTL). During emulation, kernels are executed with a hardware simulator, but the rest of the system still uses a C simulator. This allows the SDAccel environment to test the functionality of the logic that will be executed on the FPGA compute fabric.
In addition, hardware emulation provides performance and resource estimation, allowing the programmer to gain insight into the design.
In hardware emulation, compile and execution times are longer than in software emulation; Xilinx therefore recommends that you use small data sets for debug and validation.
System
When the build target is system, xocc generates the FPGA binary for the device by running synthesis and implementation on the design. The binary includes custom logic for every compute unit in the binary container. Therefore, it is normal for this build step to run for a longer period of time than the other steps in the SDAccel build flow. However, because the kernels will be running on actual hardware, their execution times will be extremely fast.
The generation of custom compute units uses the Vivado High-Level Synthesis (HLS) tool, which is the compute unit generator in the application compilation flow. Automatic optimization of a compute unit for maximum performance is not possible for all coding styles without additional user input to the compiler. The SDAccel Environment Profiling and Optimization Guide discusses the additional user input that can be provided to the SDAccel environment to optimize the implementation of kernel operations into a custom compute unit.
After all compute units have been generated, these units are connected to the infrastructure elements provided by the target device in the solution. The infrastructure elements in a device are all of the memory, control, and I/O data planes which the device developer has defined to support an OpenCL application. The SDAccel environment combines the custom compute units and the base device infrastructure to generate an FPGA binary, which is used to program the Xilinx device during application execution.
Specifying a Target
You can specify the target build from the command-line with the following command:
xocc --target sw_emu|hw_emu|hw ...
Similarly, from within the GUI, the build target can be specified by selecting the Active build configuration pull-down tab in the Project Editor window. This provides three choices (see the following figure):
- Emulation-SW
- Emulation-HW
- System
After setting the active build configuration, build the system from the menu command. The recommended build flow is detailed in Debugging Flows.