SDAccel Development Environment Features

Compiler

  • RTL Kernel Support

    Release 2017.1 significantly improves usability and performance when importing and optimizing RTL kernels in SDx™.

    • A new RTL Kernel wizard provides a template for importing RTL IP into SDx.
    • Provides scripts to build the RTL kernel into .xo (Xilinx Object), avoiding the error-prone creation of kernel.xml files that was required in earlier releases.
    • For Vivado® experts optimizing the performance of RTL kernels, xocc has been enhanced to include a Vivado script through -custom_script during xocc --link.
    • Utilization reports are automatically generated during compilation. These reports show utilization of LUTs, registers, BRAMs, and DSPs for the platform region, as well as for each of the kernels.
  • The SDAccel™ compiler employs mathematical techniques to statically identify the access pattern of inputs and outputs to perform better memory coalescing and burst inferencing. This is an early access feature and can be enabled with both of the following xocc options:
    • --xp param:compiler.version=39
    • --xp param:compiler.advancedLoopOptimizations=true
  • SDx/HLS should be able to automatically infer a shift register pattern in OpenCL™ kernels. The OpenCL shifter design pattern is inferred if:
    • The shift logic is described simply, using a for loop.
    • The array is initialized to 0, and each new value is assigned to the beginning or the end of the array.
    • The access points are at constant offsets.
  • The compiler automatically decides whether a shift register is implemented in SRL or BRAM to improve resource use and timing closure.
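
The three conditions above can be illustrated with a minimal sketch in plain C. It only shows the shape of the code (simple for-loop shift, zero-initialized array, insertion at the beginning, constant tap offsets); the name shift_and_tap and the depth of 8 are hypothetical, and whether the compiler infers a shift register for any particular kernel is not guaranteed by this example.

```c
#include <assert.h>
#include <stddef.h>

#define SHIFT_DEPTH 8 /* hypothetical depth, chosen only for illustration */

/* Push one sample into the register and return the sum of two fixed taps.
 * The caller is expected to zero-initialize shift_reg before first use. */
int shift_and_tap(int shift_reg[SHIFT_DEPTH], int new_value)
{
    assert(shift_reg != NULL);

    /* Simple for-loop shift: every element moves one slot toward the end. */
    for (int i = SHIFT_DEPTH - 1; i > 0; --i) {
        shift_reg[i] = shift_reg[i - 1];
    }

    /* The new value is assigned at the beginning of the array. */
    shift_reg[0] = new_value;

    /* Access points are constant offsets (first and last element). */
    return shift_reg[0] + shift_reg[SHIFT_DEPTH - 1];
}
```

In an OpenCL kernel the array would typically be a private array declared inside the kernel and initialized with `= {0}`; the loop body would be the same.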
  • Dataflow support for OpenCL is now a production feature.
  • The compiler can support dataflow on functions with arbitrarily sized parameters, sub-functions, or loops. Dataflow can also apply to loop statements.
  • The xocc option to define the OpenCL compiler dataflow FIFO size is:
    --xp param:compiler.xclDataflowFifoDepth=4
  • The following warning message might appear during compile:
    kernel.cl:28:17: warning: unknown attribute 'xcl_dataflow' ignored
    __attribute__ ((xcl_dataflow))

    Ignore the warning or use -k kernel_name to avoid the warning.

  • SDx introduces the OpenCL 2.0 image data type, which provides the ability to read and write to images in kernels through OpenCL 2.0 image built-ins.
  • The supported APIs are:
    • clCreateImage() for the image types listed above.
    • clGetSupportedImageFormats()
    • clEnqueueReadImage()
    • clEnqueueWriteImage()
    • clGetImageInfo()
  • Enhancements to OpenCL math built-in functions to improve performance and reduce resources.
  • To give expert users more control, xocc can write out the default script as well as apply custom scripts:
    xocc -c -export_script
    xocc -c -custom_script
  • Changes to SDAccel platform include the following:
    • XDMA enhanced to support two Physical Functions to provide one PF for secure management.
    • AXI Firewall at XDMA Full AXI4 and two AXI Lite interfaces to insulate the platform from hangs caused by AXI protocol violations in kernels.
    • Feature ROM to embed platform data in a ROM in the DSA to enable Run Time checks.
    • Bitstream download through ICAP on all platforms rather than MCAP for faster downloads.
  • Introducing support for Kintex® UltraScale™ FPGA KCU1500 Reconfigurable Acceleration PCIe® card.

Table 1. Device Support Archive (DSA)

| Board | Device | Supported DSAs | Kernel Clock Frequency (MHz) | Status | Features |
| --- | --- | --- | --- | --- | --- |
| XIL-ACCEL-RD-KU115 | KU115 | xilinx:xil-accel-rd-pcie3-ku115:4ddr:4.0 | 300 | Production | PCIe Gen3x8, 4 DDR; kernel clock frequency control; automatic frequency scaling; second kernel clock at a higher frequency (up to 500 MHz) is now supported and can be used for user-created RTL kernels; increased fabric resources for compute units; global memory changes to volatile between binary loads |
| KCU1500 | KU115 | xilinx:kcu1500:4ddr:4.0 | 300 | Production | PCIe Gen3x8, 4 DDR; kernel clock frequency control; automatic frequency scaling; second kernel clock at a higher frequency (up to 500 MHz) is now supported and can be used for user-created RTL kernels; increased fabric resources for compute units; global memory changes to volatile between binary loads |
| ADM-PCIE-KU3 | KU60 | xilinx:adm-pcie-ku3:2ddr-xpr:4.0 | 250 | Production | PCIe Gen3x8, 2 DDR; automatic frequency scaling; increased fabric resources for compute units; global memory changes to volatile between binary loads |
| ADM-PCIE-7V3 | V7690T | xilinx:adm-pcie-7v3:1ddr:3.0 | 200 | Production | PCIe Gen3x8, 1 DDR; automatic frequency scaling |


Table 2. 2017.1 4.x Platform Driver Changes

| DSA | User PF Driver | User Device Node | Management PF Driver | Management PF Node |
| --- | --- | --- | --- | --- |
| xilinx:kcu1500:4ddr-xpr:4.0 | xdma | /dev/xdmaX_user, /dev/xdmaX_c2h_0, /dev/xdmaX_c2h_1, /dev/xdmaX_h2c_0, /dev/xdmaX_h2c_1 | xclmgmt | /dev/xclmgmtX |
| xilinx:xil-accel-rd-ku115:4ddr-xpr:4.0 | xdma | /dev/xdmaX_user, /dev/xdmaX_c2h_0, /dev/xdmaX_c2h_1, /dev/xdmaX_h2c_0, /dev/xdmaX_h2c_1 | xclmgmt | /dev/xclmgmtX |
| xilinx:xil-accel-rd-vu9p:4ddr-xpr:4.1 | xdma | /dev/xdmaX_user, /dev/xdmaX_c2h_0, /dev/xdmaX_c2h_1, /dev/xdmaX_h2c_0, /dev/xdmaX_h2c_1 | xclmgmt | /dev/xclmgmtX |
| xilinx:adm-pcie-ku3:2ddr-xpr:4.0 | xdma | /dev/xdmaX_user, /dev/xdmaX_c2h_0, /dev/xdmaX_c2h_1, /dev/xdmaX_h2c_0, /dev/xdmaX_h2c_1 | xclmgmt | /dev/xclmgmtX |
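
As a rough illustration of the node naming in Table 2, the sketch below builds the user and management device node paths for one device. It assumes that the X in the node names is a decimal device index, which the table does not state explicitly; the helper names user_node_path and mgmt_node_path are hypothetical and are not part of any Xilinx API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Suffixes of the user PF nodes listed in Table 2. */
static const char *const USER_NODE_SUFFIXES[] = {
    "_user", "_c2h_0", "_c2h_1", "_h2c_0", "_h2c_1"
};
#define USER_NODE_COUNT 5

/* Write the path of user node number `which` for device `device_index`.
 * Assumption: X in the table is a decimal index. Returns 0 on success,
 * -1 on bad arguments or if the buffer is too small. */
int user_node_path(char *buf, size_t len, int device_index, int which)
{
    assert(buf != NULL);
    if (device_index < 0 || which < 0 || which >= USER_NODE_COUNT) {
        return -1;
    }
    int n = snprintf(buf, len, "/dev/xdma%d%s",
                     device_index, USER_NODE_SUFFIXES[which]);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}

/* Write the path of the management PF node for device `device_index`. */
int mgmt_node_path(char *buf, size_t len, int device_index)
{
    assert(buf != NULL);
    if (device_index < 0) {
        return -1;
    }
    int n = snprintf(buf, len, "/dev/xclmgmt%d", device_index);
    return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

A host-side script or tool could use paths built this way to check that both drivers created their nodes after installation.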

2017.1 enhancements also include:

  • SDx Eclipse UI
    • Platform setup and project creation.
      • Add custom platform(s) without creating an SDx project.
      • SDx example store: access to GitHub SDSoC and SDAccel examples.
      • Wizard for creating RTL kernels.
    • Acceleration

      • Support for accelerating C function templates.
      • Launch HLS for specified hardware functions.
    • Reporting

      • New post-implementation utilization reports.
      • Collapsible headers on all SDx reports.
  • Xilinx Runtime.
    • The early access xocl kernel driver, based on the Linux kernel GEM framework for PCIe-based DSAs, has the following features:
      • Support for host page pinning which improves DMA bandwidth.
      • Uses Linux kernel based memory management for device memory management.
      • Is multi-threading safe and provides a single device node per device for all device operations.
      • xocl requires Red Hat 6.9 or later, or Ubuntu 16.04.
      • Install xocl by invoking xbinst -gem. The remaining steps (for example, running ./install.sh) stay the same.
    • Feature ROM in the device, which advertises the device configuration to the driver.
    • Migration to the xclbin2 format, which enables features such as skipping bitstream re-download if the same bitstream is already running.
    • Calling clReleaseContext() now releases the exclusive lock on the device, so that another concurrently running application can create a context with clCreateContext().
    • Several new features in xbsak, including the new commands scan, mem, and status.
    • xbsak flash requires root permissions.
    • 4.x DSAs have an AXI Firewall IP that protects PCIe from hangs and stalls inside the device. On AXI bus errors, the AXI Firewall IP trips, which causes the driver to send a SIGBUS to all applications that have opened the device node.
    • The runtime now includes support for the latest version of the OpenCL C++ wrappers from Khronos. The cl2.hpp header file ships along with the standard OpenCL C API header files.
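
Because a tripped AXI Firewall results in a SIGBUS being sent to every application that has the device node open, a host application may want to catch that signal rather than terminate abruptly. The following is a minimal sketch using the standard POSIX sigaction() call. The function names are hypothetical, and it assumes the signal is delivered asynchronously by the driver, so that returning from the handler is safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Set from the handler; sig_atomic_t keeps the write async-signal-safe. */
static volatile sig_atomic_t device_bus_error = 0;

/* Handler does the minimum: record that a SIGBUS arrived. */
static void on_sigbus(int signo)
{
    (void)signo;
    device_bus_error = 1;
}

/* Install the SIGBUS handler. Returns 0 on success, -1 on failure. */
int install_sigbus_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_sigbus;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGBUS, &sa, NULL);
}

/* Poll this from the host application's main loop. */
int bus_error_seen(void)
{
    return device_bus_error != 0;
}
```

The handler only sets a flag; releasing OpenCL objects, logging, and exiting are better done from the main loop once bus_error_seen() returns nonzero, since most library calls are not safe inside a signal handler.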
  • xocc supports setting a target kernel frequency. Overriding the kernel clock frequency with a lower frequency might help designs that fail to meet timing on the platform clocks.
    • --kernel_frequency sets a user-defined clock frequency in MHz for the kernel, overriding the default value from the DSA.
    • To change the target to 150 MHz for a kernel compilation, add --kernel_frequency 150.
  • Usability
    • GDB extension to provide visibility into the OpenCL data structures cl_queue, cl_event, and cl_mem to debug host application hangs.
    • Application timeline trace refinements.
      • Multicolor support for better visualization of OpenCL API calls.
      • Support for additional OpenCL APIs:
        • clCreateContext
        • clCreateImage
        • clEnqueueTask
        • clEnqueueMigrateMemObjects
        • clEnqueueReadImage
        • clEnqueueWriteImage
        • clEnqueueMigrateMem
        • clEnqueueMapBuffer
        • clEnqueueUnmapMemObject
    • Detailed Kernel Trace
      • Support for RTL kernels.
      • Reporting loop pipeline activity in waveform.
  • Enhanced debug checks in HW Emulation Flow covering:
    • Kernel or system transaction hangs.
    • Uninitialized memory read by kernel.
    • Out of DDR Range access.
    • Out of Bounds array access.
    • Periodic aliveness status during long HW Emulation runs.