Performance Measurement Using the AXI Performance Monitor

The AXI Performance Monitor (APM) module is used to monitor basic information about data transfers between the PS Arm® cores and the hardware in the PL. It captures statistics such as the number of read/write transactions, throughput, and latency for the AXI transactions on the buses in the system.

This section demonstrates how to insert an APM core into the system, monitor the instrumented system, and view the performance data produced.

Create a Standalone Project and Implement APM

  1. Open the SDSoC™ environment, and create a new SDSoC project/workspace using any platform or operating system selection. Select the Matrix Multiplication and Addition template.
  2. In Application Project Settings, select Insert AXI Performance Monitor.

    Enabling this option and building the project adds the APM IP core to your hardware system. The APM IP uses a small amount of resources in the programmable logic. The SDSoC environment connects the APM to the hardware/software interface ports, which are the general purpose (GP) and high performance (HP) ports.

  3. Select the mmult and madd functions to be implemented in hardware. Clean and build the project using the Debug configuration (selected by default).
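
When checking accelerator results, it can help to have a host-side model of what the two hardware functions compute: a matrix product (mmult) and an element-wise matrix sum (madd). The Python sketch below is such a model, not the template's source code. The flat row-major list layout, the function signatures, and the matrix size N are assumptions made for illustration. Check the template sources for the actual types and dimensions.

```python
N = 4  # hypothetical matrix size; the template's real size may differ


def mmult(a, b, n=N):
    """Matrix product C = A x B for n x n matrices in flat row-major lists."""
    c = [0.0] * (n * n)
    for i in range(n):
        for j in range(n):
            acc = 0.0
            for k in range(n):
                acc += a[i * n + k] * b[k * n + j]
            c[i * n + j] = acc
    return c


def madd(a, b, n=N):
    """Element-wise sum C = A + B for n x n matrices in flat row-major lists."""
    return [a[i] + b[i] for i in range(n * n)]


# Example with 2 x 2 matrices
print(mmult([1, 2, 3, 4], [5, 6, 7, 8], n=2))  # [19.0, 22.0, 43.0, 50.0]
print(madd([1, 2, 3, 4], [5, 6, 7, 8], n=2))   # [6, 8, 10, 12]
```

Because the model runs on the host, its output can be compared against the hardware build to confirm that adding the APM has not changed the computed values.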

Monitor the Standalone Instrumented System

To monitor the system, do the following:

  1. After the build completes, connect the board to your computer, and power up the board.
  2. Click Run > Debug Configurations to open the Debug Configurations dialog box.
  3. Select Xilinx SDx Application Debugger in the Debug Configurations tree.
  4. Click New Launch Configuration to create a new SDx Application Debugger configuration.
  5. For Debug Type, select Standalone Application Debug.
  6. For Connection, select Local.
  7. Select the Performance Analysis check box in the Main tab.

    After you select Performance Analysis, the performance analysis options are populated automatically in the Main tab.

  8. Click Apply, and then click Debug.

    If prompted to switch perspectives, click Yes.

  9. After the Debug perspective opens, click Window > Perspective, select Performance Analysis in the Open Perspective dialog box, and click OK.
  10. To resume the application, select the Debug tab, and click Resume.

    If prompted to switch perspectives, click No to stay in the Performance Analysis perspective.



Create a Linux Project and Implement APM

  1. Open the SDSoC environment, and create a new project/workspace using any platform or operating system selection. Select the Matrix Multiplication and Addition template.
  2. In SDx Application Project Settings, select Insert AXI Performance Monitor.

    Enabling this option and building the project adds the APM IP core to your hardware system. The APM IP uses a small amount of resources in the programmable logic. The SDSoC environment connects the APM to the hardware/software interface ports, which are the GP and HP ports.

  3. Select the mmult and madd functions to be implemented in hardware.
  4. Clean and build the project using the Debug configuration, which is selected by default.

The following figure shows the APM Counter:



Monitor the Linux Instrumented System

  1. After the build completes, copy the contents of the sd_card directory onto an SD card, and boot Linux on the board.
  2. Connect the board to your computer using both UART and JTAG cables.
  3. Set up the Linux TCF agent target connection with the IP address of the board. See the SDK Help topic on TCF for more information.
  4. Click Run > Debug Configurations to open the Debug Configurations dialog box.
  5. Select Xilinx SDx Application Debugger in the Debug Configurations tree.
  6. Click New Launch Configuration to create a new Application Debugger configuration.
  7. For Debug Type, select Linux Application Debug.
  8. For Connection, select Local.
  9. Select the Performance Analysis check box.

    After the Performance Analysis check box is selected, all the performance analysis options are populated automatically in the Main tab.

  10. Click Apply.
  11. Click Debug.

    If prompted to switch to the Debug perspective, click Yes.

  12. After the Debug perspective is displayed, click Window > Perspective, and then select Performance Analysis in the Open Perspective dialog box.
  13. Click OK.
  14. Select the Debug tab, and click Resume to resume the application.

    If prompted to switch perspectives, click No to stay in the Performance Analysis perspective.

  15. After your program completes execution, click the Stop Analysis button.

    If the Confirm Perspective Switch dialog box appears, click No to stay in the Performance Analysis perspective.

  16. Scroll through the analysis plots in the lower portion of the perspective to view different performance statistics.
  17. Click in any plot area to show a bigger version in the middle of the perspective.

    The orange box allows you to focus on a particular time slice of data.

Analyzing the Performance

In this system, the APM is connected to the two ports in use between the PS and the PL: the GP port and the accelerator coherency port (ACP).

Figure: APM Performance Count

The multiplier and adder accelerator cores are both connected to the accelerator coherency port (ACP) for data input and output.

The GP port is used only to issue control commands to the accelerator cores and read their status, not for data transfer. The blue Slot 0 is connected to the GP port, and the green Slot 1 is connected to the ACP.
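
Because only the ACP carries data, most of the counted bytes should appear on Slot 1. The Python sketch below gives a rough, hypothetical estimate of that traffic for one mmult call followed by one madd call, which can serve as a sanity check against the Slot 1 byte counts. It assumes 4-byte float elements, square n x n matrices, that each input matrix is read once and each result is written once, and that the mmult result is written to memory and read back by madd. The actual data movers may transfer a different amount, and the function name is illustrative only.

```python
BYTES_PER_ELEMENT = 4  # assumed: 32-bit float matrix elements


def expected_acp_bytes(n, bytes_per_element=BYTES_PER_ELEMENT):
    """Rough (read_bytes, write_bytes) on the ACP for one mmult + one madd.

    Assumes each input matrix is read once, each result is written once,
    and the mmult result goes through memory before madd reads it.
    """
    matrix_bytes = n * n * bytes_per_element
    reads = 2 * matrix_bytes      # mmult reads A and B
    writes = matrix_bytes         # mmult writes the product
    reads += 2 * matrix_bytes     # madd reads the product and the addend
    writes += matrix_bytes        # madd writes the sum
    return reads, writes


print(expected_acp_bytes(32))  # (16384, 8192)
```

For n = 32, the sketch predicts 16,384 bytes read and 8,192 bytes written on the ACP. The GP slot should show only the small control and status accesses.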

Note: The ACP is not supported on Zynq UltraScale+ MPSoC devices for SDSoC environment flows.

The APM is configured in Profile mode with two monitoring slots, one for each port (GP and ACP). Profile mode provides event counting functionality for each slot. The statistics computed by the APM for both reads and writes include:

Transaction Count
Total number of requests that occur on the bus.
Byte Counter
Total number of bytes sent (used for write throughput calculation).
Latency
Time between the start of the address issuance and the last element sent.
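
The following Python sketch makes the relationship between these counters concrete by deriving the average bytes per transaction and the average latency from one slot's counters. The SlotCounters class and its fields are hypothetical. The sketch assumes that the latency counter holds the sum of per-transaction latencies in clock cycles and that the clock frequency is known. The APM's actual register layout and units may differ.

```python
from dataclasses import dataclass


@dataclass
class SlotCounters:
    """Hypothetical snapshot of one APM slot's read (or write) counters."""

    transactions: int    # Transaction Count: requests seen on the bus
    byte_count: int      # Byte Counter: total bytes transferred
    latency_cycles: int  # assumed: sum of per-transaction latencies, in cycles

    def avg_bytes_per_transaction(self) -> float:
        # Guard against an idle slot that saw no transactions
        if self.transactions == 0:
            return 0.0
        return self.byte_count / self.transactions

    def avg_latency_ns(self, clock_mhz: float) -> float:
        """Average latency per transaction, converted using an assumed clock."""
        if self.transactions == 0:
            return 0.0
        avg_cycles = self.latency_cycles / self.transactions
        return avg_cycles * 1000.0 / clock_mhz  # 1 cycle = 1000 / MHz ns


# Example: 4 transactions, 256 bytes, 400 total latency cycles at 100 MHz
slot = SlotCounters(transactions=4, byte_count=256, latency_cycles=400)
print(slot.avg_bytes_per_transaction())  # 64.0
print(slot.avg_latency_ns(100.0))        # 1000.0
```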

The APM uses the latency and byte counter statistics to compute the throughput (in MB/s) automatically. The latency and throughput values shown are for a 50 ms time interval.
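
The following is a minimal Python sketch of the throughput arithmetic for one sample interval. It is a simplification: it divides the bytes counted in one 50 ms interval by the interval length, rather than combining latency and byte counts the way the APM does internally. It also assumes 1 MB = 10^6 bytes. The function name is illustrative.

```python
INTERVAL_S = 0.050  # 50 ms sample interval used for the displayed values


def throughput_mb_per_s(byte_count: int, interval_s: float = INTERVAL_S) -> float:
    """Average throughput over one interval, assuming 1 MB = 10**6 bytes."""
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    return byte_count / interval_s / 1e6


print(throughput_mb_per_s(5_000_000))
```

For example, 5,000,000 bytes counted in one 50 ms interval corresponds to 100 MB/s.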

Minimum, maximum, and average values are also displayed for the latency and throughput statistics.
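
The minimum, maximum, and average values summarize the per-interval samples. The following Python sketch computes that summary for a list of samples. The function name is hypothetical. The average here is the plain arithmetic mean of the intervals, which may not match how the tool weights them.

```python
def summarize(samples):
    """Return (minimum, maximum, average) of per-interval samples."""
    if not samples:
        raise ValueError("need at least one sample")
    return min(samples), max(samples), sum(samples) / len(samples)


# Illustrative per-interval throughput samples, in MB/s (not measured data)
print(summarize([120.0, 80.0, 100.0]))  # (80.0, 120.0, 100.0)
```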