Profiling and Instrumenting Code to Measure Performance
The first major task in creating a software-defined SoC is to identify portions of application code that are suitable for implementation in hardware, and that significantly improve overall performance when run in hardware. Program hot-spots that are compute-intensive are good candidates for hardware acceleration, especially when it is possible to stream data between hardware and the CPU and memory to overlap the computation with the communication. Software profiling is a standard way to identify the most CPU-intensive portions of your program.
The SDSoC environment includes all performance and profiling capabilities that are included in the Xilinx SDK, including gprof, the non-intrusive Target Communication Framework (TCF) Profiler, and the Performance Analysis perspective within Eclipse.
To profile an application with the TCF Profiler:
- Set the active build configuration to SDDebug by right-clicking on the project in the Project Explorer and selecting .
- In the SDSoC Project Overview window, click on Debug application.
  Note: The board must be connected to your computer and powered on. The application automatically breaks at the entry to main().
- Launch the TCF Profiler by selecting .
- Start the TCF Profiler by clicking on the green Start button at the top of the TCF Profiler tab. Enable Aggregate per function in the Profiler Configuration dialog box.
- Start the profiling by clicking on the Resume button. The program runs to completion and breaks at the exit() function.
- View the results in the TCF Profiler tab.
Profiling provides a statistical method for finding hot spots based on sampling the CPU program counter and correlating to the program in execution. Another way to measure program performance is to instrument the application to determine the actual duration between different parts of a program in execution.
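To make the sampling idea concrete, the following is a minimal sketch of how program-counter samples can be attributed to functions and counted. It is an illustration only and does not describe how the TCF Profiler is implemented; `FunctionRange`, `aggregate_per_function`, and the address ranges are hypothetical names and values chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical description of one function's address range: [begin, end).
struct FunctionRange {
    std::string name;
    std::uint64_t begin;
    std::uint64_t end;
};

// Count how many program-counter samples fall inside each function.
// Samples that match no known range are counted under "<unknown>".
std::map<std::string, std::size_t> aggregate_per_function(
    const std::vector<FunctionRange>& functions,
    const std::vector<std::uint64_t>& pc_samples) {
    std::map<std::string, std::size_t> hits;
    for (std::uint64_t pc : pc_samples) {
        std::string owner = "<unknown>";
        for (const FunctionRange& fn : functions) {
            assert(fn.begin < fn.end);  // each range must be non-empty
            if (pc >= fn.begin && pc < fn.end) {
                owner = fn.name;
                break;
            }
        }
        ++hits[owner];
    }
    return hits;
}
```

A function that collects a large share of the samples is a likely hot spot. Because the counts come from sampling, they estimate where time is spent rather than measure it exactly.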
The sds_lib library included in the SDSoC environment provides a simple time-stamping API, based on source code annotation, that you can use to measure application performance.
    /*
     * @return value of free-running 64-bit Zynq(TM) global counter
     */
    unsigned long long sds_clock_counter(void);
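    #include <cstdint>
    #include <iostream>
    #include "sds_lib.h"

    // Accumulates elapsed counter ticks across start()/stop() pairs.
    class perf_counter {
    public:
        uint64_t tot, cnt, calls;
        perf_counter() : tot(0), cnt(0), calls(0) {}
        inline void reset() { tot = cnt = calls = 0; }
        // Record the counter value at the start of a measured region.
        inline void start() { cnt = sds_clock_counter(); calls++; }
        // Add the ticks elapsed since the matching start() to the total.
        inline void stop() { tot += (sds_clock_counter() - cnt); }
        // Average ticks per start()/stop() pair; 0 if nothing was measured.
        inline uint64_t avg_cpu_cycles() { return calls ? (tot / calls) : 0; }
    };

    extern void f();

    void measure_f_runtime()
    {
        perf_counter f_ctr;
        f_ctr.start();
        f();
        f_ctr.stop();
        std::cout << "Cpu cycles f(): " << f_ctr.avg_cpu_cycles() << std::endl;
    }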
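A single start/stop pair can be noisy, so it is common to average over several calls. The sketch below shows one way to do that while keeping the counter source replaceable, so the same code can be tried on a host machine where sds_lib is not available. `average_ticks` and `host_clock_counter` are hypothetical helpers written for this example and are not part of sds_lib; on the target you would pass `sds_clock_counter` as the counter argument.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Host-side stand-in (an assumption for this example) for a free-running
// counter. It returns nanoseconds from a monotonic clock, not Zynq ticks.
inline std::uint64_t host_clock_counter() {
    using namespace std::chrono;
    return static_cast<std::uint64_t>(
        duration_cast<nanoseconds>(steady_clock::now().time_since_epoch()).count());
}

// Call work() `iterations` times and return the average number of counter
// ticks per call. Returns 0 when iterations is 0.
template <typename Counter, typename Work>
std::uint64_t average_ticks(Counter counter, Work work, unsigned iterations) {
    if (iterations == 0) {
        return 0;
    }
    std::uint64_t total = 0;
    for (unsigned i = 0; i < iterations; ++i) {
        const std::uint64_t begin = counter();
        work();
        const std::uint64_t end = counter();
        assert(end >= begin);  // a free-running counter should not run backwards
        total += end - begin;
    }
    return total / iterations;
}
```

For example, `average_ticks(host_clock_counter, f, 100)` reports the average cost of 100 calls to `f` in nanoseconds on the host. Passing the counter as a parameter also makes the helper easy to check with a fake counter.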
The performance estimation feature within the SDSoC environment employs this API by automatically instrumenting functions selected for hardware implementation, measuring actual run-times by running the application on the target, and then comparing actual times with estimated times for the hardware functions.
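The arithmetic behind such a comparison is simple, and a minimal sketch of it follows. These helpers are hypothetical and are not the formulas or report format used by the SDSoC performance estimation feature; they only show how tick counts gathered with the time-stamping API might be converted and compared. The counter frequency is an input that you must supply for your platform.

```cpp
#include <cassert>
#include <cstdint>

// Convert a tick count to microseconds. ticks_per_second is supplied by the
// caller; it is not queried from the platform here.
inline double ticks_to_microseconds(std::uint64_t ticks,
                                    std::uint64_t ticks_per_second) {
    assert(ticks_per_second != 0);
    return 1e6 * static_cast<double>(ticks) / static_cast<double>(ticks_per_second);
}

// Ratio of software run time to hardware run time for the same function.
// A value above 1.0 means the hardware version was faster.
inline double speedup(std::uint64_t software_ticks, std::uint64_t hardware_ticks) {
    assert(hardware_ticks != 0);
    return static_cast<double>(software_ticks) / static_cast<double>(hardware_ticks);
}

// Signed relative error of an estimate against a measurement.
// For example, 0.25 means the estimate was 25% above the measured value.
inline double relative_error(std::uint64_t estimated_ticks,
                             std::uint64_t measured_ticks) {
    assert(measured_ticks != 0);
    const double estimated = static_cast<double>(estimated_ticks);
    const double measured = static_cast<double>(measured_ticks);
    return (estimated - measured) / measured;
}
```

Under these definitions, a function measured at 2000 ticks in software and 500 ticks in hardware has a speedup of 4.0, and an estimate of 1250 ticks against a measured 1000 ticks has a relative error of 0.25.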