Hardware/Software System Runtime Operation

The SDSoC compilers implement hardware functions either by cross-compiling them into IP using the Vivado® HLS tool, or by linking them as C-Callable IP as described in Introduction. Each hardware function call site is rewritten to call a stub function that manages the execution of the hardware accelerator. The figure below shows an example of hardware function rewriting. The original user code is shown on the left. The code section on the right shows the hardware function calls rewritten with new function names.

Figure: Hardware Function Call Site Rewriting
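As a rough illustration of the kind of rewriting the figure depicts, the sketch below shows one call site before and after rewriting. It is not generated SDSoC output: the namespace sketch, the functions call_original() and call_rewritten(), and the 2x2 matrix size are assumptions made for this example, and the stand-in stub simply forwards to the software function so that the code runs without an accelerator.

```cpp
#include <cassert>
#include <cstddef>

namespace sketch {

constexpr std::size_t N = 2;  // tiny matrix size, chosen only for this sketch

// User function marked for hardware (plain software implementation).
void mmult(const float *A, const float *B, float *C) {
    for (std::size_t i = 0; i < N; ++i) {
        for (std::size_t j = 0; j < N; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < N; ++k) {
                acc += A[i * N + k] * B[k * N + j];
            }
            C[i * N + j] = acc;
        }
    }
}

// Stand-in for a generated stub. A real stub would start the accelerator
// and its data transfers; this one forwards to software (assumption).
void _p0_mmult_0(const float *A, const float *B, float *C) {
    mmult(A, B, C);
}

// Call site as the user wrote it.
float call_original() {
    const float A[N * N] = {1, 2, 3, 4};
    const float B[N * N] = {5, 6, 7, 8};
    float C[N * N] = {0, 0, 0, 0};
    mmult(A, B, C);
    return C[N * N - 1];
}

// Same call site after rewriting: only the callee name changes.
float call_rewritten() {
    const float A[N * N] = {1, 2, 3, 4};
    const float B[N * N] = {5, 6, 7, 8};
    float C[N * N] = {0, 0, 0, 0};
    _p0_mmult_0(A, B, C);
    return C[N * N - 1];
}

}  // namespace sketch
```

The arguments and surrounding code stay the same in both versions; the rewritten call site differs only in the function it names.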



The stub function initializes the hardware accelerator, initiates any required data transfers for the function arguments, and then synchronizes hardware and software by waiting at an appropriate point in the program for the accelerator and all associated data transfers to complete. If, for example, the hardware function foo() is defined in foo.cpp, you can view the generated rewritten code in _sds/swstubs/foo.cpp for the project build configuration. As an example, the stub code below replaces a user function marked for hardware. This function starts the accelerator, starts data transfers to and from the accelerator, and waits for those transfers to complete.

void _p0_mmult0(float *A, float *B, float *C)
{
  switch_to_next_partition(0);
  int start_seq[3];
  start_seq[0] = 0x00000f00;
  start_seq[1] = 0x00010100;
  start_seq[2] = 0x00020000;
  // Start the accelerator.
  cf_send_i(cmd_addr, start_seq, cmd_handle);
  cf_wait(cmd_handle);
  // Start data transfers to (A, B) and from (C) the accelerator.
  cf_send_i(A_addr, A, A_handle);
  cf_send_i(B_addr, B, B_handle);
  cf_receive_i(C_addr, C, C_handle);
  // Wait for the transfers to complete.
  cf_wait(A_handle);
  cf_wait(B_handle);
  cf_wait(C_handle);
}

Event tracing provides visibility into each phase of the hardware function execution, including the software setup for the accelerators and data transfers, as well as the hardware execution of the accelerators and data transfers. For example, the stub code below is instrumented for trace. Each command that starts the accelerator, starts a transfer, or waits for a transfer to complete is instrumented.

void _p0_mmult_0(float *A, float *B, float *C)
{
  switch_to_next_partition(0);
  int start_seq[3];
  start_seq[0] = 0x00000f00;
  start_seq[1] = 0x00010100;
  start_seq[2] = 0x00020000;
  // Start the accelerator.
  sds_trace(EVENT_START);
  cf_send_i(cmd_addr, start_seq, cmd_handle);
  sds_trace(EVENT_STOP);
  sds_trace(EVENT_START);
  cf_wait(cmd_handle);
  sds_trace(EVENT_STOP);
  // Start data transfers to (A, B) and from (C) the accelerator.
  sds_trace(EVENT_START);
  cf_send_i(A_addr, A, A_handle);
  sds_trace(EVENT_STOP);
  sds_trace(EVENT_START);
  cf_send_i(B_addr, B, B_handle);
  sds_trace(EVENT_STOP);
  sds_trace(EVENT_START);
  cf_receive_i(C_addr, C, C_handle);
  sds_trace(EVENT_STOP);
  // Wait for the transfers to complete.
  sds_trace(EVENT_START);
  cf_wait(A_handle);
  sds_trace(EVENT_STOP);
  sds_trace(EVENT_START);
  cf_wait(B_handle);
  sds_trace(EVENT_STOP);
  sds_trace(EVENT_START);
  cf_wait(C_handle);
  sds_trace(EVENT_STOP);
}
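To make the role of the paired start and stop events more concrete, the following sketch shows how such pairs could be turned into a per-phase duration. It does not use or reproduce the SDSoC trace implementation: the Tracer type, its simulated tick counter, the phase_durations() helper, and the tick values are all hypothetical and exist only so the example is self-contained and deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

namespace trace_sketch {

enum EventKind { EVENT_START, EVENT_STOP };

struct TraceRecord {
    EventKind kind;
    std::uint64_t tick;  // simulated timestamp, not a hardware counter
};

// Hypothetical tracer with a fake clock that the caller advances by hand.
struct Tracer {
    std::vector<TraceRecord> log;
    std::uint64_t now = 0;

    void advance(std::uint64_t ticks) { now += ticks; }
    void trace(EventKind kind) { log.push_back(TraceRecord{kind, now}); }
};

// Pair each START record with the STOP record that follows it and
// return the elapsed ticks of every well-formed pair, in order.
std::vector<std::uint64_t> phase_durations(const std::vector<TraceRecord> &log) {
    std::vector<std::uint64_t> durations;
    for (std::size_t i = 0; i + 1 < log.size(); i += 2) {
        if (log[i].kind == EVENT_START && log[i + 1].kind == EVENT_STOP) {
            durations.push_back(log[i + 1].tick - log[i].tick);
        }
    }
    return durations;
}

// Mimics the shape of an instrumented stub: each phase is bracketed by a
// START/STOP pair. The advance() calls stand in for real work.
std::vector<std::uint64_t> run_instrumented_phases() {
    Tracer t;
    t.trace(EVENT_START); t.advance(3);  t.trace(EVENT_STOP);  // command send
    t.advance(1);                                              // untraced gap
    t.trace(EVENT_START); t.advance(10); t.trace(EVENT_STOP);  // data transfer
    t.advance(1);                                              // untraced gap
    t.trace(EVENT_START); t.advance(7);  t.trace(EVENT_STOP);  // wait
    return phase_durations(t.log);
}

}  // namespace trace_sketch
```

Because every instrumented command is wrapped in its own pair, time spent between pairs (the untraced gaps in the sketch) is not attributed to any phase.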