Stereo Vision Memory Access Optimization

After importing the pre-optimized hardware function into a project in the SDSoC environment, the first task is to remove any interface optimizations. The interface between the PS and the hardware function is automatically managed and optimized based on the data types of the hardware function and the data access. Refer to Data Motion Optimization.

  • Remove any INTERFACE directives present in the hardware function.
  • Remove any DATA_PACK directives that reference variables present in the hardware function argument list.
  • Remove any Vivado HLS hardware data types by enclosing the top-level function in wrappers that only use native C/C++ types for the function arguments.
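
The wrapper approach in the last item can be sketched as follows. This is only an illustration under stated assumptions: hw_pixel_t is a hypothetical stand-in for a Vivado HLS data type, and core_process and process_wrapper are invented names, not functions from the stereo vision design. The point is that the function selected for hardware exposes only native C types, while the hardware-specific type stays inside it.

```c
#include <stddef.h>

/* Hypothetical stand-in for a Vivado HLS hardware data type.
 * In a real design this type would come from the HLS headers. */
typedef struct {
    unsigned short bits;
} hw_pixel_t;

/* Hypothetical pre-optimized core that still uses the hardware type.
 * Placeholder behavior: invert the low byte of each pixel. */
static hw_pixel_t core_process(hw_pixel_t p)
{
    hw_pixel_t q;
    q.bits = (unsigned short)(p.bits ^ 0x00FFu);
    return q;
}

/* Wrapper intended to be the hardware function: its argument list
 * uses native C types only, and it converts at the boundary. */
void process_wrapper(const unsigned short *in, unsigned short *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        hw_pixel_t p;
        p.bits = in[i];                /* native type -> hardware type */
        out[i] = core_process(p).bits; /* hardware type -> native type */
    }
}
```

With this arrangement the caller passes ordinary unsigned short buffers, and the data types that drive the automatic interface selection are the native ones in the wrapper signature.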

In this example, the functions to be accelerated are captured inside a single top-level hardware function stereo_remap_bm.

int main()
{
    unsigned char *inY = (unsigned char *)sds_alloc(HEIGHT*DUALWIDTH);
    unsigned short *inCY = (unsigned short *)sds_alloc(HEIGHT*DUALWIDTH*2);
    unsigned short *outCY = (unsigned short *)sds_alloc(HEIGHT*DUALWIDTH*2);
    unsigned char *outY = (unsigned char *)sds_alloc(HEIGHT*DUALWIDTH);

    // read double wide image from disk
    if (read_yuv_file(inY, DUALWIDTH, DUALWIDTH, HEIGHT, FILEINAME) != 0)
        return -1;

    convert_Y8toCY16(inY, inCY, HEIGHT*DUALWIDTH);

    stereo_remap_bm(inCY, outCY, HEIGHT, DUALWIDTH, DUALWIDTH);

    // write single wide image to disk
    convert_CY16toY8(outCY, outY, HEIGHT*DUALWIDTH);
    write_yuv_file(outY, DUALWIDTH, DUALWIDTH, HEIGHT, ONAME);

    // free the buffers allocated with sds_alloc
    sds_free(inY);
    sds_free(inCY);
    sds_free(outCY);
    sds_free(outY);
    return 0;
}

The key to optimizing the memory accesses to the hardware is to review the data types passed into the hardware function. Reviewing the function signature shows the key variable names to optimize: the input and output data streams img_data_lr and img_data_disp.

int stereo_remap_bm(
    yuv_t *img_data_lr,
    yuv_t *img_data_disp,
    int height,
    int dual_width,
    int stride);

Because the data is transferred sequentially, the first thing to ensure is that the access pattern is defined as SEQUENTIAL for both arguments. The next optimization is to ensure the data transfer is not interrupted by a scatter-gather DMA operation, by specifying the mem_attribute to be PHYSICAL_CONTIGUOUS|NON_CACHEABLE. This also requires that the memory be allocated with sds_alloc from sds_lib.

#include "sds_lib.h"

int main()
{
    unsigned char *inY = (unsigned char *)sds_alloc(HEIGHT*DUALWIDTH);
    unsigned short *inCY = (unsigned short *)sds_alloc(HEIGHT*DUALWIDTH*2);
    unsigned short *outCY = (unsigned short *)sds_alloc(HEIGHT*DUALWIDTH*2);
    unsigned char *outY = (unsigned char *)sds_alloc(HEIGHT*DUALWIDTH);
}
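
The listing above does not check the pointers returned by sds_alloc. A minimal sketch of a checked allocation helper is shown here. The helper names alloc_frame16 and free_frame16 are hypothetical, and the host-only fallback assumes that the SDSoC compilers define the __SDSCC__ macro. When built with another compiler, the fallback maps to malloc, which gives no physical contiguity guarantee and is there only so the sketch compiles outside SDSoC.

```c
#include <stddef.h>
#include <stdlib.h>

#ifdef __SDSCC__
#include "sds_lib.h"   /* real sds_alloc/sds_free when built with sdscc */
#else
/* Host-only fallback (assumption for illustration): plain heap memory,
 * NOT physically contiguous. */
#define sds_alloc(size) malloc(size)
#define sds_free(ptr)   free(ptr)
#endif

/* Hypothetical helper: allocate one frame of 16-bit pixels.
 * Returns NULL on invalid dimensions or allocation failure. */
unsigned short *alloc_frame16(int height, int stride)
{
    if (height <= 0 || stride <= 0)
        return NULL;
    size_t bytes = (size_t)height * (size_t)stride * sizeof(unsigned short);
    return (unsigned short *)sds_alloc(bytes);
}

/* Hypothetical helper: release a buffer obtained from alloc_frame16. */
void free_frame16(unsigned short *buf)
{
    if (buf != NULL)
        sds_free(buf);
}
```

A caller would test the result before handing the buffer to the hardware function, for example by returning an error from main when alloc_frame16(HEIGHT, DUALWIDTH) yields NULL.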

Finally, the copy directive is used to ensure that the data is explicitly copied to the accelerator and is not accessed from shared memory.

#pragma SDS data access_pattern(img_data_lr:SEQUENTIAL)
#pragma SDS data mem_attribute(img_data_lr:PHYSICAL_CONTIGUOUS|NON_CACHEABLE)
#pragma SDS data copy(img_data_lr[0:stride*height])
#pragma SDS data access_pattern(img_data_disp:SEQUENTIAL)
#pragma SDS data mem_attribute(img_data_disp:PHYSICAL_CONTIGUOUS|NON_CACHEABLE)
#pragma SDS data copy(img_data_disp[0:stride*height])
int stereo_remap_bm(
    yuv_t *img_data_lr,
    yuv_t *img_data_disp,
    int height,
    int dual_width,
    int stride);
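
As a rough software model of what these directives describe, the sketch below computes the transfer length named by the copy range and walks both buffers strictly in increasing index order. It assumes a 16-bit element type, consistent with the unsigned short buffers passed from main. The function names transfer_elems, transfer_bytes, and passthrough_sequential are hypothetical, and the loop body is a placeholder rather than the stereo algorithm.

```c
#include <stddef.h>

/* Number of elements named by a copy range of the form [0:stride*height].
 * The length in the copy directive counts elements, not bytes. */
size_t transfer_elems(int height, int stride)
{
    return (size_t)height * (size_t)stride;
}

/* Matching byte count, assuming 16-bit elements (an assumption here). */
size_t transfer_bytes(int height, int stride)
{
    return transfer_elems(height, stride) * sizeof(unsigned short);
}

/* Placeholder body with a sequential access pattern: every input element
 * is read exactly once and every output element is written exactly once,
 * both in increasing index order. */
void passthrough_sequential(const unsigned short *in, unsigned short *out,
                            int height, int stride)
{
    size_t n = transfer_elems(height, stride);
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i];
}
```

With stride equal to DUALWIDTH and height equal to HEIGHT, transfer_bytes gives HEIGHT*DUALWIDTH*2, which matches the size passed to sds_alloc for inCY and outCY in main.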

With these optimization directives, the memory access between the PS and PL is optimized for the most efficient transfers.