Optical Flow Memory Access Optimization

As noted earlier in the methodology, the first task is to optimize the transfer of data. In this case, the system processes streaming video in which each sample is handled in consecutive order, so the memory transfer optimization is to ensure that the SDSoC environment understands that all accesses are sequential in nature.

This is performed by adding SDS pragmas before the function signatures, for all functions involved.

#pragma SDS data access_pattern(matB:SEQUENTIAL, pixStream:SEQUENTIAL)
#pragma SDS data mem_attribute(matB:PHYSICAL_CONTIGUOUS)
#pragma SDS data copy(matB[0:stride*height])
void readMatRows (yuv_t *matB, pix_t* pixStream,
                  int height, int width, int stride);

#pragma SDS data access_pattern(pixStream:SEQUENTIAL, dst:SEQUENTIAL)
#pragma SDS data mem_attribute(dst:PHYSICAL_CONTIGUOUS)
#pragma SDS data copy(dst[0:stride*height])
void writeMatRows (yuv_t* pixStream, yuv_t *dst,
                   int height, int width, int stride);

#pragma SDS data access_pattern(f0Stream:SEQUENTIAL, f1Stream:SEQUENTIAL)
#pragma SDS data access_pattern(ixix_out:SEQUENTIAL, ixiy_out:SEQUENTIAL, iyiy_out:SEQUENTIAL)
#pragma SDS data access_pattern(dix_out:SEQUENTIAL, diy_out:SEQUENTIAL)
void computeSum(pix_t* f0Stream, pix_t* f1Stream,
                int* ixix_out, int* ixiy_out, int* iyiy_out,
                int* dix_out, int* diy_out,
                int height, int width);

#pragma SDS data access_pattern(ixix:SEQUENTIAL, ixiy:SEQUENTIAL, iyiy:SEQUENTIAL)
#pragma SDS data access_pattern(dix:SEQUENTIAL, diy:SEQUENTIAL)
#pragma SDS data access_pattern(fx_out:SEQUENTIAL, fy_out:SEQUENTIAL)
void computeFlow(int* ixix, int* ixiy, int* iyiy,
                 int* dix, int* diy,
                 float* fx_out, float* fy_out,
                 int height, int width);

#pragma SDS data access_pattern(fx:SEQUENTIAL, fy:SEQUENTIAL, out_pix:SEQUENTIAL)
void getOutPix (float* fx, float* fy, yuv_t* out_pix,
                int height, int width, float clip_flowmag);

The readMatRows and writeMatRows functions have arguments that interface to the processor (matB and dst). For these arguments, the memory transfers are specified as sequential accesses from physically contiguous memory, and the data is copied to and from the hardware function rather than accessed in place by the accelerator. This ensures the data is copied efficiently:

  • Sequential: The data is transferred in the same sequential manner as it is processed. This type of transfer requires the least hardware overhead for high data processing rates and means an area-efficient datamover is used.
  • Contiguous: The data is accessed from contiguous memory. This ensures there is no scatter-gather overhead in the data transfer rate and an efficient, fast hardware datamover is used. This directive is supported by the associated sds_alloc library call in the main() function, which ensures data for these arguments is stored in contiguous memory.
  • Copy: The data is copied to and from the accelerator, negating the need for data accesses back to the CPU or DDR memory. Because pointers are used, the size of the data to be copied is specified.
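As a minimal sketch of how the contiguous and copy attributes relate to the host code, the fragment below allocates a frame buffer with sds_alloc and sizes it to the stride*height elements named in the copy pragma. The helper names frame_elems, alloc_frame, and free_frame are hypothetical, the yuv_t typedef is a stand-in for the design's own type, and the malloc fallback is an assumption added only so the fragment builds outside the SDSoC environment.

```c
#include <stddef.h>
#include <stdlib.h>

#ifdef __SDSCC__
#include "sds_lib.h"            /* sds_alloc and sds_free from the SDSoC runtime */
#else
/* Host-only fallback (assumption): lets the sketch build without SDSoC. */
#define sds_alloc(size) malloc(size)
#define sds_free(ptr)   free(ptr)
#endif

typedef unsigned short yuv_t;   /* assumption: stand-in for the design's pixel type */

/* Element count named by copy(matB[0:stride*height]): offset 0, length stride*height. */
size_t frame_elems(int height, int stride)
{
    return (size_t)stride * (size_t)height;
}

/* Allocate one frame in physically contiguous memory; returns NULL on failure. */
yuv_t *alloc_frame(int height, int stride)
{
    return (yuv_t *)sds_alloc(frame_elems(height, stride) * sizeof(yuv_t));
}

/* Release a frame obtained from alloc_frame. */
void free_frame(yuv_t *frame)
{
    sds_free(frame);
}
```

In main(), a buffer obtained this way would be passed as matB to readMatRows or as dst to writeMatRows, so that the PHYSICAL_CONTIGUOUS attribute in the pragma matches how the memory was actually allocated.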

For the remaining hardware functions, the data transfers are simply specified as sequential, allowing the most efficient hardware to be used to connect the functions in the PL fabric.
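To illustrate what a sequential access pattern means for the function body, here is a minimal sketch of a hypothetical stage, invertStream, which is not part of the optical flow design. It assumes pix_t is an 8-bit type and mirrors the pragma style used for the internal functions above. Each input sample is read exactly once and each output sample is written exactly once, in increasing index order, which is the behavior the SEQUENTIAL declaration promises.

```c
#include <stddef.h>

typedef unsigned char pix_t;    /* assumption: 8-bit pixel sample */

/* Hypothetical stage for illustration only: inverts each sample. */
#pragma SDS data access_pattern(in:SEQUENTIAL, out:SEQUENTIAL)
void invertStream(pix_t *in, pix_t *out, int height, int width)
{
    for (int i = 0; i < height * width; i++) {
        pix_t p = in[i];               /* sample i is read exactly once, in order */
        out[i] = (pix_t)(255 - p);     /* sample i is written exactly once, in order */
    }
}
```

A body that revisited in[i - 1] or wrote out[] in a different order would no longer match the declared pattern, so neighborhood operations would need to keep earlier samples in a local buffer instead.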