Optimal Border Pixel Convolution

The final step in the algorithm is to replicate the edge pixels into the border region. To ensure a constant flow of data and to maximize data reuse, the algorithm uses local caching. The following figure shows how the border samples are aligned into the image.

Each sample is read from vconv, the output of the vertical convolution.
The sample is then cached as one of four possible pixel types.
The sample is then written to the output stream.

The code for determining the location of the border pixels is shown here.

// Border pixels
pvconv = vconv_buffer; // set/reset pointer to start of buffer
Border:for (int i = 0; i < height; i++) {
  for (int j = 0; j < width; j++) {
    T pix_in, l_edge_pix, r_edge_pix, pix_out;
#pragma HLS PIPELINE
    if (i == 0 || (i > border_width && i < height - border_width)) {
      // read a pixel out of the video stream and cache it for
      // immediate use and later replication purposes
      if (j < width - (K - 1)) {
        pix_in = *pvconv++;
        borderbuf[j] = pix_in;
      }
      if (j == 0) {
        l_edge_pix = pix_in;
      }
      if (j == width - K) {
        r_edge_pix = pix_in;
      }
    }
    // Select output value from the appropriate cache resource
    if (j <= border_width) {
      pix_out = l_edge_pix;
    } else if (j >= width - border_width - 1) {
      pix_out = r_edge_pix;
    } else {
      pix_out = borderbuf[j - border_width];
    }
    *dst++ = pix_out;
  }
}

A notable difference in this new code is the extensive use of conditionals inside the tasks. This allows the task, once it is pipelined, to process data continuously. The outcome of a conditional does not change how the pipeline executes. It changes only the output values, and the pipeline will keep processing as long as input samples are available.
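To check what the Border loop above is expected to produce, a plain software model can be handy. The following is a minimal sketch, not synthesizable HLS code. The function name replicate_border, the use of std::vector, and the in_width variable are illustrative assumptions. It also assumes that K is odd, that border_width equals K / 2, and that the vertical convolution output holds (height - (K - 1)) rows of (width - (K - 1)) samples. The edge values are declared outside the loops here so that they keep their values between iterations in ordinary C++.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical software model of the border stage (illustration only).
// src: row-major samples, (height - (K - 1)) rows of (width - (K - 1)) each.
// Returns a height x width image with edge samples replicated into the border.
template <typename T>
std::vector<T> replicate_border(const std::vector<T>& src,
                                int width, int height, int K) {
    const int border_width = K / 2;        // assumption: K is odd
    const int in_width = width - (K - 1);  // samples per input row
    assert(in_width > 0 && height - (K - 1) > 0);
    assert(src.size() == static_cast<std::size_t>(in_width) *
                         static_cast<std::size_t>(height - (K - 1)));

    std::vector<T> dst;
    dst.reserve(static_cast<std::size_t>(width) * static_cast<std::size_t>(height));

    std::vector<T> borderbuf(static_cast<std::size_t>(in_width)); // last row read
    T l_edge_pix = T();   // first sample of the last row read
    T r_edge_pix = T();   // last sample of the last row read
    std::size_t rd = 0;   // read position in src (plays the role of pvconv)

    for (int i = 0; i < height; i++) {
        for (int j = 0; j < width; j++) {
            // A new input row is consumed on the first output row and on
            // every interior row; border rows reuse the cached row.
            if (i == 0 || (i > border_width && i < height - border_width)) {
                if (j < in_width) {
                    T pix_in = src[rd++];
                    borderbuf[static_cast<std::size_t>(j)] = pix_in;
                    if (j == 0) { l_edge_pix = pix_in; }
                    if (j == in_width - 1) { r_edge_pix = pix_in; }
                }
            }
            // Select the output value from the appropriate cache.
            T pix_out;
            if (j <= border_width) {
                pix_out = l_edge_pix;
            } else if (j >= width - border_width - 1) {
                pix_out = r_edge_pix;
            } else {
                pix_out = borderbuf[static_cast<std::size_t>(j - border_width)];
            }
            dst.push_back(pix_out);
        }
    }
    return dst;
}
```

With width = 6, height = 5, and K = 3, a 3 x 4 input whose rows are 1 2 3 4, 5 6 7 8, and 9 10 11 12 yields a 5 x 6 output in which the first and last input rows each appear twice and every row begins and ends with a repeated edge sample. Both loops always run their full trip counts. The conditionals only decide whether a sample is read and which cached value is written, which mirrors the behavior described above.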