AI Engine API User Guide (AIE) 2021.2
Fast Fourier Transform (FFT)

Overview

The AIE API offers theaie::fft_ditclass template that provides the building blocks to implement all the stages of the FFT decimation-in-time computation. This class template is parametrized with the data type, the radix to be used and the vectorization of the FFT stage to be computed. The resulting class defines a stage iterator that contains the state of the computation (pointers to the input and twiddle arrays), and a function that performs the decimation in time.

By default the input and output type are the same, but you can optionally specify a second type for a different output type.

The following snippet shows the sample implementation of a radix-2 FFT stage for any vectorization value.

template< unsignedVectorization>
voidradix2_dit( constcint32 * __restrict x,
constcint16 * __restrict tw,
unsignedn, unsignedshift_tw, unsigned shift, bool inv,
cint32 * __restrict y)
{
usingFFT = aie::fft_dit; // radix = 2, input_type = cint32, output_type = cint32 (output defaults to input)
FFT fft;
autoit_stage = fft.begin_stage(x, tw);
autoit_out0 = aie::begin_vector(y);
autoit_out1 = aie::begin_vector(y + n / 2);
for( intj = 0; j < n / (2 * FFT::out_vector_size); ++j)
chess_prepare_for_pipelining
chess_loop_range(1,)
{
const autoout = fft.dit(*it_stage++, shift_tw, shift, inv);
*it_out0++ = out[0];
*it_out1++ = out[1];
}
}
auto inv(E a)
Definition:aie.hpp:6673
shift_bits< T, type_bits_v< T >, Elems > shift
Definition:shift.hpp:119
Definition:aie_declaration.hpp:85

Typedefs

template
using aie::fft_dit=detail::fft_dit< Vectorization, detail::fft_get_stage< Vectorization, Radix, T1, T2 >(), Radix, T1, T2 >
More...

Supported Fast Fourier Transform Modes

Supported FFT/IFFT modes
Data type Radix
c16b 2
c32b 2
c32b 4
cfloat 2
For a given radix R with n number of points, there will be R number of outputs spaced by n/R. The above example shows 2 outputs for a radix 2, for radix 4 we would have:
template< unsignedVectorization>
voidradix4_dit( constcint32 * __restrict x,
constcint16 * __restrict tw1,
constcint16 * __restrict tw2,
unsignedn, unsignedshift_tw, unsigned shift, bool inv,
cint32 * __restrict y)
{
usingFFT = aie::fft_dit;
FFT fft;
autoit_stage = fft.begin_stage(x, tw1, tw2);
autoit_out0 = aie::begin_restrict_vector(y);
autoit_out1 = aie::begin_restrict_vector(y + n / 4);
autoit_out2 = aie::begin_restrict_vector(y + 2*n / 4);
autoit_out3 = aie::begin_restrict_vector(y + 3*n / 4);
chess_report(FFT::out_vector_size);
for( intj = 0; j < n / (4 * FFT::out_vector_size); ++j)
chess_prepare_for_pipelining
chess_loop_range(1,)
{
const autoout = fft.dit(*it_stage++, shift_tw, shift, inv);
*it_out0++ = out[0];
*it_out1++ = out[1];
*it_out2++ = out[2];
*it_out3++ = out[3];
}
}
However you can also use less with different incrementation schemes to achieve the same result. This approach is currently recommended for optimal performance on c32b radix 4 for Vectorization > 1 due to internal pointer requirements:
template< unsignedVectorization>
voidradix4_dit( constcint32 * __restrict x,
constcint16 * __restrict tw1,
constcint16 * __restrict tw2,
unsignedn, unsignedshift_tw, unsigned shift, bool inv,
cint32 * __restrict y)
{
usingFFT = aie::fft_dit;
FFT fft;
autoit_stage = fft.begin_stage(x, tw1, tw2);
autoit_out0 = aie::begin_restrict_vector(y);
autoit_out1 = aie::begin_restrict_vector(y + n / 2);
for( intj = 0; j < n / (4 * FFT::out_vector_size); ++j)
chess_prepare_for_pipelining
chess_loop_range(1,)
{
const autoout = fft.dit(*it_stage++, shift_tw, shift, inv);
*it_out0 = out[0]; it_out0 += n/16;
*it_out0 = out[1]; it_out0 += -(int)(n/16)+1;
*it_out1 = out[2]; it_out1 += n/16;
*it_out1 = out[3]; it_out1 += -(int)(n/16)+1;
}
}
Going to/from cint16/cint32 is also supported. Here is an example of a radix 2 downscaling from cint32 to cint16:
template< unsignedVectorization>
voidradix2_down_dit( const cint32_t* __restrict x,
const cint16_t* __restrict tw,
intn, intshift_tw, int shift, bool inv,
cint16_t* __restrict y)
{
usingFFT = aie::fft_dit;
FFT fft;
autoit_stage = fft.begin_stage(x, tw);
autoit_out0 = aie::begin_vector(y);
autoit_out1 = aie::begin_vector(y + n / 2);
for( intj = 0; j < n / (2 * FFT::out_vector_size); ++j)
chess_prepare_for_pipelining
chess_loop_range(1,)
{
const autoout = fft.dit(*it_stage++, shift_tw, shift, inv);
*it_out0++ = out[0];
*it_out1++ = out[1];
}
}
cint16 cint16_t
Definition:types.hpp:197
cint32 cint32_t
Definition:types.hpp:198

Typedef Documentation

fft_dit

template
usingaie::fft_dit= typedefdetail::fft_dit(), Radix, T1, T2>

Type that encapsulates the functionality for decimation-in-time FFTs.

Template Parameters
Vectorization Vectorization of the FFT stage
Radix Number which selects the FFT radix.
T1 Type of the input elements.
T2 Type of the output elements, defaults to input type.