By default the input and output type are the same, but you can optionally specify a second type for a different output type.
The following snippet shows the sample implementation of a radix-2 FFT stage for any vectorization value.
template<
unsignedVectorization>
voidradix2_dit(
constcint32 * __restrict x,
constcint16 * __restrict tw,
unsignedn,
unsignedshift_tw,
unsigned
shift,
bool
inv,
cint32 * __restrict y)
{
FFT fft;
autoit_stage = fft.begin_stage(x, tw);
autoit_out0 = aie::begin_vector(y);
autoit_out1 = aie::begin_vector(y + n / 2);
for(
intj = 0; j < n / (2 * FFT::out_vector_size); ++j)
chess_prepare_for_pipelining
chess_loop_range(1,)
{
const
autoout = fft.dit(*it_stage++, shift_tw,
shift,
inv);
*it_out0++ = out[0];
*it_out1++ = out[1];
}
}
auto inv(E a)
Definition:aie.hpp:6673
shift_bits< T, type_bits_v< T >, Elems > shift
Definition:shift.hpp:119
Definition:aie_declaration.hpp:85
Supported FFT/IFFT modes
Data type |
Radix |
c16b |
2 |
c32b |
2 |
c32b |
4 |
cfloat |
2 |
For a given radix R with n number of points, there will be R number of outputs spaced by n/R. The above example shows 2 outputs for a radix 2, for radix 4 we would have:
template<
unsignedVectorization>
voidradix4_dit(
constcint32 * __restrict x,
constcint16 * __restrict tw1,
constcint16 * __restrict tw2,
unsignedn,
unsignedshift_tw,
unsigned
shift,
bool
inv,
cint32 * __restrict y)
{
FFT fft;
autoit_stage = fft.begin_stage(x, tw1, tw2);
autoit_out0 = aie::begin_restrict_vector(y);
autoit_out1 = aie::begin_restrict_vector(y + n / 4);
autoit_out2 = aie::begin_restrict_vector(y + 2*n / 4);
autoit_out3 = aie::begin_restrict_vector(y + 3*n / 4);
chess_report(FFT::out_vector_size);
for(
intj = 0; j < n / (4 * FFT::out_vector_size); ++j)
chess_prepare_for_pipelining
chess_loop_range(1,)
{
const
autoout = fft.dit(*it_stage++, shift_tw,
shift,
inv);
*it_out0++ = out[0];
*it_out1++ = out[1];
*it_out2++ = out[2];
*it_out3++ = out[3];
}
}
However you can also use less with different incrementation schemes to achieve the same result. This approach is currently recommended for optimal performance on c32b radix 4 for Vectorization > 1 due to internal pointer requirements:
template<
unsignedVectorization>
voidradix4_dit(
constcint32 * __restrict x,
constcint16 * __restrict tw1,
constcint16 * __restrict tw2,
unsignedn,
unsignedshift_tw,
unsigned
shift,
bool
inv,
cint32 * __restrict y)
{
FFT fft;
autoit_stage = fft.begin_stage(x, tw1, tw2);
autoit_out0 = aie::begin_restrict_vector(y);
autoit_out1 = aie::begin_restrict_vector(y + n / 2);
for(
intj = 0; j < n / (4 * FFT::out_vector_size); ++j)
chess_prepare_for_pipelining
chess_loop_range(1,)
{
const
autoout = fft.dit(*it_stage++, shift_tw,
shift,
inv);
*it_out0 = out[0]; it_out0 += n/16;
*it_out0 = out[1]; it_out0 += -(int)(n/16)+1;
*it_out1 = out[2]; it_out1 += n/16;
*it_out1 = out[3]; it_out1 += -(int)(n/16)+1;
}
}
Going to/from cint16/cint32 is also supported. Here is an example of a radix 2 downscaling from cint32 to cint16:
template<
unsignedVectorization>
voidradix2_down_dit(
const
cint32_t* __restrict x,
{
FFT fft;
autoit_stage = fft.begin_stage(x, tw);
autoit_out0 = aie::begin_vector(y);
autoit_out1 = aie::begin_vector(y + n / 2);
for(
intj = 0; j < n / (2 * FFT::out_vector_size); ++j)
chess_prepare_for_pipelining
chess_loop_range(1,)
{
const
autoout = fft.dit(*it_stage++, shift_tw,
shift,
inv);
*it_out0++ = out[0];
*it_out1++ = out[1];
}
}
cint16 cint16_t
Definition:types.hpp:197
cint32 cint32_t
Definition:types.hpp:198