#include <fftr4.h>
real4 a[256];
fftr4_256(a);
fftr4_scale256(a);
fftr4_un256(a);
fftr4_256
computes a 256-point real-to-complex discrete Fourier transform.
It evaluates the polynomial
a[0] + a[2] x + a[4] x^2 + ... + a[254] x^127
+ a[1] x^128 + a[3] x^129 + ... + a[255] x^255
at all the 256th roots of 1 modulo conjugation,
and puts the values into a,
overwriting the input.
(Beware that the results are stored in an
unusual order;
also note the input interleaving.)
Each a[n] is a 4-byte real number.
To compute the inverse transform, reconstructing a polynomial from its values, call fftr4_scale256 and then fftr4_un256. (fftr4_scale256 multiplies a[0] and a[1] by 1/256, and each remaining a[n] by 1/128.)
Note that the position of a in memory can affect performance.
#include <fftr4.h>
real4 a[256];
real4 b[256];
fftr4_mul256(a,b);
fftr4_mul256 multiplies each
a[n] by
b[n]
and puts the result into
a[n].
The sequence of operations
fftr4_256(a);
fftr4_256(b);
fftr4_mul256(a,b);
fftr4_scale256(a);
fftr4_un256(a);
convolves
a with
b:
it multiplies the polynomial
a[0] + a[2] x + a[4] x^2 + ... + a[254] x^127
+ a[1] x^128 + a[3] x^129 + ... + a[255] x^255
by
b[0] + b[2] x + b[4] x^2 + ... + b[254] x^127
+ b[1] x^128 + b[3] x^129 + ... + b[255] x^255
modulo x^256-1
and puts the result back into a.
The sequence of operations
fftr4_256(b);
fftr4_scale256(b);
fftr4_256(a);
fftr4_mul256(a,b);
fftr4_un256(a);
has the same effect.
If you have many polynomials to multiply by the same b,
you can save time by reusing the transformed (and scaled) b.
#include <fftr8.h>
real8 a[256];
real8 b[256];
fftr8_256(a);
fftr8_scale256(a);
fftr8_un256(a);
fftr8_mul256(a,b);
The fftr8 functions are just like the fftr4 routines
except that they work with 8-byte floating-point numbers
instead of 4-byte floating-point numbers.
WARNING: Some compilers, notably gcc without the -malign-double option, do not guarantee 8-byte alignment for 8-byte floating-point variables. The Pentium, Pentium II, et al. will slow down dramatically if your arrays are not aligned to 8-byte boundaries.
#include <fftfreq.h>
unsigned int n;
unsigned int f;
f = fftfreq_r(n,256);
Here is what
fftr4_256 and fftr8_256
put into a,
in terms of the input polynomial p: