The art of FFT benchmarking
Frigo and Johnson announced the ``Fastest Fourier Transform in the West''
in 1997, claiming that it was
``typically faster than all other publicly available DFT software''
and ``competitive with vendor-optimized FFTs,
as demonstrated by our extensive benchmarks.''
Similar claims remained on the FFTW web pages until at least October 1999.
Unfortunately,
the Frigo-Johnson benchmarks are highly deceptive:
- FFTW has never been competitive with the state of the art
on the Pentium or Pentium MMX.
Frigo and Johnson did not include any Pentium-optimized libraries
in their original benchmarks;
they removed the Pentium and Pentium MMX from their benchmark pages in 1998.
- djbfft (starting with version 0.60 in 1997)
is faster than FFTW on the Pentium Pro, Pentium II, and Pentium III.
However,
when Frigo and Johnson added djbfft to their benchmarks in 1998,
they slowed it down by changing the compiler options
despite my explicit instructions.
I have asked them repeatedly to remove djbfft from their benchmarks
if they aren't willing to follow the installation instructions;
they have ignored my requests.
- djbfft is a work in progress;
version 0.70 was released in 1998,
and version 0.75 in 1999.
The current version is substantially faster than FFTW on the UltraSPARC.
(In fact, as far as I know,
the current version is competitive with FFTW on every common CPU.)
But Frigo and Johnson report results for djbfft 0.60.
I have asked them repeatedly to remove djbfft from their benchmarks
if they aren't willing to keep up with the latest version;
they have ignored my requests.
How many other high-performance libraries
have been excluded from the FFTW benchmarks,
or slowed down by the FFTW authors?
The FFTW benchmark methodology has several other problems:
- For complex FFTs,
the authors benchmark either the forward transform or the inverse transform,
and assume that the other transform runs at the same speed.
This is not always true for in-order transforms (including FFTW),
and it is rarely true for out-of-order transforms;
a timing sketch covering both directions appears after this list.
- For real FFTs,
the authors benchmark the forward transform plus the inverse transform.
However, in many applications,
the forward transform is invoked more often than the inverse transform.
- The authors combine multiple libraries on the same CPU into one graph.
However, they don't combine multiple compiler options into one graph.
This is inconsistent.
- The authors distinguish between in-order and out-of-order transforms,
but they don't distinguish between scaled and unscaled transforms.
This is inconsistent;
the scaling pass at issue is sketched after this list.
(The cynical explanation
is that FFTW produces results in order but doesn't scale.)
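Measuring both directions is straightforward.
The following sketch, in C, times the forward and inverse transforms
separately and reports each one; the same discipline applies to real
transforms, where the two directions should be reported separately
rather than only as a sum. Here fft_forward and fft_inverse are
hypothetical stand-ins for the library under test, the size and
repetition counts are merely illustrative, and the program must be
linked against the library being measured.

    /* Minimal timing sketch: measure the forward and inverse
       transforms separately instead of assuming that one predicts
       the other. fft_forward() and fft_inverse() are hypothetical
       stand-ins; link this against the library under test. A real
       benchmark would also guard against overflow in the data and
       check the results. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    extern void fft_forward(double *re, double *im, unsigned long n);
    extern void fft_inverse(double *re, double *im, unsigned long n);

    static double seconds(void)
    {
      return (double) clock() / CLOCKS_PER_SEC;
    }

    int main(void)
    {
      unsigned long n = 1024;
      unsigned long reps = 100000;
      unsigned long i;
      double *re = malloc(n * sizeof(double));
      double *im = malloc(n * sizeof(double));
      double t0, t1, t2;

      if (!re || !im) return 1;
      for (i = 0; i < n; ++i) { re[i] = 1.0; im[i] = 0.0; }

      t0 = seconds();
      for (i = 0; i < reps; ++i) fft_forward(re, im, n);
      t1 = seconds();
      for (i = 0; i < reps; ++i) fft_inverse(re, im, n);
      t2 = seconds();

      /* Report the two directions separately. */
      printf("forward: %f microseconds\n", 1e6 * (t1 - t0) / reps);
      printf("inverse: %f microseconds\n", 1e6 * (t2 - t1) / reps);
      return 0;
    }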
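As for scaling: an unscaled forward-then-inverse round trip returns
the original data multiplied by n. The pass below, a minimal sketch,
is exactly the work that a scaled transform performs and an unscaled
transform skips; a consistent benchmark has to account for it.

    /* After an unscaled round trip the data comes back multiplied
       by n. This pass restores it; a scaled transform does this
       work, an unscaled transform does not. */
    static void scale(double *re, double *im, unsigned long n)
    {
      double f = 1.0 / n;
      unsigned long i;
      for (i = 0; i < n; ++i) { re[i] *= f; im[i] *= f; }
    }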
Format of results
Benchmarks should make it easy to answer three questions:
- ``How fast is this library for this computation?''
Answered by a table showing each library's time for each computation.
- ``How fast is the fastest library for this computation?''
Answered by a table showing the smallest time for each computation.
- ``What are the fastest libraries for this computation?''
Answered by a table showing each library's time divided by the best time.
A sketch computing all three tables from one matrix of times follows.
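All three tables can be derived from one matrix of measured times.
Here is a sketch in C; the library and computation counts, and the
numbers themselves, are placeholders rather than measurements.

    /* Sketch: derive all three tables from one matrix of measured
       times. times[i][j] is library i's time for computation j. */
    #include <stdio.h>

    #define LIBS 3
    #define SIZES 4

    static void report(double times[LIBS][SIZES])
    {
      int i, j;

      for (j = 0; j < SIZES; ++j) {
        double best = times[0][j];
        for (i = 1; i < LIBS; ++i)
          if (times[i][j] < best) best = times[i][j];

        printf("computation %d: best time %f\n", j, best);
        for (i = 0; i < LIBS; ++i)
          printf("  library %d: time %f, ratio %f\n",
                 i, times[i][j], times[i][j] / best);
      }
    }

    int main(void)
    {
      double times[LIBS][SIZES] = {   /* placeholder numbers */
        { 1.00, 2.30, 5.10, 11.4 },
        { 1.10, 2.20, 5.60, 12.1 },
        { 0.95, 2.50, 5.00, 13.0 }
      };

      report(times);
      return 0;
    }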
The FFTW benchmark results are presented as graphs that
are much less useful than the above tables:
- The results are expressed as inverse time, rather than time.
Inverse time is unnecessarily difficult to use.
The time for a convolution, for example,
is a straightforward sum of transform times and multiplication times;
the inverse time, in contrast,
is the inverse of the sum of the inverses of the inverse times,
as the first sketch after this list illustrates.
(The FFTW authors claim, incorrectly, that the fast FFTs are more ``huddled''
in a time graph than in an inverse-time graph.)
- The results are scaled by 5n lg n for a size-n transform.
This, too, makes the numbers more difficult to use;
the second sketch after this list undoes the scaling.
Referring to the result as ``mflops'' is misleading:
a split-radix FFT uses only about 4n(lg n - 1.5) floating-point operations.
(The FFTW authors claim, incorrectly,
that scaling is essential to fit the results onto one graph.)
- The results for FFTW itself are represented as large, navy blue dots,
while the results for the next library
are represented as thin, light blue stars
that are almost invisible on a printout.
Is the goal here to provide information, or to advertise FFTW?
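To illustrate the inverse-time problem mentioned above:
a convolution of two size-n arrays via FFT costs two forward
transforms, one pointwise multiplication pass, and one inverse
transform. With times, the total is a plain sum; with inverse times,
the same computation takes three reciprocals. A sketch in C:

    /* With times, a convolution's total cost is a plain sum. */
    double convolution_time(double t_forward, double t_mult,
                            double t_inverse)
    {
      return 2 * t_forward + t_mult + t_inverse;
    }

    /* With inverse times, the same total is the inverse of the
       sum of the inverses of the inverse times. */
    double convolution_inverse_time(double s_forward, double s_mult,
                                    double s_inverse)
    {
      return 1 / (2 / s_forward + 1 / s_mult + 1 / s_inverse);
    }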
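To illustrate the scaling problem: here is how a reader can convert
a ``mflops'' figure back into a time, and how far the 5n lg n
convention is from a split-radix operation count. This assumes the
usual convention that mflops means 5n lg n divided by the time in
microseconds; the mflops value below is a placeholder, not a
measurement.

    /* Convert an "mflops" figure back into a time, and compare
       the 5n lg n convention to a split-radix operation count. */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
      double n = 1024.0;
      double lgn = log(n) / log(2.0);              /* lg 1024 = 10 */
      double mflops = 500.0;                       /* placeholder */

      double time = 5.0 * n * lgn / mflops;        /* microseconds */
      double claimed = 5.0 * n * lgn;              /* 51200 */
      double splitradix = 4.0 * n * (lgn - 1.5);   /* 34816 */

      printf("time: %f microseconds\n", time);
      printf("claimed operations: %f\n", claimed);
      printf("split-radix operations: %f\n", splitradix);
      return 0;
    }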