idea53 does not work on Linux x86 systems under gcc. Floating-point operations on the x86 architecture are affected by a hidden global variable called "precision control," which normally specifies either 53-bit rounding or 64-bit rounding. Linux sets 64-bit rounding by default; gcc does not set 53-bit rounding before double floating-point operations. gcc does, however, use 53-bit rounding for double spills to memory. The result is a mishmash of 53-bit rounding and 64-bit rounding, depending on which numbers are stored in registers, which in turn depends on the compiler optimization level.
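The rounding behaviour described above can be observed with a small test. This is only a sketch and not part of the library: the names sums_round_to_53_bits and report_rounding are hypothetical, and the volatile qualifiers are there only to keep the compiler from folding the constants at compile time. The test adds 2^-53 to 1.0 twice. Under 53-bit rounding each addition rounds back to 1.0; if the intermediate sum stays in a register with 64-bit rounding, the total is 1 + 2^-52.

```c
#include <float.h>
#include <stdio.h>

/* Hypothetical helper, not part of the library.
   Returns 1 if (1 + 2^-53) + 2^-53 is computed with 53-bit rounding,
   0 if the intermediate sum kept extra precision. */
static int sums_round_to_53_bits(void)
{
    /* volatile forces run-time loads, so nothing is folded at compile time */
    volatile double one = 1.0;
    volatile double tiny = 0x1p-53;   /* 2^-53, half an ulp of 1.0 */

    /* 53-bit rounding: 1 + 2^-53 is a tie and rounds to even, giving 1.0,
       and the second addition does the same.
       64-bit rounding: 1 + 2^-53 is exact, and the total is 1 + 2^-52,
       which survives a later rounding to double. */
    double sum = (one + tiny) + tiny;
    return sum == 1.0;
}

/* Prints which behaviour this compiler and platform produce. */
static void report_rounding(void)
{
    printf("FLT_EVAL_METHOD = %d\n", (int)FLT_EVAL_METHOD);
    printf("double sums use %s rounding\n",
           sums_round_to_53_bits() ? "53-bit" : "wider than 53-bit");
}
```

Calling report_rounding() from main at different optimization levels is one way to see whether a given gcc setup is affected; the answer may differ between builds of the same source.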
When should I use it? There is no reason to use idea53. It is a starting point for people who want to understand how sparc and powerpc work.
When should I use it? sparc is designed for the SPARC processor family, particularly the UltraSPARC. It works reasonably well on any RISC chip with 32 floating-point registers.
What do I need for top performance? gcc -O1 -mcpu=ultrasparc. Higher optimization levels (for example, gcc -O3 -mcpu=ultrasparc -fschedule-insns -fschedule-insns2) produce slower code. Sun's compiler (for example, cc -fast -dalign -xO3 -xarch=v8plus -Dinline=) also produces slower code.
When should I use it? powerpc is designed for POWER/PowerPC processors such as the 7410, 7450, and RS64-III. Its instruction scheduling targets a mix of the 7410 and 7450.
What do I need for top performance? gcc -O2 -mcpu=powerpc, plus an assembler and a linker that both understand 8-byte alignment.
You don't need to use -O1. powerpc uses some gcc-style PowerPC asm for floating-point operations, with heavy enough use of volatile that gcc's optimizer is unable to screw up the instruction scheduling.
idea64 relies on IEEE extended-precision floating-point arithmetic (64-bit mantissa), which is not available on most non-x86 processors. It also uses some gcc-style x86 asm to set the x86 precision control.
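For readers unfamiliar with precision control, here is a minimal sketch of how gcc-style asm can read and set it. This is an illustration, not the code idea64 actually uses: the helper names x87_get_cw and x87_set_precision and the HAVE_X87 macro are hypothetical. The precision-control field occupies bits 8 and 9 of the x87 control word, which the fnstcw and fldcw instructions store and load. On non-x86 targets the helpers compile to no-ops.

```c
/* x87 control-word layout: bits 8-9 hold the precision-control field. */
#define X87_PC_MASK 0x0300u
#define X87_PC_53   0x0200u   /* round to a 53-bit mantissa (double) */
#define X87_PC_64   0x0300u   /* round to a 64-bit mantissa (extended) */

/* Hypothetical feature macro: 1 when gcc-style x87 asm is usable. */
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
#define HAVE_X87 1
#else
#define HAVE_X87 0
#endif

/* Hypothetical helper: reads the x87 control word (0 where there is no x87). */
static unsigned int x87_get_cw(void)
{
#if HAVE_X87
    unsigned short cw;
    __asm__ __volatile__("fnstcw %0" : "=m"(cw));
    return cw;
#else
    return 0;
#endif
}

/* Hypothetical helper: sets the precision-control field to pc and leaves
   the other control-word bits unchanged. Does nothing without an x87. */
static void x87_set_precision(unsigned int pc)
{
#if HAVE_X87
    unsigned short cw =
        (unsigned short)((x87_get_cw() & ~X87_PC_MASK) | (pc & X87_PC_MASK));
    __asm__ __volatile__("fldcw %0" : : "m"(cw));
#else
    (void)pc;
#endif
}
```

Code that depends on 64-bit rounding would call x87_set_precision(X87_PC_64) once before its floating-point work. The setting is global to the thread, so it also affects any other x87 arithmetic that runs afterwards.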
When should I use it? There is no reason to use idea64. It is a starting point for people who want to understand how pentium and ppro work.
idea64 is much slower than pentium and ppro on every x86 chip I've tried, with every combination of gcc options I've tried. Feel free to point this out to anyone who claims that manual optimization is useless. I'm interested in hearing results from other compilers.
pentium labels register floating-point variables as double rather than long double, to work around some gcc instruction selection flaws, but this will fail if the double variables are ever spilled to memory. In particular, it fails with gcc -O3. (The instruction selection flaws were fixed in gcc 3, but most people don't have gcc 3 yet.)
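The spill problem can be illustrated in isolation. The sketch below is an assumption-laden stand-in, not code from the library: spill_changes_value and long_double_has_64_bits are hypothetical names, and an explicit store to a volatile double plays the role of a compiler-generated spill. A value that needs more than 53 mantissa bits comes back changed after passing through a double-sized memory slot.

```c
#include <float.h>

/* Returns 1 if long double carries at least a 64-bit mantissa,
   as x87 extended precision does. */
static int long_double_has_64_bits(void)
{
    return LDBL_MANT_DIG >= 64;
}

/* Hypothetical helper: returns 1 if writing x to a double-sized memory
   slot and reading it back changes the value. The volatile store stands
   in for a spill that the compiler would generate on its own. */
static int spill_changes_value(long double x)
{
    volatile double slot = (double)x;   /* rounds to 53 bits */
    return (long double)slot != x;
}
```

For example, on a platform with x87 extended precision, spill_changes_value(1.0L + 0x1p-60L) reports a change, while spill_changes_value(1.0L) does not, because 1.0 fits in 53 bits.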
When should I use it? pentium is designed for the original Pentium and the Pentium MMX. It is also the best choice if you want a single library version that produces reasonable performance on all x86 processors: pentium is about 1.1x slower than ppro on the Pentium III, while ppro is about 1.4x slower than pentium on the Pentium.
What do I need for top performance? gcc -O1 -fomit-frame-pointer -malign-double, plus an assembler and a linker that both understand 8-byte alignment. You'll see roughly a 4x slowdown if the global variables in opt-pentium.c are not aligned properly; people see this all the time under OpenBSD, for example, because OpenBSD continues to ship with obsolete development tools.
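A quick way to find out whether a toolchain delivers 8-byte alignment is to check addresses at run time. The following is a sketch under stated assumptions: example_constants is a hypothetical table standing in for the library's global variables, and the gcc aligned attribute is used here to request the alignment explicitly, which may not be how opt-pentium.c declares its variables.

```c
#include <stdint.h>

/* Hypothetical stand-in for a library's global floating-point constants.
   __attribute__((aligned(8))) is a gcc extension that requests 8-byte
   alignment regardless of the target's default alignment for double. */
static const double example_constants[4] __attribute__((aligned(8))) = {
    1.0, 0.5, 0.25, 0.125
};

/* Returns 1 if p sits on an 8-byte boundary. */
static int is_8byte_aligned(const void *p)
{
    return ((uintptr_t)p & 7u) == 0;
}

/* Returns 1 if every element of the table is 8-byte aligned. */
static int constants_are_aligned(void)
{
    int i;
    for (i = 0; i < 4; ++i)
        if (!is_8byte_aligned(&example_constants[i]))
            return 0;
    return 1;
}
```

Running the same kind of check against the real global variables, in a binary built with the intended assembler and linker, shows whether the alignment request survived the whole toolchain.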
When should I use it? ppro is designed for the Pentium Pro, Pentium II, and Pentium III. ppro is also the best choice for the Athlon and the Pentium 4.
What do I need for top performance? gcc -O1 -fomit-frame-pointer -malign-double, plus an assembler and a linker that both understand 8-byte alignment.