
How to cripple a PowerPC



From a German computer magazine
(http://www.heise.de/ct/Artikel/96/11/200eng.htm):

=====
A further classic amongst scientific benchmarks is the fast Fourier
transformation (FFT). The so-called butterfly algorithm is always used
here. The benchmark program available for a complex FFT of 1024 values by
John Greene is highly optimised and works with a long
sequence of multiply/add operations in double accuracy. Processors with a
lot of floating point registers profit from it in particular - that of
course goes for the PowerPC with its 32 FP registers.

Plus there is the fact that Intel C compilers known to us have cunning
optimisation for register variables - but only for integers. For
floating point registers loading is generally back and forth - because of
its stack-like register it's markedly difficult for compilers to
optimise here. Actually, the PowerPCs completed the task set in five times
(PPC603e) or even twelve times (PPC604e) faster than the
Intel processors - the PowerTower 225 from Power Computing calculates the
1024 points FFT in a breath-taking 220 µs, whilst the
fastest PentiumPro needs 2.78 ms for it.

Although the FFT benchmark is extremely local, because it practically runs
completely in the processor-internal cache, the result for the
PowerStack from Motorola (with NT 4.0) of 0.95 ms looked 20% worse than for
the equivalent Performa 6400. With disassembling it
was shown that Microsoft's C++ compiler left 15 of the PPC 603's
floating point registers unused; the Metrowerks compiler for
MacOS, on the other hand, used all of them except two.
=====



Eric Bennett ( [email protected] ; http://www.pobox.com/~ericb )
Cross-platform internet file format utilities at www.pobox.com/~ericb/xplat.html

The PowerTower Pro 225 from Power Computing calculates the 1024 points FFT
in a breath-taking 220 µs, whilst the fastest PentiumPro needs 2.78 ms for
it. . . . The result for the PowerStack from Motorola (with NT 4.0) of 0.95
ms looked 20% worse than for the equivalent Performa 6400. With
disassembling it was shown that Microsoft's C++ compiler left 15 of the
PPC 603's floating point registers unused; the Metrowerks compiler for
MacOS, on the other hand, used all of them except two.
-Magazin für Computertechnik


-- 
This message comes to you as a service of the mph-humor list.  No
claims of real or perceived humor are offered.
Submissions:  [email protected] 
Information:  http://mph124.rh.psu.edu/~hunt/mph-humor.html