Note that the UNROLL option makes the 'inner' DES loop unroll all 16 rounds
instead of the default of 4 rounds per loop iteration.
RISC1 and RISC2 are two alternative forms of the inner loop, and
PTR means to use pointer arithmetic instead of array indexing.

OS - CPU - compiler - options					des/s   kbytes/s
FreeBSD - Pentium Pro 200mhz - gcc 2.7.2.2 - assembler		577,000 4620k/s
IRIX 6.2 - R10000 195mhz - cc (-O3 -n32) - UNROLL RISC2 PTR	496,000 3968k/s
solaris 2.5.1 - usparc 167mhz?? - SC4.0 - UNROLL RISC1 PTR [1]	459,400 3672k/s
FreeBSD - Pentium Pro 200mhz - gcc 2.7.2.2 - UNROLL RISC1	433,000 3468k/s
solaris 2.5.1 - usparc 167mhz?? - gcc 2.7.2 - UNROLL		380,000 3041k/s
linux - pentium 100mhz - gcc 2.7.0 - assembler			281,000 2250k/s
NT 4.0 - pentium 100mhz - VC 4.2 - assembler			281,000 2250k/s
AIX 4.1? - PPC604 100mhz - cc - UNROLL				275,000 2200k/s
IRIX 5.3 - R4400 200mhz - gcc 2.6.3 - UNROLL RISC2 PTR		235,300 1882k/s
IRIX 5.3 - R4400 200mhz - cc - UNROLL RISC2 PTR			233,700 1869k/s
NT 4.0 - pentium 100mhz - VC 4.2 - UNROLL RISC1 PTR		191,000 1528k/s
DEC Alpha 165mhz?? - cc - RISC2 PTR [2]				181,000 1448k/s
linux - pentium 100mhz - gcc 2.7.0 - UNROLL RISC1 PTR		158,500 1268k/s
HPUX 10 - 9000/887 - cc - UNROLL [3]				148,000 1190k/s
solaris 2.5.1 - sparc 10 50mhz - gcc 2.7.2 - UNROLL		123,600  989k/s
IRIX 5.3 - R4000 100mhz - cc - UNROLL RISC2 PTR			101,000  808k/s
DGUX - 88100 50mhz(?) - gcc 2.6.3 - UNROLL			 81,000  648k/s
HPUX 10 - 9000/887 - k&r cc (default compiler) - UNROLL PTR	 76,000  608k/s
solaris 2.4 - 486 50mhz - gcc 2.6.3 - assembler			 65,000  522k/s
solaris 2.4 - 486 50mhz - gcc 2.6.3 - UNROLL RISC2		 43,500  344k/s
AIX - old slow one :-) - cc -					 39,000  312k/s

Notes.
[1] For the UltraSPARC, built with SunC 4.0 using
    cc -xtarget=ultra -xarch=v8plus -Xa -xO5, running 'des_opts'
    gives 344,000 des/s while 'speed' gives 459,000 des/s.
    I have recorded the higher figure since it comes from the library,
    but the discrepancy is rather odd.
[2] As with the UltraSPARC ([1]), the two programs disagree: 181,000 des/s
    from 'des_opts' vs 175,000 des/s from 'speed'.
[3] I was unable to get access to this machine when it was not heavily loaded.
    As a result, my timing program never got more than 30% of the CPU.
    This would make the reported speeds much lower, because the program
    would be 'fighting' the other CPU-bound processes to stay in the cache.