Appendix D - Comparison test

I ran some tests on a K6-2 (0.25 micron), an Athlon (0.18 micron) and a Pentium III (0.25 micron). The tests perform a geometry transformation; the results are below. The 3DNow! code was optimised for the K6-2 and is equivalent to the SSE code. In the column headings, 1k means that 1000 vectors were processed and 100k means 100,000 vectors; "-" means that prefetching wasn't used and "+" means that the optimal prefetching (found experimentally) was used. The code was run on Win9x. On WinNT the results are much better, but I was unable to test all variants there. The results are in cycles per vector and are approximate.


CPU	1k -	1k +	100k -	100k +
K6-2	35	35	72	74
Athlon	29	28	70	52
Pentium III	55	46	112	83

Athlon (WinNT)	19	20	55	33
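To make the effect of prefetching easier to compare across rows, the cycle counts can be turned into a percentage of cycles saved. The helper below is only an illustrative sketch; the function name is hypothetical and is not part of the original test code.

```c
/* Percentage of cycles saved by prefetching.
   Positive: prefetching helped. Negative: prefetching made the run slower.
   Hypothetical helper, not taken from the original benchmark. */
double prefetch_gain_percent(double cycles_without, double cycles_with)
{
    return (cycles_without - cycles_with) / cycles_without * 100.0;
}
```

Applied to the 100k columns, this gives about 26% for Athlon (70 to 52), about 26% for Pentium III (112 to 83), 40% for Athlon on WinNT (55 to 33) and about -3% for K6-2 (72 to 74). Since the measurements are approximate, small differences such as the K6-2 figure are probably within the noise.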

One thing we notice is that performance is significantly better on WinNT (it has a much better task management mechanism).

Another is that prefetching on the K6-2 didn't help at all (probably because of a poor motherboard). I tried many different prefetch distances, and nothing changed.

Interestingly, the Pentium III performs worse than the Athlon and even worse than the K6-2. What's more, I tried multiple prefetch distances on it (which isn't covered in the results), and the larger the distance was, the slower the code ran. This didn't happen on the AMD processors.
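For readers unfamiliar with the term, the prefetch distance is how far ahead of the vector currently being processed the prefetch is issued. The following is a minimal C sketch of that idea, not the code used for the measurements (which was written with 3DNow! and SSE instructions). It assumes a 16-byte vector of four floats, a row-major 4x4 matrix, and the GCC/Clang `__builtin_prefetch` intrinsic; the type and function names are hypothetical.

```c
#include <stddef.h>

/* Assumed layout: one vector = 4 floats = 16 bytes. */
typedef struct { float x, y, z, w; } vec4;

/* Transform n vectors by a row-major 4x4 matrix m (out[i] = m * in[i]).
   prefetch_dist is the number of vectors to prefetch ahead of the current
   one; 0 disables prefetching. */
void transform_vectors(const float m[16], const vec4 *in, vec4 *out,
                       size_t n, size_t prefetch_dist)
{
    (void)prefetch_dist; /* unused on compilers without the intrinsic */

    for (size_t i = 0; i < n; ++i) {
#if defined(__GNUC__) || defined(__clang__)
        /* Request the vector that will be needed prefetch_dist iterations
           from now: read access (0), low temporal locality (0) because
           each input vector is used only once. */
        if (prefetch_dist != 0 && i + prefetch_dist < n)
            __builtin_prefetch(&in[i + prefetch_dist], 0, 0);
#endif
        const vec4 v = in[i]; /* copy first, so in == out is also safe */

        out[i].x = m[0]  * v.x + m[1]  * v.y + m[2]  * v.z + m[3]  * v.w;
        out[i].y = m[4]  * v.x + m[5]  * v.y + m[6]  * v.z + m[7]  * v.w;
        out[i].z = m[8]  * v.x + m[9]  * v.y + m[10] * v.z + m[11] * v.w;
        out[i].w = m[12] * v.x + m[13] * v.y + m[14] * v.z + m[15] * v.w;
    }
}
```

A distance that is too small gives the memory system no time to fetch the data before it is needed, while a distance that is too large may evict data before it is used, so the best value generally has to be found by measurement on each machine.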

The influence of prefetching in the 1000-vector test on the Pentium III could result from its tiny 16KB L1 data cache: 1000 vectors occupy 16000 bytes, which is almost the entire cache. The K6-2 has 32KB of L1 data cache and the Athlon has 64KB, so on these processors prefetching makes no difference with such a small amount of data.
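The cache argument can be checked with simple arithmetic. The sketch below assumes 16 bytes per vector (which is what 16000 bytes for 1000 vectors implies) and counts only the input data; the function name is hypothetical.

```c
#include <stddef.h>

/* Bytes of L1 data cache left over once the input vectors are resident.
   A negative result means the input alone does not fit in the cache.
   Hypothetical helper for illustration only. */
long cache_headroom_bytes(size_t n_vectors, size_t bytes_per_vector,
                          size_t l1_data_bytes)
{
    return (long)l1_data_bytes - (long)(n_vectors * bytes_per_vector);
}
```

For 1000 vectors this leaves 384 bytes free in a 16KB cache (Pentium III), 16768 bytes in a 32KB cache (K6-2) and 49536 bytes in a 64KB cache (Athlon). On the Pentium III the input alone nearly fills the cache, so any other data the loop touches is likely to push it out. In the 100k test the input is 1,600,000 bytes, far larger than any of the three caches.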