Appendix A - 3DNow! instructions summary

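Unless stated otherwise, the notation dest = op (dest, src) means that the operation is applied in parallel to both single-precision floats packed in an MMX register, where lo is bits 31..0 and hi is bits 63..32. On the right-hand side, dest and src always denote the operand values before the instruction executes. The instructions pi2fw, pf2iw, pfnacc, pfpnacc and pswapd were introduced with the Athlon; the others belong to the original 3DNow! set.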
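To make the notation concrete, the sketch below models an MMX register as a pair of floats in plain C and spells out a few of the instructions listed in this appendix. The types v2sf and v2si and the function names are hypothetical, chosen only for illustration; this is not an intrinsics API, and the model ignores the hardware's rounding details and special-case handling.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 64-bit MMX register holding two floats:
   lo = bits 31..0, hi = bits 63..32. */
typedef struct { float lo, hi; } v2sf;

/* Hypothetical model of the same register viewed as two 32-bit masks. */
typedef struct { uint32_t lo, hi; } v2si;

/* pfadd: dest = dest + src, applied to both halves independently. */
static v2sf pfadd(v2sf dest, v2sf src) {
    v2sf r = { dest.lo + src.lo, dest.hi + src.hi };
    return r;
}

/* pfsubr: reversed subtraction, dest = src - dest. */
static v2sf pfsubr(v2sf dest, v2sf src) {
    v2sf r = { src.lo - dest.lo, src.hi - dest.hi };
    return r;
}

/* pfcmpgt: each half becomes an all-ones mask where dest > src, else 0. */
static v2si pfcmpgt(v2sf dest, v2sf src) {
    v2si r = { dest.lo > src.lo ? 0xFFFFFFFFu : 0u,
               dest.hi > src.hi ? 0xFFFFFFFFu : 0u };
    return r;
}

/* pfacc: a horizontal operation, not of the dest = op(dest, src) form.
   Both sums read the operands as they were before the instruction,
   so writing the new dest.hi cannot affect the new dest.lo. */
static v2sf pfacc(v2sf dest, v2sf src) {
    v2sf r = { dest.lo + dest.hi, src.lo + src.hi };
    return r;
}
```

For example, with dest = {1, 2} and src = {10, 20}, pfacc yields lo = 3 and hi = 30, while pfsubr yields {9, 18}.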

SIMD-FP instructions
pi2fd dest = float(dword src)
pf2id dest = dword(float src), truncated toward zero
pi2fw dest[63..32] = float(word src[47..32])
      dest[31..0] = float(word src[15..0])
pf2iw dest[63..32] = word(float src[63..32]), sign-extended to 32 bits
      dest[31..0] = word(float src[31..0]), sign-extended to 32 bits
pfacc dest.lo = dest.lo + dest.hi, dest.hi = src.lo + src.hi
pfnacc dest.lo = dest.lo - dest.hi, dest.hi = src.lo - src.hi
pfpnacc dest.lo = dest.lo - dest.hi, dest.hi = src.lo + src.hi
pfadd dest = dest + src
pfsub dest = dest - src
pfsubr dest = src - dest
pfcmpeq dest = (dest == src) ? 0xFFFFFFFF : 0
pfcmpge dest = (dest >= src) ? 0xFFFFFFFF : 0
pfcmpgt dest = (dest > src) ? 0xFFFFFFFF : 0
pfmin dest = min (dest, src)
pfmax dest = max (dest, src)
pfmul dest = dest * src
pfrcp dest.hi = dest.lo = approx15(1/src.lo)
pfrsqrt dest.hi = dest.lo = approx15(1/sqrt(src.lo))
pfrcpit1 first Newton-Raphson refinement step for the pfrcp estimate
pfrcpit2 second refinement step, shared by the reciprocal and reciprocal square root sequences
pfrsqit1 first refinement step for the pfrsqrt estimate
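The iteration instructions refine the low-precision estimates by Newton-Raphson steps. The scalar sketch below shows the underlying mathematics, assuming an estimate x0 of 1/b (or of 1/sqrt(b)) is already available. The function names rcp_refine and rsqrt_refine are hypothetical, and the code does not reproduce how the hardware splits each step across pfrcpit1/pfrsqit1 and pfrcpit2, nor their exact rounding.

```c
#include <assert.h>

/* Hypothetical helper: one Newton-Raphson step for 1/b.
   Given an estimate x0 of 1/b, returns x0 * (2 - b * x0). */
static float rcp_refine(float b, float x0) {
    assert(b != 0.0f);
    return x0 * (2.0f - b * x0);
}

/* Hypothetical helper: one Newton-Raphson step for 1/sqrt(b).
   Given an estimate x0 of 1/sqrt(b), returns x0 * (3 - b * x0 * x0) / 2. */
static float rsqrt_refine(float b, float x0) {
    assert(b > 0.0f);
    return 0.5f * x0 * (3.0f - b * x0 * x0);
}
```

Each step roughly doubles the number of correct bits, which is why a single refinement takes an estimate of about 15 bits close to full single precision. For instance, starting from 1/4 perturbed by 0.1%, one call to rcp_refine(4.0f, x0) brings the error from about 2.5e-4 down to well under 1e-5.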
Extensions to MMX
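pavgusb dest = (dest + src + 1) >> 1, a rounded average of each of the eight unsigned bytes
pmulhrw like pmulhw, but rounded: dest = (dest * src + 0x8000) >> 16 on each signed word; intended for fixed-point math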
pswapd dest.hi = src.lo, dest.lo = src.hi
femms fast empty MMX state
prefetch prefetch data to L1 cache
prefetchw prefetch data to L1 cache in anticipation of a write; on the processors current at the time of writing, it behaves the same as prefetch
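The rounding behavior of pavgusb and pmulhrw is easy to miss in short descriptions. The per-lane scalar sketch below makes it explicit; the _lane function names are hypothetical, and the real instructions apply the operation to all eight bytes or all four words of the register at once.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-lane model of pavgusb: rounded average of two unsigned
   bytes, (a + b + 1) >> 1. The operands are promoted to int, so the sum
   (at most 511) cannot overflow. */
static uint8_t pavgusb_lane(uint8_t a, uint8_t b) {
    return (uint8_t)((a + b + 1) >> 1);
}

/* Hypothetical per-lane model of pmulhrw: signed 16x16 -> 32-bit product,
   plus 0x8000 for rounding, then the high 16 bits are kept.
   Note: >> on a negative int is implementation-defined in C; mainstream
   compilers use an arithmetic shift, which is what is intended here. */
static int16_t pmulhrw_lane(int16_t a, int16_t b) {
    int32_t p = (int32_t)a * b + 0x8000;
    return (int16_t)(p >> 16);
}

/* Per-lane model of plain pmulhw for comparison: same product, no rounding
   constant, so the high half is simply truncated toward minus infinity. */
static int16_t pmulhw_lane(int16_t a, int16_t b) {
    int32_t p = (int32_t)a * b;
    return (int16_t)(p >> 16);
}
```

Comparing the two multiplies shows the difference: 16384 * 3 = 0xC000, so pmulhw_lane returns 0 while pmulhrw_lane rounds up to 1; for -1 * 1, pmulhw_lane returns -1 and pmulhrw_lane returns 0.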