IA-64 Overview

Chris Dragan

All current PC processors execute the legacy x86 instruction set. That instruction set has reached a dead end: improving processors of this type has become really difficult for manufacturers.

Intel is now developing a new architecture, which it calls IA-64. At this stage no one knows whether processors based on this architecture will be available to ordinary users, but the architecture is revolutionary compared not only to x86 processors, but even to the most powerful RISC processors.

Processor resources

The IA-64 contains a large number of registers. There are:
- 128 64-bit general purpose registers (32 static and 96 stacked),
- 128 82-bit floating point registers (32 static and 96 rotating),
- 64 1-bit predicate registers,
- 8 64-bit branch registers,
and several specialized registers, including the instruction pointer, the current frame marker, application registers, performance monitoring data registers, the user mask and processor identifiers.

Stacked registers

The first 32 general purpose registers (r0..r31) can be used for typical computations. The 96 stacked registers (r32..r127) hold function parameters, return values and local variables. This approach replaces the conventional stack with more efficient register windows. The register windows are not fixed in size, as they are on RISC processors that have them (SPARC, for example), but have variable, function-specific widths.

A simple example of what happens to register contents when a function is called:


            Before Call                       After Call
       ┌───────┬────────────┐           ┌───────┬────────────┐
   r32 │ 01DFh │            │       r32 │ 8153h │            │
   r33 │ 5DDDh │ input      │   ┌─> r33 │ 5931h │ input      │
   r34 │ 1880h │            │   │   r34 │ FFEFh │            │
       ├───────┼────────────┤   │   r35 │ 7856h │            │
   r35 │ 4315h │ local      │   │       ├───────┼────────────┤
   r36 │ 5315h │            │   │   r36 │ ????  │ local      │
       ├───────┼────────────┤   │       ├───────┼────────────┤
   r37 │ 8153h │            │   │   r37 │ ????  │ output     │
   r38 │ 5931h │ output     ├───┘   r38 │ ????  │            │
   r39 │ FFEFh │            │           ├───────┴────────────┤
   r40 │ 7856h │            │           │other regs not avail│
       ├───────┴────────────┤           └────────────────────┘
       │other regs not avail│
       └────────────────────┘

The caller's input and local registers are renamed and become invisible to the callee. Instead, the callee receives its caller's output registers as its own input registers (the function arguments) and, after executing the 'alloc' instruction, obtains its own local and output registers. A circuit called the RSE (Register Stack Engine) saves the contents of the previous function's renamed registers to memory in the background.
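
The renaming shown in the diagram can be imitated with a small Python model. This is only a sketch: the class RegisterStack and its method names are invented for illustration, the frame bookkeeping is simplified, and the background spilling done by the RSE is not modelled at all. The register values are the ones from the diagram.

```python
class RegisterStack:
    """Toy model of the stacked registers (r32 and up); not the real RSE."""

    FIRST = 32  # number of the first stacked register

    def __init__(self):
        self.physical = []  # storage behind all frames; None = undefined
        self.base = 0       # physical slot that the current r32 maps to
        self.sol = 0        # size of the input + local area
        self.sof = 0        # size of the whole frame (input + local + output)
        self.saved = []     # caller frames, restored on return

    def alloc(self, inputs, local, outputs):
        """Resize the current frame, as the 'alloc' instruction does."""
        self.sol = inputs + local
        self.sof = self.sol + outputs
        missing = self.base + self.sof - len(self.physical)
        self.physical.extend([None] * max(0, missing))

    def _slot(self, reg):
        offset = reg - self.FIRST
        if not 0 <= offset < self.sof:
            raise IndexError(f"r{reg} is not available in this frame")
        return self.base + offset

    def read(self, reg):
        return self.physical[self._slot(reg)]

    def write(self, reg, value):
        self.physical[self._slot(reg)] = value

    def call(self):
        """Rename registers so that the caller's outputs start at r32."""
        self.saved.append((self.base, self.sol, self.sof))
        self.base += self.sol   # caller's inputs and locals drop out of view
        self.sof -= self.sol    # only the caller's outputs remain visible
        self.sol = 0

    def ret(self):
        """Restore the caller's frame."""
        self.base, self.sol, self.sof = self.saved.pop()


stack = RegisterStack()
stack.alloc(inputs=3, local=2, outputs=4)       # caller frame: r32..r40
values = [0x01DF, 0x5DDD, 0x1880, 0x4315, 0x5315,
          0x8153, 0x5931, 0xFFEF, 0x7856]
for number, value in enumerate(values, start=32):
    stack.write(number, value)

stack.call()
stack.alloc(inputs=4, local=1, outputs=2)       # callee frame: r32..r38
print(hex(stack.read(32)), stack.read(36))      # caller's r37, undefined local
stack.ret()
print(hex(stack.read(32)))                      # the caller sees its own r32 again
```

After the call, the callee's r32..r35 are the same physical registers as the caller's r37..r40, so no argument has to be copied.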

Speculation and predication

Speculation and predication are two other important features of IA-64; both enhance multi-path execution.

Predication is the conditional execution of instructions. As there is no flags register, the results of compare instructions are placed in 1-bit predicate registers. These registers can then be used to determine whether the following instructions are executed. An example:


        cmp.ge.unc  p3,p4=r5,r6 ;;      // p3=(r5>=r6), p4=!p3
   (p3) add         r2=r2,r1            // if(p3) r2 += r1
   (p4) add         r2=2,r1             // if(p4) r2 = r1 + 2
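
The same three instructions can be mimicked in Python to show that no branch is involved: every instruction is issued, and the predicate only decides whether its result is written. This is a rough sketch; the helper names compare_ge, predicated_add and example are made up for illustration and the register file is just a dictionary.

```python
def compare_ge(preds, p_true, p_false, a, b):
    """Write a predicate and its complement, like the compare above."""
    preds[p_true] = a >= b
    preds[p_false] = not preds[p_true]


def predicated_add(regs, preds, qp, dest, a, b):
    """The add is always issued; its result is kept only if predicate qp is set."""
    result = a + b
    if preds[qp]:
        regs[dest] = result


def example(r1, r2, r5, r6):
    regs = {"r1": r1, "r2": r2, "r5": r5, "r6": r6}
    preds = {"p0": True}  # p0 always reads as 1
    compare_ge(preds, "p3", "p4", regs["r5"], regs["r6"])
    predicated_add(regs, preds, "p3", "r2", regs["r2"], regs["r1"])
    predicated_add(regs, preds, "p4", "r2", 2, regs["r1"])
    return regs["r2"]


print(example(r1=10, r2=5, r5=7, r6=3))   # r5 >= r6, so r2 = r2 + r1
print(example(r1=10, r2=5, r5=3, r6=7))   # r5 <  r6, so r2 = r1 + 2
```

Because p3 and p4 are complementary, exactly one of the two adds takes effect, which is how an if/else is expressed without a jump.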

Speculation allows loads to be moved out of time-critical paths. It has a lot in common with the prefetching introduced in the PIII and K7: data can be loaded long before it is needed, so the load latency is already hidden by the time the data is used.
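
Why an early load helps can be seen from a toy timing model. The sketch below assumes one cycle per instruction, a single load, and a fixed latency; the function total_cycles and its parameters are hypothetical and say nothing about real IA-64 timings.

```python
def total_cycles(load_latency, work_before_use, hoisted):
    """Cycles until the instruction that uses the loaded value has executed.

    work_before_use is the number of independent one-cycle instructions
    that precede the use of the loaded value.
    """
    if hoisted:
        # The load is started first and overlaps with the independent work.
        return max(load_latency, work_before_use) + 1
    # The load is started only after the work, so its latency is paid in full.
    return work_before_use + load_latency + 1


print(total_cycles(load_latency=10, work_before_use=8, hoisted=False))
print(total_cycles(load_latency=10, work_before_use=8, hoisted=True))
```

In this model the hoisted version never takes longer, and once there is at least as much independent work as load latency, the load costs nothing extra.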

Branches

All branches are predicted. The programmer has full control over branch prediction: branch instructions carry hints that select static or dynamic prediction and whether the branch is assumed to be taken. Indirect branches use branch registers. There are also counted loops with default "fall-through" or "branch" prediction. Function calls are also a type of branch, and return addresses reside in branch registers.

The following table illustrates the types of branches:

conditional         A conditional branch that depends on the content of a
                    selected predicate register. Predicate register p0 always
                    contains 1 and can be used for "unconditional" branches.
                    The target can be either IP-relative or indirect (held in
                    a branch register).
conditional call    Similar to a conditional branch, but also causes renaming
                    of the stacked registers.
conditional return  Also conditional. The return target is taken from a
                    branch register.
ia                  A special, unconditional branch to the legacy IA-32
                    instruction set.
cloop               Counted loop. Similar to the "loop" instruction on x86,
                    but the loop counter resides in a special Loop Count (LC)
                    application register.
ctop, cexit         Counted loops with an additional epilog count.
wtop, wexit         Similar to ctop and cexit, but conditional.
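
The behaviour of the counted loop can be sketched in Python. The sketch assumes the cloop semantics as described in the architecture guide, namely that the branch is taken and the loop counter decremented as long as the counter is not zero; the function run_counted_loop is invented for illustration.

```python
def run_counted_loop(loop_count, body):
    """Run body the way a loop closed by a cloop branch would; return the iteration count."""
    lc = loop_count          # value placed in the loop counter before the loop
    iterations = 0
    while True:
        body(iterations)
        iterations += 1
        # cloop: if the counter is nonzero, decrement it and branch back,
        # otherwise fall through and leave the loop.
        if lc != 0:
            lc -= 1
        else:
            break
    return iterations


print(run_counted_loop(4, lambda i: None))
```

Under this assumption a loop that should run n times starts with the counter set to n - 1, since the body is executed once before the counter is first tested.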

Instruction word processing - VLIW and EPIC

These two abbreviations, of which Intel is very fond, describe how the architecture loads and identifies code. Instruction words clearly have to be long, because a large number of registers must be addressed. And as a semi-RISC processor (a load/store architecture), IA-64 uses fixed-width instruction words.
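
A quick calculation shows how much room the register numbers alone take. The register counts come from the list at the beginning of this article; the three-operand instruction guarded by one predicate is an assumed format used only for illustration.

```python
def field_bits(register_count):
    """Bits needed to select one register out of register_count."""
    return (register_count - 1).bit_length()


GENERAL, PREDICATE, BRANCH = 128, 64, 8

general_bits = field_bits(GENERAL)
predicate_bits = field_bits(PREDICATE)
branch_bits = field_bits(BRANCH)

# Assumed format: three general register operands plus one guarding predicate.
operand_bits = 3 * general_bits + predicate_bits

# The same three operands on a classic RISC with 32 registers and no predicates.
risc_operand_bits = 3 * field_bits(32)

print(general_bits, predicate_bits, branch_bits)
print(operand_bits, risc_operand_bits)
print(32 - operand_bits)   # bits left for the opcode in a 32-bit word
```

With so few bits left over, a traditional 32-bit instruction word would have almost no space for the opcode itself.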

So the IA-64 has fixed-width, 128-bit (!) instruction words (Very Long Instruction Words - VLIW). Each word (also called a bundle) contains three instruction slots, each 41 bits wide; a bundle holds fewer than three instructions if one of them carries a large immediate.
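
The arithmetic 5 + 3 * 41 = 128 can be checked by packing and unpacking a bundle in Python (the 5-bit template is described in the next paragraph). The bit order used here, with the template in the lowest bits followed by slots 0, 1 and 2, is an assumption of this sketch, and the slot values are arbitrary numbers, not real opcodes.

```python
TEMPLATE_BITS = 5
SLOT_BITS = 41
SLOT_MASK = (1 << SLOT_BITS) - 1


def pack_bundle(template, slots):
    """Combine a template and three 41-bit slots into one 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template does not fit in 5 bits")
    if len(slots) != 3:
        raise ValueError("a bundle has exactly three slots")
    bundle = template
    for index, slot in enumerate(slots):
        if not 0 <= slot <= SLOT_MASK:
            raise ValueError("slot does not fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + index * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit integer back into its template and three slots."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    slots = [(bundle >> (TEMPLATE_BITS + index * SLOT_BITS)) & SLOT_MASK
             for index in range(3)]
    return template, slots


bundle = pack_bundle(0x10, [0x123, SLOT_MASK, 0x1ABCDEF])
print(TEMPLATE_BITS + 3 * SLOT_BITS, bundle.bit_length())
print(unpack_bundle(bundle))
```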

Additionally there is a 5-bit field, the template. In it the programmer states which execution unit type (ALU, Memory, FP, Branch, Extended) each instruction in the bundle uses. The template also encodes architectural stops, which mark the boundaries between groups of mutually independent instructions. Determining which instructions can be executed simultaneously and which cannot is therefore left to the programmer. This speeds up instruction decoding, but demands additional skill from the programmer or the compiler. (The feature is called Explicitly Parallel Instruction Computing - EPIC.)
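
What a stop means for the programmer or compiler can be sketched as a check over an instruction stream. The model below is a simplification and an assumption of this sketch: each instruction writes one register and reads some others, and a group is considered valid when no instruction reads or rewrites a register written earlier in the same group. The real architecture has further rules and exceptions that are not modelled.

```python
def ins(writes, reads, stop=False):
    """Describe one instruction: destination, sources, and a stop after it."""
    return {"writes": writes, "reads": reads, "stop": stop}


def split_into_groups(program):
    """Cut the instruction stream after every instruction followed by a stop."""
    groups, current = [], []
    for instruction in program:
        current.append(instruction)
        if instruction["stop"]:
            groups.append(current)
            current = []
    if current:
        groups.append(current)
    return groups


def is_independent(group):
    """True if nothing in the group depends on an earlier write in the group."""
    written = set()
    for instruction in group:
        if written & set(instruction["reads"]) or instruction["writes"] in written:
            return False
        written.add(instruction["writes"])
    return True


# r1 = r2 + r3 ; r4 = r5 + r6 ; (stop) ; r7 = r1 + r4
with_stop = [ins("r1", ["r2", "r3"]),
             ins("r4", ["r5", "r6"], stop=True),
             ins("r7", ["r1", "r4"])]
without_stop = [ins("r1", ["r2", "r3"]),
                ins("r4", ["r5", "r6"]),
                ins("r7", ["r1", "r4"])]

print([is_independent(g) for g in split_into_groups(with_stop)])
print([is_independent(g) for g in split_into_groups(without_stop)])
```

The first program forms two valid groups; the second has no stop, so the third instruction would read registers written in its own group.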

Instruction set

The number of instructions is too large for their descriptions to be included in this article (sometimes I wonder if this is still a RISC). The instructions include typical arithmetic and logic instructions, shifts, memory access instructions, branches, parallel arithmetic instructions (similar to MMX but more flexible), floating point instructions on entire registers and on 32-bit pairs (like in 3DNow!), and some others.

More info

If you are curious about the IA-64, get the Application Developer's Architecture Guide from ftp://download.intel.com/design/ia-64/downloads/adag.pdf. It contains descriptions of all instructions, detailed explanations of processor features, and an optimization manual. However, you will not find any information there about system programming, i.e. memory management, interrupt handling or task management. You may also want to search www.intel.com for newer information.

Chris Dragan