Michael Abrash's Graphics Programming Black Book Special Edition: Aiming the 486

32-Bit Addressing Modes

The 386 and 486 both support 32-bit addressing modes, in which any register may serve as the base memory addressing register, and almost any register may serve as the potentially scaled index register. For example,

mov al,BaseTable[ecx+edx*4]

uses a perfectly valid 32-bit address, with the byte accessed being the one at the offset in DS pointed to by the sum of EDX times 4 plus the offset of BaseTable plus ECX. This is a very powerful memory addressing scheme, far superior to 8088-style 16-bit addressing, but it’s not without its quirks and costs, so let’s take a quick look at 32-bit addressing. (By the way, 32-bit addressing is not limited to protected mode; 32-bit instructions may be used in real mode, although each instruction that uses 32-bit addressing must have an address-size prefix byte, and the presence of a prefix byte costs a cycle on a 486.)

Any register may serve as the base register component of an address. Any register except ESP may also serve as the index register, which can be scaled by 1, 2, 4, or 8. (Scaling is very handy for performing lookups in arrays and tables.) The same register may serve as both base and index register, except for ESP, which can only be the base. Incidentally, it makes sense that ESP can’t be scaled; ESP presumably always points to a valid stack, and I can’t think of any reason you’d want to use the stack pointer times 2, 4, or 8 in an address. ESP is, by its nature, a base rather than index pointer.

That’s all there is to the functionality of 32-bit addressing; it’s very simple, much simpler than 16-bit addressing, with its sharply limited memory addressing register combinations. The costs of 32-bit addressing are a bit more subtle. The only performance cost (apart from the aforementioned 1-cycle penalty for using 32-bit addressing in real mode) is a 1-cycle penalty imposed for using an index register. In this context, you use an index register when you use a register that’s scaled, or when you use the sum of two registers to point to memory. MOV BL,[EBX*2] uses an index register and takes an extra cycle, as does MOV CL,[EAX+EDX]; MOV CL,[EAX+100H] is not indexed, however.

The other cost of 32-bit addressing is in instruction size. Old-style 16-bit addressing usually (except in a few special cases) uses one extra byte, which Intel calls the Mod-R/M byte, which is placed immediately after each instruction’s opcode to describe the memory addressing mode, plus 1 or 2 optional bytes of addressing displacement—that is, a constant value to add into the address. In many cases, 32-bit addressing continues to use the Mod-R/M byte, albeit with a different interpretation; in these cases, 32-bit addressing is no larger than 16-bit addressing, except when a 32-bit displacement is involved. For example, MOV AL, [EBX] is a 2-byte instruction; MOV AL, [EBX+10H] is a 3-byte instruction; and MOV AL, [EBX+10000H] is a 6-byte instruction.

Note that 1 and 4-byte displacements, but not 2-byte displacements, are supported for 32-bit addressing. Code size can be greatly improved by keeping stack frame variables within 128 bytes of EBP, and variables in pointed-to structures within 127 bytes of the start of the structure, so that displacements can be 1 rather than 4 bytes.

However, because 32-bit addressing supports many more addressing combinations than 16-bit addressing, the Mod-R/M byte can’t describe all the combinations. Therefore, whenever an index register (as described above) is involved, a second byte, the SIB byte, follows the Mod-R/M byte to provide additional address information. Consequently, whenever you use a scaled memory addressing register or use the sum of two registers to point to memory, you automatically add 1 cycle and 1 byte to that instruction. This is not to say that you shouldn’t use index registers when they’re needed, but if you find yourself using them inside key loops, you should see if it’s possible to move the index calculation outside the loop as, for example, in a loop like this:

LoopTop:
      add   ax,DataTable[ebx*2]
      inc   ebx
      dec   cx
      jnz   LoopTop

You could change this to the following for greater performance:

      add   ebx,ebx      ;ebx*2
LoopTop:
      add   ax,DataTable[ebx]
      add   ebxX,2
      dec   cx
      jnz   LoopTop
      shr   ebx,1 ;ebx*2/2

I’ll end this chapter with two more quirks of 32-bit addressing. First, as with 16-bit addressing, addressing that uses EBP as a base register both accesses the SS segment by default and always has a displacement of at least 1 byte. This reflects the common use of EBP to address a stack frame, but is worth keeping in mind if you should happen to use EBP to address non-stack memory.

Lastly, as I mentioned, ESP cannot be scaled. In fact, ESP cannot be an index register; it must be a base register. Ironically, however, ESP is the one register that cannot be used to address memory without the presence of an SIB byte, even if it’s used without an index register. This is an outcome of the way in which the SIB byte extends the capabilities of the Mod-R/M byte, and there’s nothing to be done about it, but it’s at least worth noting that ESP-based, non-indexed addressing makes for instructions that are a byte larger than other non-indexed addressing (but not any slower; there’s no 1-cycle penalty for using ESP as a base register) on the 486.

Table of Contents