Michael Abrash's Graphics Programming Black Book Special Edition: Zenning and the Flexible Mind

LISTING 22.2 L22-2.ASM

ClearS        proc near
      push    bp                        ;save caller’s BP
      mov     bp,sp                     ;point to stack frame
      cmp     word ptr [bp].BufSeg,0    ;skip the fill if a null
      jne     Start                     ; pointer is passed
      cmp     word ptr [bp].BufOfs,0
      je      Bye
Start: cld                              ;make STOSW count up
      mov     ax,[bp].Attrib            ;load AX with attribute parameter
      and     ax,0ff00h                 ;prepare for merging with fill char
      mov     bx,[bp].Filler            ;load BX with fill char
      and     bx,0ffh                   ;prepare for merging with attribute
      or      ax,bx                     ;combine attribute and fill char
      mov     di,[bp].BufOfs            ;load DI with target buffer offset
      mov     es,[bp].BufSeg            ;load ES with target buffer segment
      mov     cx,[bp].BufSize           ;load CX with buffer size
      rep     stosw                     ;fill the buffer
Bye:
      pop     bp                        ;restore caller’s BP
      ret     EndMrk-RetAddr-2          ;return, clearing the parms from the stack
ClearS        endp

(The OnStack structure definition doesn’t change in any of our examples, so I’m not going to clutter up this chapter by reproducing it for each new version of ClearS.)

Okay, loading ES and DI directly saves another four bytes. We’ve squeezed a total of 6 bytes—about 11 percent—out of ClearS. What next?

Well, a single LES would serve better than two MOV instructions for loading ES and DI, as shown in Listing 22.3.

LISTING 22.3 L22-3.ASM

ClearS         proc near
      push     bp                       ;save caller’s BP
      mov      bp,sp                    ;point to stack frame
      cmp      word ptr [bp].BufSeg,0   ;skip the fill if a null
      jne      Start                    ; pointer is passed
      cmp      word ptr [bp].BufOfs,0
      je       Bye
Start: cld                              ;make STOSW count up
      mov      ax,[bp].Attrib           ;load AX with attribute parameter
      and      ax,0ff00h                ;prepare for merging with fill char
      mov      bx,[bp].Filler           ;load BX with fill char
      and      bx,0ffh                  ;prepare for merging with attribute
      or       ax,bx                    ;combine attribute and fill char
      les      di,dword ptr [bp].BufOfs ;load ES:DI with target buffer
                                        ;segment:offset
      mov      cx,[bp].BufSize          ;load CX with buffer size
      rep      stosw                    ;fill the buffer
Bye:
      pop      bp                       ;restore caller’s BP
      ret      EndMrk-RetAddr-2         ;return, clearing the parms from the stack
ClearS         endp

That’s good for another three bytes. We’re down to 43 bytes, and counting.
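If you'd like to check that arithmetic, here's a rough sketch of the encodings involved. The [bp+10] and [bp+12] displacements are my assumption about where BufOfs and BufSeg land in the stack frame (OnStack isn't reproduced in this part of the chapter); any displacement that fits in a signed byte gives the same instruction sizes. Note that LES only works here because the offset word sits immediately below the segment word in memory.

```asm
; Sizes of the two ways to load ES:DI from the stack frame.
; ASSUMPTION: BufOfs sits at [bp+10] and BufSeg at [bp+12]; these
; displacements are illustrative, not taken from the OnStack definition.

; Listing 22.2 approach: two loads, 6 bytes total
      mov      di,[bp+10]               ;8B 7E 0A   3 bytes
      mov      es,[bp+12]               ;8E 46 0C   3 bytes

; Listing 22.3 approach: one load, 3 bytes total
      les      di,dword ptr [bp+10]     ;C4 7E 0A   3 bytes
                                        ;DI <- word at [bp+10]
                                        ;ES <- word at [bp+12]
```

Six bytes shrink to three, which is where the 3-byte saving comes from.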

We can save 3 more bytes by using SUB reg8,reg8, rather than ANDing 16-bit values, to clear the low byte of AX and the high byte of BX, as shown in Listing 22.4.

LISTING 22.4 L22-4.ASM

ClearS         proc near
      push     bp                       ;save caller’s BP
      mov      bp,sp                    ;point to stack frame
      cmp      word ptr [bp].BufSeg,0   ;skip the fill if a null
      jne      Start                    ; pointer is passed
      cmp      word ptr [bp].BufOfs,0
      je       Bye
Start: cld                              ;make STOSW count up
      mov      ax,[bp].Attrib           ;load AX with attribute parameter
      sub      al,al                    ;prepare for merging with fill char
      mov      bx,[bp].Filler           ;load BX with fill char
      sub      bh,bh                    ;prepare for merging with attribute
      or       ax,bx                    ;combine attribute and fill char
      les      di,dword ptr [bp].BufOfs ;load ES:DI with target buffer
                                        ;segment:offset
      mov      cx,[bp].BufSize          ;load CX with buffer size
      rep      stosw                    ;fill the buffer
Bye:
      pop      bp                       ;restore caller’s BP
      ret      EndMrk-RetAddr-2         ;return, clearing the parms from the stack
ClearS         endp
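For anyone keeping score at home, here's a sketch of where those 3 bytes go. The hex bytes show one valid encoding of each instruction; an assembler may pick the equivalent 28h form of SUB instead of 2Ah, but the lengths come out the same either way.

```asm
; Clearing AL and BH: 16-bit AND masks versus 8-bit SUBs.
; Hex bytes are one valid encoding, shown only to count sizes.

; Listing 22.3 approach: 7 bytes total
      and      ax,0ff00h                ;25 00 FF      3 bytes (short AX form)
      and      bx,0ffh                  ;81 E3 FF 00   4 bytes (16-bit immediate)

; Listing 22.4 approach: 4 bytes total
      sub      al,al                    ;2A C0         2 bytes, AL = 0
      sub      bh,bh                    ;2A FF         2 bytes, BH = 0
```

Both versions leave the attribute in AH and the fill character in BL with the other halves zeroed, so the OR that follows produces the same word. The flags end up different, but nothing in ClearS looks at them.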

Now we’re down to 40 bytes, more than 20 percent smaller than the original code. That’s pretty much it for simple instruction-substitution optimizations. Now let’s look for restructuring optimizations.

It seems strange to load a word value into AX and then throw away AL. Likewise, it seems strange to load a word value into BX and then throw away BH. However, those steps are necessary because the two modified word values are ORed into a single character/attribute word value that is then used to fill the target buffer.

Let’s step back and see what this code really does, though. All it does in the end is load one byte addressed relative to BP into AH and another byte addressed relative to BP into AL. Heck, we can just do that directly! Presto—we’ve saved another 6 bytes, and turned two word-sized memory accesses into byte-sized memory accesses as well. Listing 22.5 shows the new code.
