Michael Abrash's Graphics Programming Black Book Special Edition: Zenning and the Flexible Mind

LISTING 22.5 L22-5.ASM

ClearS         proc near
      push     bp                       ;save caller’s BP
      mov      bp,sp                    ;point to stack frame
      cmp      word ptr [bp].BufSeg,0   ;skip the fill if a null
      jne      Start                    ; pointer is passed
      cmp      word ptr [bp].BufOfs,0
      je       Bye
Start: cld                              ;make STOSW count up
      mov      ah,byte ptr [bp].Attrib[1] ;load AH with attribute
      mov      al,byte ptr [bp].Filler  ;load AL with fill char
      les      di,dword ptr [bp].BufOfs ;load ES:DI with target buffer segment:offset
      mov      cx,[bp].BufSize          ;load CX with buffer size
      rep      stosw                    ;fill the buffer
Bye:
      pop      bp                       ;restore caller’s BP
      ret      EndMrk-RetAddr-2         ;return, clearing the parms from the stack
ClearS         endp

(We could get rid of yet another instruction by having the calling code pack both the attribute and the fill value into the same word, but that’s not part of the specification for this particular routine.)
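For illustration only, here is a sketch of what that packed-parameter variant of Listing 22.5 might look like. It assumes the caller pushes a single word with the attribute in the high byte and the fill character in the low byte. The Parms layout below is a guess modeled on the field names the listing uses, and FillWord is a hypothetical name that appears nowhere in the original code.

```asm
;Sketch only: this Parms layout and the FillWord field are assumptions.
Parms    struc
         dw      ?                      ;pushed BP
RetAddr  dw      ?                      ;return address
FillWord dw      ?                      ;attribute (high byte), fill char (low byte)
BufSize  dw      ?                      ;number of words to store
BufOfs   dw      ?                      ;buffer offset
BufSeg   dw      ?                      ;buffer segment
EndMrk   db      ?                      ;marks the end of the parameters
Parms    ends

ClearS   proc near
      push     bp                       ;save caller's BP
      mov      bp,sp                    ;point to stack frame
      cmp      word ptr [bp].BufSeg,0   ;skip the fill if a null
      jne      Start                    ; pointer is passed
      cmp      word ptr [bp].BufOfs,0
      je       Bye
Start: cld                              ;make STOSW count up
      mov      ax,[bp].FillWord         ;load AH and AL in a single instruction
      les      di,dword ptr [bp].BufOfs ;load ES:DI with target buffer segment:offset
      mov      cx,[bp].BufSize          ;load CX with buffer size
      rep      stosw                    ;fill the buffer
Bye:
      pop      bp                       ;restore caller's BP
      ret      EndMrk-RetAddr-2         ;now discards four parameter words, not five
ClearS   endp
```

The two 3-byte byte-sized loads collapse into one 3-byte word-sized load, so the routine would shrink by 3 bytes, at the cost of changing the interface the calling code sees.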

Another nifty instruction-rearrangement trick saves 6 more bytes. ClearS checks to see whether the far pointer is null (zero) at the start of the routine...then loads and uses that same far pointer later on. Let’s get that pointer into registers and keep it there; that way we can check to see whether it’s null with a single comparison, and can use it later without having to reload it from memory. This technique is shown in Listing 22.6.

LISTING 22.6 L22-6.ASM

ClearS         proc near
      push     bp                       ;save caller’s BP
      mov      bp,sp                    ;point to stack frame
      les      di,dword ptr [bp].BufOfs ;load ES:DI with target buffer segment:offset
      mov      ax,es                    ;put segment where we can test it
      or       ax,di                    ;is it a null pointer?
      je       Bye                      ;yes, so we’re done
Start: cld                              ;make STOSW count up
      mov      ah,byte ptr [bp].Attrib[1] ;load AH with attribute
      mov      al,byte ptr [bp].Filler  ;load AL with fill char
      mov      cx,[bp].BufSize          ;load CX with buffer size
      rep      stosw                    ;fill the buffer
Bye:
      pop      bp                       ;restore caller’s BP
      ret      EndMrk-RetAddr-2         ;return, clearing the parms from the stack
ClearS         endp

Well. Now we’re down to 28 bytes, having reduced the size of this subroutine by nearly 50 percent. Only 13 instructions remain. Realistically, how much smaller can we make this code?

About one-third smaller yet, as it turns out—but in order to do that, we must stretch our minds and use the 8088’s instructions in unusual ways. Let me ask you this: What do most of the instructions in the current version of ClearS do?

They either load parameters from the stack frame or set up the registers so that the parameters can be accessed. Mind you, there’s nothing wrong with the stack-frame-oriented instructions used in ClearS; those instructions access the stack frame in a highly efficient way, exactly as the designers of the 8088 intended, and just as the code generated by a high-level language would. That means that we aren’t going to be able to improve the code if we don’t bend the rules a bit.

Let’s think...the parameters are sitting on the stack, and most of our instruction bytes are being used to read bytes off the stack with BP-based addressing...we need a more efficient way to address the stack...the stack...THE STACK!

Ye gods! That’s easy—we can use the stack pointer to address the stack rather than BP. While it’s true that the stack pointer can’t be used for mod-reg-rm addressing, as BP can, it can be used to pop data off the stack—and POP is a one-byte instruction. Instructions don’t get any shorter than that.

There is one detail to be taken care of before we can put our plan into action: The return address—the address of the calling code—is on top of the stack, so the parameters we want can’t be reached with POP. That’s easily solved, however—we’ll just pop the return address into an unused register, then branch through that register when we’re done, as we learned to do in Chapter 14. As we pop the parameters, we’ll also be removing them from the stack, thereby neatly avoiding the need to discard them when it’s time to return.
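To see the idea in isolation, here is a minimal sketch that applies it to a toy routine taking one word parameter on the stack and returning twice that value in AX. Both routine names are hypothetical and exist only for this comparison; the byte counts in the comments are the standard 8088 encodings of each instruction.

```asm
;Conventional version: BP-based stack frame, RET discards the parameter.
DoubleBP  proc near
      push     bp                  ;1 byte
      mov      bp,sp               ;2 bytes
      mov      ax,[bp+4]           ;3 bytes: parameter sits above BP and return address
      shl      ax,1                ;2 bytes
      pop      bp                  ;1 byte
      ret      2                   ;3 bytes: return and discard one parameter word
DoubleBP  endp                     ;12 bytes in all

;POP-based version: SP addresses the stack, no frame needed.
DoublePop proc near
      pop      dx                  ;1 byte: get the return address out of the way
      pop      ax                  ;1 byte: fetch the parameter, removing it as we go
      shl      ax,1                ;2 bytes
      jmp      dx                  ;2 bytes: branch back to the calling code
DoublePop endp                     ;6 bytes in all
```

Either routine is called the same way (load the value into a register, PUSH it, then CALL), and either leaves the stack clean on return. The POP-based version does destroy DX, which the conventional version leaves alone, so the calling code must not be counting on DX surviving the call.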

With that problem dealt with, Listing 22.7 shows the Zenned version of ClearS.

LISTING 22.7 L22-7.ASM

ClearS         proc near
      pop      dx                  ;get the return address
      pop      ax                  ;put fill char into AL
      pop      bx                  ;get the attribute
      mov      ah,bh               ;put attribute into AH
      pop      cx                  ;get the buffer size
      pop      di                  ;get the offset of the buffer origin
      pop      es                  ;get the segment of the buffer origin
      mov      bx,es               ;put the segment where we can test it
      or       bx,di               ;null pointer?
      je       Bye                 ;yes, so we’re done
      cld                          ;make STOSW count up
      rep      stosw               ;do the string store
Bye:
      jmp      dx                  ;return to the calling code
ClearS         endp

At long last, we’re down to the bare metal. This version of ClearS is just 19 bytes long. That’s just 37 percent as long as the original version, without any change whatsoever in the functionality that ClearS makes available to the calling code. The code is bound to run a bit faster too, given that there are far fewer instruction bytes and fewer memory accesses.
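To make the unchanged interface concrete, here is a sketch of calling code that should work with any of the ClearS versions in this chapter. The push order follows from the order in which Listing 22.7 pops the parameters: the last word pushed is the first one popped after the return address. The names Buffer and BUFFER_WORDS, the buffer size, and the attribute value are all assumptions made for this example.

```asm
BUFFER_WORDS equ 80*25             ;assumed size: one 80x25 text screen, in words

Buffer   dw    BUFFER_WORDS dup (?) ;assumed to live in the segment DS points to

;Calling sequence (the 8088 has no PUSH immediate, so go through AX).
      push     ds                  ;buffer segment (popped last, into ES)
      mov      ax,offset Buffer
      push     ax                  ;buffer offset (popped into DI)
      mov      ax,BUFFER_WORDS
      push     ax                  ;buffer size in words (popped into CX)
      mov      ax,0700h            ;attribute 07h in the high byte
      push     ax                  ;attribute word (high byte ends up in AH)
      mov      ax,' '              ;fill character in the low byte
      push     ax                  ;fill word (low byte ends up in AL)
      call     ClearS              ;parameters are gone from the stack on return
```

One difference does show up in register usage rather than in the parameters: the version in Listing 22.7 uses BX and DX as scratch registers, so calling code that keeps live values in those registers would need to save them around the call.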

All in all, the Zenned version of ClearS is a vast improvement over the original. Probably not the best possible implementation—never say never!—but an awfully good one.
