Previous Table of Contents Next

Chapter 22
Zenning and the Flexible Mind

Taking a Spin through What You’ve Learned

And so we come to the end of our journey; for now, at least. What follows is a modest bit of optimization, one which originally served to show readers of Zen of Assembly Language that they had learned more than just bits and pieces of knowledge; that they had also begun to learn how to apply the flexible mind—unconventional, broadly integrative thinking—to approaching high-level optimization at the algorithmic and program design levels. You, of course, need no such reassurance, having just spent 21 chapters learning about the flexible mind in many guises, but I think you’ll find this example instructive nonetheless. Try to stay ahead as the level of optimization rises from instruction elimination to instruction substitution to more creative solutions that involve broader understanding and redesign. We’ll start out by compacting individual instructions and bits of code, but by the end we’ll come up with a solution that involves the very structure of the subroutine, with each instruction carefully integrated into a remarkably compact whole. It’s a neat example of how optimization operates at many levels, some much less determininstic than others—and besides, it’s just plain fun.



In Jeff Duntemann’s excellent book Borland Pascal From Square One (Random House, 1993), there’s a small assembly subroutine that’s designed to be called from a Turbo Pascal program in order to fill the screen or a systemscreen buffer with a specified character/attribute pair in text mode. This subroutine involves only 21 instructions and works perfectly well; however, with what we know, we can compact the subroutine tremendously and speed it up a bit as well. To coin a verb, we can “Zen” this already-tight assembly code to an astonishing degree. In the process, I hope you’ll get a feel for how advanced your assembly skills have become.

Jeff’s original code follows as Listing 22.1 (with some text converted to lowercase in order to match the style of this book), but the comments are mine.

LISTING 22.1 L22-1.ASM

OnStack      struc      ;data that’s stored on the stack after PUSH BP
OldBP        dw   ?     ;caller’s BP
RetAddr      dw   ?     ;return address
Filler       dw   ?     ;character to fill the buffer with
Attrib       dw   ?     ;attribute to fill the buffer with
BufSize      dw   ?     ;number of character/attribute pairs to fill
BufOfs       dw   ?     ;buffer offset
BufSeg       dw   ?     ;buffer segment
EndMrk       db   ?     ;marker for the end of the stack frame
OnStack      ends
ClearS       proc near
     push    bp                         ;save caller’s BP
     mov     bp,sp                      ;point to stack frame
     cmp     word ptr [bp].BufSeg,0     ;skip the fill if a null
    jne      Start                      ; pointer is passed
    cmp      word ptr [bp].BufOfs,0
    je       Bye
Start: cld                           ;make STOSW count up
    mov      ax,[bp].Attrib          ;load AX with attribute parameter
    and      ax,0ff00h               ;prepare for merging with fill char
    mov      bx,[bp].Filler          ;load BX with fill char
    and      bx,0ffh                 ;prepare for merging with attribute
    or       ax,bx                   ;combine attribute and fill char
    mov      bx,[bp].BufOfs          ;load DI with target buffer offset
    mov      di,bx
    mov      bx,[bp].BufSeg          ;load ES with target buffer segment
    mov      es,bx
    mov      cx,[bp].BufSize         ;load CX with buffer size
    rep      stosw                   ;fill the buffer
Bye:mov      sp,bp                   ;restore original stack pointer
    pop      bp                      ; and caller’s BP
    ret      EndMrk-RetAddr-2        ;return, clearing the parms from the stack
ClearS       endp

The first thing you’ll notice about Listing 22.1 is that ClearS uses a REP STOSW instruction. That means that we’re not going to improve performance by any great amount, no matter how clever we are. While we can eliminate some cycles, the bulk of the work in ClearS is done by that one repeated string instruction, and there’s no way to improve on that.

Does that mean that Listing 22.1 is as good as it can be? Hardly. While the speed of ClearS is very good, there’s another side to the optimization equation: size. The whole of ClearS is 52 bytes long as it stands—but, as we’ll see, that size is hardly set in stone.

Where do we begin with ClearS? For starters, there’s an instruction in there that serves no earthly purpose—MOV SP,BP. SP is guaranteed to be equal to BP at that point anyway, so why reload it with the same value? Removing that instruction saves us two bytes.

Well, that was certainly easy enough! We’re not going to find any more totally non-functional instructions in ClearS, however, so let’s get on to some serious optimizing. We’ll look first for cases where we know of better instructions for particular tasks than those that were chosen. For example, there’s no need to load any register, whether segment or general, through BX; we can eliminate two instructions by loading ES and DI directly as shown in Listing 22.2.

Previous Table of Contents Next

Graphics Programming Black Book © 2001 Michael Abrash