And so we come to the end of our journey; for now, at least. What follows is a modest bit of optimization, one which originally served to show readers of Zen of Assembly Language that they had learned more than just bits and pieces of knowledge; that they had also begun to learn how to apply the flexible mind (unconventional, broadly integrative thinking) to approaching high-level optimization at the algorithmic and program design levels. You, of course, need no such reassurance, having just spent 21 chapters learning about the flexible mind in many guises, but I think you'll find this example instructive nonetheless. Try to stay ahead as the level of optimization rises from instruction elimination to instruction substitution to more creative solutions that involve broader understanding and redesign. We'll start out by compacting individual instructions and bits of code, but by the end we'll come up with a solution that involves the very structure of the subroutine, with each instruction carefully integrated into a remarkably compact whole. It's a neat example of how optimization operates at many levels, some much less deterministic than others, and besides, it's just plain fun.
In Jeff Duntemann's excellent book Borland Pascal From Square One (Random House, 1993), there's a small assembly subroutine that's designed to be called from a Turbo Pascal program in order to fill the screen or a system-screen buffer with a specified character/attribute pair in text mode. This subroutine involves only 21 instructions and works perfectly well; however, with what we know, we can compact the subroutine tremendously and speed it up a bit as well. To coin a verb, we can "Zen" this already-tight assembly code to an astonishing degree. In the process, I hope you'll get a feel for how advanced your assembly skills have become.
Jeff's original code follows as Listing 22.1 (with some text converted to lowercase in order to match the style of this book), but the comments are mine.
LISTING 22.1 L22-1.ASM
OnStack struc           ;data that's stored on the stack after PUSH BP
OldBP   dw      ?       ;caller's BP
RetAddr dw      ?       ;return address
Filler  dw      ?       ;character to fill the buffer with
Attrib  dw      ?       ;attribute to fill the buffer with
BufSize dw      ?       ;number of character/attribute pairs to fill
BufOfs  dw      ?       ;buffer offset
BufSeg  dw      ?       ;buffer segment
EndMrk  db      ?       ;marker for the end of the stack frame
OnStack ends
;
ClearS  proc    near
        push    bp                      ;save caller's BP
        mov     bp,sp                   ;point to stack frame
        cmp     word ptr [bp].BufSeg,0  ;skip the fill if a null
        jne     Start                   ; pointer is passed
        cmp     word ptr [bp].BufOfs,0
        je      Bye
Start:  cld                             ;make STOSW count up
        mov     ax,[bp].Attrib          ;load AX with attribute parameter
        and     ax,0ff00h               ;prepare for merging with fill char
        mov     bx,[bp].Filler          ;load BX with fill char
        and     bx,0ffh                 ;prepare for merging with attribute
        or      ax,bx                   ;combine attribute and fill char
        mov     bx,[bp].BufOfs          ;load DI with target buffer offset
        mov     di,bx
        mov     bx,[bp].BufSeg          ;load ES with target buffer segment
        mov     es,bx
        mov     cx,[bp].BufSize         ;load CX with buffer size
        rep     stosw                   ;fill the buffer
Bye:    mov     sp,bp                   ;restore original stack pointer
        pop     bp                      ; and caller's BP
        ret     EndMrk-RetAddr-2        ;return, clearing the parms from the stack
ClearS  endp
The first thing you'll notice about Listing 22.1 is that ClearS uses a REP STOSW instruction. That means that we're not going to improve performance by any great amount, no matter how clever we are. While we can eliminate some cycles, the bulk of the work in ClearS is done by that one repeated string instruction, and there's no way to improve on that.
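If it helps to have a high-level picture of what ClearS actually computes before we start trimming it, here is a rough C model of Listing 22.1. This is only a sketch under stated assumptions: the function name `clear_s_model` and its signature are hypothetical, a flat C pointer stands in for the segment:offset pair, and nothing here comes from Jeff's book or from Turbo Pascal. The loop plays the part of REP STOSW, which is where nearly all the time goes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C model of ClearS (Listing 22.1). The name, signature,
   and flat pointer are illustrative assumptions, not original code. */
void clear_s_model(uint16_t *buf, size_t buf_size,
                   uint16_t attrib, uint16_t filler)
{
    /* The two CMPs: skip the fill if a null pointer is passed. */
    if (buf == NULL)
        return;

    /* AND/AND/OR: keep the high byte of the attribute parameter and
       the low byte of the fill character, then merge them. */
    uint16_t cell = (uint16_t)((attrib & 0xFF00u) | (filler & 0x00FFu));

    /* REP STOSW: store the merged word buf_size times, counting up. */
    for (size_t i = 0; i < buf_size; i++)
        buf[i] = cell;
}
```

Everything outside the loop is setup and teardown, and that is the only part of the subroutine we have any real room to tighten.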
Does that mean that Listing 22.1 is as good as it can be? Hardly. While the speed of ClearS is very good, there's another side to the optimization equation: size. The whole of ClearS is 52 bytes long as it stands, but, as we'll see, that size is hardly set in stone.
Where do we begin with ClearS? For starters, there's an instruction in there that serves no earthly purpose: MOV SP,BP. SP is guaranteed to be equal to BP at that point anyway, so why reload it with the same value? Removing that instruction saves us two bytes.
Well, that was certainly easy enough! We're not going to find any more totally non-functional instructions in ClearS, however, so let's get on to some serious optimizing. We'll look first for cases where we know of better instructions for particular tasks than those that were chosen. For example, there's no need to load any register, whether segment or general, through BX; we can eliminate two instructions by loading ES and DI directly as shown in Listing 22.2.