LISTING 22.2 L22-2.ASM
ClearS  proc    near
        push    bp                      ;save caller's BP
        mov     bp,sp                   ;point to stack frame
        cmp     word ptr [bp].BufSeg,0  ;skip the fill if a null
        jne     Start                   ; pointer is passed
        cmp     word ptr [bp].BufOfs,0
        je      Bye
Start:  cld                             ;make STOSW count up
        mov     ax,[bp].Attrib          ;load AX with attribute parameter
        and     ax,0ff00h               ;prepare for merging with fill char
        mov     bx,[bp].Filler          ;load BX with fill char
        and     bx,0ffh                 ;prepare for merging with attribute
        or      ax,bx                   ;combine attribute and fill char
        mov     di,[bp].BufOfs          ;load DI with target buffer offset
        mov     es,[bp].BufSeg          ;load ES with target buffer segment
        mov     cx,[bp].BufSize         ;load CX with buffer size
        rep     stosw                   ;fill the buffer
Bye:    pop     bp                      ;restore caller's BP
        ret     EndMrk-RetAddr-2        ;return, clearing the parms from the stack
ClearS  endp
(The OnStack structure definition doesn't change in any of our examples, so I'm not going to clutter up this chapter by reproducing it for each new version of ClearS.)
Okay, loading ES and DI directly saves another four bytes. We've squeezed a total of 6 bytes (about 11 percent) out of ClearS. What next?
Well, LES would serve better than two MOV instructions for loading ES and DI, as shown in Listing 22.3.
LISTING 22.3 L22-3.ASM
ClearS  proc    near
        push    bp                      ;save caller's BP
        mov     bp,sp                   ;point to stack frame
        cmp     word ptr [bp].BufSeg,0  ;skip the fill if a null
        jne     Start                   ; pointer is passed
        cmp     word ptr [bp].BufOfs,0
        je      Bye
Start:  cld                             ;make STOSW count up
        mov     ax,[bp].Attrib          ;load AX with attribute parameter
        and     ax,0ff00h               ;prepare for merging with fill char
        mov     bx,[bp].Filler          ;load BX with fill char
        and     bx,0ffh                 ;prepare for merging with attribute
        or      ax,bx                   ;combine attribute and fill char
        les     di,dword ptr [bp].BufOfs ;load ES:DI with target buffer
                                        ;segment:offset
        mov     cx,[bp].BufSize         ;load CX with buffer size
        rep     stosw                   ;fill the buffer
Bye:    pop     bp                      ;restore caller's BP
        ret     EndMrk-RetAddr-2        ;return, clearing the parms from the stack
ClearS  endp
That's good for another three bytes. We're down to 43 bytes, and counting.
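One way to see where those three bytes come from is to tally the machine-code encodings. The byte values below are a sketch that assumes each stack-frame field lies within an 8-bit displacement of BP (reasonable for a frame this small); dd stands for that displacement byte.

```asm
; Listing 22.2: two MOVs at 3 bytes each = 6 bytes
        mov     di,[bp].BufOfs           ;8B 7E dd
        mov     es,[bp].BufSeg           ;8E 46 dd
; Listing 22.3: one LES = 3 bytes
        les     di,dword ptr [bp].BufOfs ;C4 7E dd
```

LES loads DI from the addressed word and ES from the word that follows it, so the substitution works only because BufSeg sits immediately above BufOfs in the stack frame.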
We can save 3 more bytes by clearing the low byte of AX and the high byte of BX with SUB reg8,reg8 rather than ANDing 16-bit values, as shown in Listing 22.4.
LISTING 22.4 L22-4.ASM
ClearS  proc    near
        push    bp                      ;save caller's BP
        mov     bp,sp                   ;point to stack frame
        cmp     word ptr [bp].BufSeg,0  ;skip the fill if a null
        jne     Start                   ; pointer is passed
        cmp     word ptr [bp].BufOfs,0
        je      Bye
Start:  cld                             ;make STOSW count up
        mov     ax,[bp].Attrib          ;load AX with attribute parameter
        sub     al,al                   ;prepare for merging with fill char
        mov     bx,[bp].Filler          ;load BX with fill char
        sub     bh,bh                   ;prepare for merging with attribute
        or      ax,bx                   ;combine attribute and fill char
        les     di,dword ptr [bp].BufOfs ;load ES:DI with target buffer
                                        ;segment:offset
        mov     cx,[bp].BufSize         ;load CX with buffer size
        rep     stosw                   ;fill the buffer
Bye:    pop     bp                      ;restore caller's BP
        ret     EndMrk-RetAddr-2        ;return, clearing the parms from the stack
ClearS  endp
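The saving from the SUB substitution can be accounted for by comparing encodings. This is a sketch that assumes the assembler emits the usual forms shown in the comments (other equivalent encodings of the same length exist).

```asm
; Listing 22.3: 3 + 4 = 7 bytes
        and     ax,0ff00h               ;25 00 FF     (accumulator short form)
        and     bx,0ffh                 ;81 E3 FF 00  (needs a full imm16: 0ffh
                                        ; sign-extended from a byte would be 0ffffh)
; Listing 22.4: 2 + 2 = 4 bytes
        sub     al,al                   ;2A C0
        sub     bh,bh                   ;2A FF
```

The byte-register SUBs carry no immediate operand at all, which is where the three bytes go.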
Now we're down to 40 bytes, more than 20 percent smaller than the original code. That's pretty much it for simple instruction-substitution optimizations. Now let's look for instruction-rearrangement optimizations.
It seems strange to load a word value into AX and then throw away AL. Likewise, it seems strange to load a word value into BX and then throw away BH. However, those steps are necessary because the two modified word values are ORed into a single character/attribute word value that is then used to fill the target buffer.
Let's step back and see what this code really does, though. All it does in the end is load one byte addressed relative to BP into AH and another byte addressed relative to BP into AL. Heck, we can just do that directly! Presto: we've saved another 6 bytes, and turned two word-sized memory accesses into byte-sized memory accesses as well. Listing 22.5 shows the new code.
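The step relies on the 8086 storing words low byte first: the attribute we keep is the high byte of the Attrib word, one byte above its start, while the fill character is the low byte of the Filler word. Here is a minimal sketch of the direct loads, not necessarily the exact form the listing takes, with byte counts that assume 8-bit displacements from BP.

```asm
; Listing 22.4: 3 + 2 + 3 + 2 + 2 = 12 bytes
;       mov     ax,[bp].Attrib
;       sub     al,al
;       mov     bx,[bp].Filler
;       sub     bh,bh
;       or      ax,bx
; Direct byte loads: 3 + 3 = 6 bytes
        mov     ah,byte ptr [bp].Attrib+1 ;high byte of Attrib is the attribute
        mov     al,byte ptr [bp].Filler   ;low byte of Filler is the fill char
```

As a side effect, BX is no longer touched at all.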