LISTING 22.2 L22-2.ASM
ClearS  proc    near
        push    bp                      ;save caller's BP
        mov     bp,sp                   ;point to stack frame
        cmp     word ptr [bp].BufSeg,0  ;skip the fill if a null
        jne     Start                   ; pointer is passed
        cmp     word ptr [bp].BufOfs,0
        je      Bye
Start:  cld                             ;make STOSW count up
        mov     ax,[bp].Attrib          ;load AX with attribute parameter
        and     ax,0ff00h               ;prepare for merging with fill char
        mov     bx,[bp].Filler          ;load BX with fill char
        and     bx,0ffh                 ;prepare for merging with attribute
        or      ax,bx                   ;combine attribute and fill char
        mov     di,[bp].BufOfs          ;load DI with target buffer offset
        mov     es,[bp].BufSeg          ;load ES with target buffer segment
        mov     cx,[bp].BufSize         ;load CX with buffer size
        rep     stosw                   ;fill the buffer
Bye:
        pop     bp                      ;restore caller's BP
        ret     EndMrk-RetAddr-2        ;return, clearing the parms from the stack
ClearS  endp
(The OnStack structure definition doesn't change in any of our examples, so I'm not going to clutter up this chapter by reproducing it for each new version of ClearS.)
Okay, loading ES and DI directly saves another four bytes. We've squeezed a total of 6 bytes, about 11 percent, out of ClearS. What next?
Well, LES would serve better than two MOV instructions for loading ES and DI, as shown in Listing 22.3.
LISTING 22.3 L22-3.ASM
ClearS  proc    near
        push    bp                      ;save caller's BP
        mov     bp,sp                   ;point to stack frame
        cmp     word ptr [bp].BufSeg,0  ;skip the fill if a null
        jne     Start                   ; pointer is passed
        cmp     word ptr [bp].BufOfs,0
        je      Bye
Start:  cld                             ;make STOSW count up
        mov     ax,[bp].Attrib          ;load AX with attribute parameter
        and     ax,0ff00h               ;prepare for merging with fill char
        mov     bx,[bp].Filler          ;load BX with fill char
        and     bx,0ffh                 ;prepare for merging with attribute
        or      ax,bx                   ;combine attribute and fill char
        les     di,dword ptr [bp].BufOfs ;load ES:DI with target buffer
                                        ;segment:offset
        mov     cx,[bp].BufSize         ;load CX with buffer size
        rep     stosw                   ;fill the buffer
Bye:
        pop     bp                      ;restore caller's BP
        ret     EndMrk-RetAddr-2        ;return, clearing the parms from the stack
ClearS  endp
That's good for another three bytes. We're down to 43 bytes, and counting.
We can save 3 more bytes by clearing the low byte of AX and the high byte of BX with SUB reg8,reg8 rather than ANDing 16-bit values, as shown in Listing 22.4.
LISTING 22.4 L22-4.ASM
ClearS  proc    near
        push    bp                      ;save caller's BP
        mov     bp,sp                   ;point to stack frame
        cmp     word ptr [bp].BufSeg,0  ;skip the fill if a null
        jne     Start                   ; pointer is passed
        cmp     word ptr [bp].BufOfs,0
        je      Bye
Start:  cld                             ;make STOSW count up
        mov     ax,[bp].Attrib          ;load AX with attribute parameter
        sub     al,al                   ;prepare for merging with fill char
        mov     bx,[bp].Filler          ;load BX with fill char
        sub     bh,bh                   ;prepare for merging with attribute
        or      ax,bx                   ;combine attribute and fill char
        les     di,dword ptr [bp].BufOfs ;load ES:DI with target buffer
                                        ;segment:offset
        mov     cx,[bp].BufSize         ;load CX with buffer size
        rep     stosw                   ;fill the buffer
Bye:
        pop     bp                      ;restore caller's BP
        ret     EndMrk-RetAddr-2        ;return, clearing the parms from the stack
ClearS  endp
Now we're down to 40 bytes, more than 20 percent smaller than the original code. That's pretty much it for simple instruction-substitution optimizations. Now let's look for instruction-rearrangement optimizations.
It seems strange to load a word value into AX and then throw away AL. Likewise, it seems strange to load a word value into BX and then throw away BH. However, those steps are necessary because the two modified word values are ORed into a single character/attribute word value that is then used to fill the target buffer.
Let's step back and see what this code really does, though. All it does in the end is load one byte addressed relative to BP into AH and another byte addressed relative to BP into AL. Heck, we can just do that directly! Presto, we've saved another 6 bytes, and turned two word-sized memory accesses into byte-sized memory accesses as well. Listing 22.5 shows the new code.
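The claim that the load, clear, and OR sequence boils down to two byte loads rests on the 8086 storing words low byte first, so the attribute is the second byte of the Attrib word and the fill character is the first byte of the Filler word. Here is a minimal sketch in C that models this. It is not code from the chapter: the Params struct, the function names, and the junk values are all invented for illustration, and C is used only because the point concerns memory layout rather than instruction encoding.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the two word parameters as they sit in memory
   on the little-endian 8086: low byte at offset 0, high byte at offset 1. */
typedef struct {
    uint8_t filler[2];   /* Filler word; filler[0] is the fill character */
    uint8_t attrib[2];   /* Attrib word; attrib[1] is the attribute      */
} Params;

/* Word-sized load, as MOV reg16,[mem] performs it. */
static uint16_t load_word(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

/* Listing 22.4 approach: load both words, clear the unwanted halves, OR. */
static uint16_t merge_words(const Params *s)
{
    uint16_t ax = (uint16_t)(load_word(s->attrib) & 0xFF00u); /* mov ax / sub al,al */
    uint16_t bx = (uint16_t)(load_word(s->filler) & 0x00FFu); /* mov bx / sub bh,bh */
    return (uint16_t)(ax | bx);                               /* or ax,bx           */
}

/* Direct approach: one byte load into AH, one byte load into AL. */
static uint16_t load_bytes(const Params *s)
{
    uint8_t ah = s->attrib[1];   /* high byte of the Attrib word */
    uint8_t al = s->filler[0];   /* low byte of the Filler word  */
    return (uint16_t)((ah << 8) | al);
}

/* Compare the two approaches for every attribute/character pair, with a
   few values placed in the bytes that both approaches should ignore. */
static void check_equivalence(void)
{
    static const uint8_t junk[] = { 0x00, 0xA5, 0xFF };
    for (unsigned a = 0; a < 256; a++)
        for (unsigned c = 0; c < 256; c++)
            for (unsigned j = 0; j < 3; j++) {
                Params s = { { (uint8_t)c, junk[j] },
                             { junk[j], (uint8_t)a } };
                assert(merge_words(&s) == load_bytes(&s));
            }
}
```

Calling check_equivalence() from a small main() walks every attribute/character combination; under this model, the bytes the word-sized version masks off are exactly the bytes the byte-sized version never reads.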