LISTING 22.5 L22-5.ASM
ClearS  proc    near
        push    bp                      ;save caller's BP
        mov     bp,sp                   ;point to stack frame
        cmp     word ptr [bp].BufSeg,0  ;skip the fill if a null
        jne     Start                   ; pointer is passed
        cmp     word ptr [bp].BufOfs,0
        je      Bye
Start:  cld                             ;make STOSW count up
        mov     ah,byte ptr [bp].Attrib[1] ;load AH with attribute
        mov     al,byte ptr [bp].Filler ;load AL with fill char
        les     di,dword ptr [bp].BufOfs ;load ES:DI with target buffer segment:offset
        mov     cx,[bp].BufSize         ;load CX with buffer size
        rep     stosw                   ;fill the buffer
Bye:
        pop     bp                      ;restore caller's BP
        ret     EndMrk-RetAddr-2        ;return, clearing the parms from the stack
ClearS  endp
(We could get rid of yet another instruction by having the calling code pack both the attribute and the fill value into the same word, but that's not part of the specification for this particular routine.)
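As a rough sketch of what that variant would look like inside the routine (purely hypothetical, since it changes the calling interface): if the caller passed a single word with the attribute in the high byte and the fill character in the low byte, one load would do the work of two. FillWord is an assumed name for that stack-frame field; it does not exist in the listings in this chapter.

```asm
; Hypothetical variant, NOT the interface ClearS is specified to have.
; FillWord is an assumed stack-frame field holding the attribute in its
; high byte and the fill character in its low byte.
        mov     ax,[bp].FillWord        ;load AH (attribute) and AL (fill char) at once
; ...which would stand in for this pair from Listing 22.5:
;       mov     ah,byte ptr [bp].Attrib[1]
;       mov     al,byte ptr [bp].Filler
```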
Another nifty instruction-rearrangement trick saves 6 more bytes. ClearS checks to see whether the far pointer is null (zero) at the start of the routine...then loads and uses that same far pointer later on. Let's get that pointer into registers and keep it there; that way we can check to see whether it's null with a single test, and can use it later without having to reload it from memory. This technique is shown in Listing 22.6.
LISTING 22.6 L22-6.ASM
ClearS  proc    near
        push    bp                      ;save caller's BP
        mov     bp,sp                   ;point to stack frame
        les     di,dword ptr [bp].BufOfs ;load ES:DI with target buffer segment:offset
        mov     ax,es                   ;put segment where we can test it
        or      ax,di                   ;is it a null pointer?
        je      Bye                     ;yes, so we're done
Start:  cld                             ;make STOSW count up
        mov     ah,byte ptr [bp].Attrib[1] ;load AH with attribute
        mov     al,byte ptr [bp].Filler ;load AL with fill char
        mov     cx,[bp].BufSize         ;load CX with buffer size
        rep     stosw                   ;fill the buffer
Bye:
        pop     bp                      ;restore caller's BP
        ret     EndMrk-RetAddr-2        ;return, clearing the parms from the stack
ClearS  endp
Well. Now we're down to 28 bytes, having reduced the size of this subroutine by nearly 50 percent. Only 13 instructions remain. Realistically, how much smaller can we make this code?
About one-third smaller yet, as it turns out, but in order to do that, we must stretch our minds and use the 8088's instructions in unusual ways. Let me ask you this: What do most of the instructions in the current version of ClearS do?
They either load parameters from the stack frame or set up the registers so that the parameters can be accessed. Mind you, there's nothing wrong with the stack-frame-oriented instructions used in ClearS; those instructions access the stack frame in a highly efficient way, exactly as the designers of the 8088 intended, and just as the code generated by a high-level language would. That means that we aren't going to be able to improve the code if we don't bend the rules a bit.
Let's think...the parameters are sitting on the stack, and most of our instruction bytes are being used to read bytes off the stack with BP-based addressing...we need a more efficient way to address the stack...the stack...THE STACK!
Ye gods! That's easy: we can use the stack pointer to address the stack rather than BP. While it's true that the stack pointer can't be used for mod-reg-rm addressing, as BP can, it can be used to pop data off the stack, and POP is a one-byte instruction. Instructions don't get any shorter than that.
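To make the saving concrete, here is a side-by-side sketch of the two ways to get the buffer size into CX. The displacement of 8 is an assumption: it is where the third parameter word would sit above the saved BP and the return address. The byte values in the comments are the standard 8088 encodings, added here for illustration.

```asm
; BP-based load: opcode + mod-reg-rm byte + 8-bit displacement
        mov     cx,[bp+8]       ;8B 4E 08 = 3 bytes
; POP-based load: the register is encoded in the opcode itself
        pop     cx              ;59       = 1 byte
```

The POP form only works when the wanted word is on top of the stack, which is why the order of the pops matters.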
There is one detail to be taken care of before we can put our plan into action: The return address (the address of the calling code) is on top of the stack, so the parameters we want can't be reached with POP. That's easily solved, however: we'll just pop the return address into an unused register, then branch through that register when we're done, as we learned to do in Chapter 14. As we pop the parameters, we'll also be removing them from the stack, thereby neatly avoiding the need to discard them when it's time to return.
With that problem dealt with, Listing 22.7 shows the Zenned version of ClearS.
LISTING 22.7 L22-7.ASM
ClearS  proc    near
        pop     dx              ;get the return address
        pop     ax              ;put fill char into AL
        pop     bx              ;get the attribute
        mov     ah,bh           ;put attribute into AH
        pop     cx              ;get the buffer size
        pop     di              ;get the offset of the buffer origin
        pop     es              ;get the segment of the buffer origin
        mov     bx,es           ;put the segment where we can test it
        or      bx,di           ;null pointer?
        je      Bye             ;yes, so we're done
        cld                     ;make STOSW count up
        rep     stosw           ;do the string store
Bye:
        jmp     dx              ;return to the calling code
ClearS  endp
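The size of this version can be checked by tallying the encoded length of each instruction. The byte counts in the comments below are an added annotation based on the standard 8088 encodings, not part of the original listing.

```asm
        pop     dx              ;1 byte
        pop     ax              ;1 byte
        pop     bx              ;1 byte
        mov     ah,bh           ;2 bytes
        pop     cx              ;1 byte
        pop     di              ;1 byte
        pop     es              ;1 byte
        mov     bx,es           ;2 bytes
        or      bx,di           ;2 bytes
        je      Bye             ;2 bytes (short jump)
        cld                     ;1 byte
        rep     stosw           ;2 bytes (REP prefix + STOSW)
Bye:
        jmp     dx              ;2 bytes
                                ;total: 19 bytes
```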
At long last, we're down to the bare metal. This version of ClearS is just 19 bytes long. That's just 37 percent as long as the original version, without any change whatsoever in the functionality that ClearS makes available to the calling code. The code is bound to run a bit faster too, given that there are far fewer instruction bytes and fewer memory accesses.
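For reference, calling code for this version might look something like the sketch below. The B800h segment, the 80x25 size, and the 07h attribute are assumptions chosen for illustration (a color text-mode screen); the one thing the listing itself dictates is the push order, which is the reverse of the order in which ClearS pops its parameters.

```asm
; Hypothetical caller; the segment, size, and attribute are assumed values.
        mov     ax,0b800h
        push    ax              ;buffer segment (popped last, into ES)
        sub     ax,ax
        push    ax              ;buffer offset 0 (popped into DI)
        mov     ax,80*25
        push    ax              ;buffer size in words (popped into CX)
        mov     ax,0700h
        push    ax              ;attribute 07h in the high byte (popped into BX)
        mov     ax,' '
        push    ax              ;fill character in the low byte (popped into AX)
        call    ClearS          ;near call; ClearS pops all five parameter words
; On return, AX, BX, CX, DX, DI, and ES have been overwritten by ClearS.
```

The 8088 has no push-immediate instruction, which is why each value goes through AX first.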
All in all, the Zenned version of ClearS is a vast improvement over the original. Probably not the best possible implementation (never say never!), but an awfully good one.