Michael Abrash's Graphics Programming Black Book Special Edition: Hints My Readers Gave Me

LISTING 9.5 L9-5.ASM

; Divides an arbitrarily long unsigned dividend by a 16-bit unsigned
; divisor. C near-callable as:
;     unsigned int Div(unsigned int * Dividend,
;     int DividendLength, unsigned int Divisor,
;     unsigned int * Quotient);
;
; Returns the remainder of the division.
;
; Tested with TASM 2.

parms struc
          dw     2 dup (?)     ;pushed BP & return address
Dividend  dw     ?             ;pointer to value to divide, stored in Intel
                               ; order, with lsb at lowest address, msb at
                               ; highest. Must be composed of an integral
                               ; number of words
DividendLength   dw  ?         ;# of bytes in Dividend. Must be a multiple
                               ; of 2
Divisor          dw ?          ;value by which to divide. Must not be zero,
                               ; or a Divide By Zero interrupt will occur
Quotient         dw ?          ;pointer to buffer in which to store the
                               ; result of the division, in Intel order.
                               ; The quotient returned is of the same
                               ; length as the dividend
parmsends

               .model     small
               .code
               public     _Div
_Divprocnear
               push    bp      ;preserve caller’s stack frame
               mov     bp,sp   ;point to our stack frame
               push    si      ;preserve caller’s register variables
               push    di

               std             ;we’re working from msb to lsb
               mov  ax,ds
               mov  es,ax      ;for STOS
               mov  cx,[bp+DividendLength]
               sub  cx,2
               mov  si,[bp+Dividend]
               add  si,cx      ;point to the last word of the dividend
                               ; (the most significant word)
               mov  di,[bp+Quotient]
               add  di,cx      ;point to the last word of the quotient
                               ; buffer (the most significant word)
               mov  bx,[bp+Divisor]
               shr  cx,1
               inc  cx         ;# of words to process
               sub  dx,dx      ;convert initial divisor word to a 32-bit
                               ;value for DIV
DivLoop:
               lod  sw         ;get next most significant word of divisor
               div  bx
               sto  sw         ;save this word of the quotient
                               ;DX contains the remainder at this point,
                               ; ready to prepend to the next divisor word
               loop  DivLoop
               mov   ax,dx     ;return the remainder
               cld             ;restore default Direction flag setting
               pop   di        ;restore caller’s register variables
               pop   si
               pop   bp        ;restore caller’s stack frame
               ret
_Divendp
               end

LISTING 9.6 L9-6.C

/* Sample use of Div function to perform division when the result
   doesn’t fit in 16 bits */

#include <stdio.h>

extern unsigned int Div(unsigned int * Dividend,
          int DividendLength, unsigned int Divisor,
          unsigned int * Quotient);

main() {
   unsigned long m, i = 0x20000001;
   unsigned int k, j = 0x10;

   k = Div((unsigned int *)&i, sizeof(i), j, (unsigned int *)&m);
   printf(“%lu / %u = %lu r %u\n”, i, j, m, k);
}

Sweet Spot Revisited

Way back in Volume 1, Number 1 of PC TECHNIQUES, (April/May 1990) I wrote the very first of that magazine’s HAX (#1), which extolled the virtues of placing your most commonly-used automatic (stack-based) variables within the stack’s “sweet spot,” the area between +127 to -128 bytes away from BP, the stack frame pointer. The reason was that the 8088 can store addressing displacements that fall within that range in a single byte; larger displacements require a full word of storage, increasing code size by a byte per instruction, and thereby slowing down performance due to increased instruction fetching time.

This takes on new prominence in 386 native mode, where straying from the sweet spot costs not one, but two or three bytes. Where the 8088 had two possible displacement sizes, either byte or word, on the 386 there are three possible sizes: byte, word, or dword. In native mode (32-bit protected mode), however, a prefix byte is needed in order to use a word-sized displacement, so a variable located outside the sweet spot requires either two extra bytes (an extra displacement byte plus a prefix byte) or three extra bytes (a dword displacement rather than a byte displacement). Either way, instructions grow alarmingly.

Performance may or may not suffer from missing the sweet spot, depending on the processor, the memory architecture, and the code mix. On a 486, prefix bytes often cost a cycle; on a 386SX, increased code size often slows performance because instructions must be fetched through the half-pint 16-bit bus; on a 386, the effect depends on the instruction mix and whether there’s a cache.

On balance, though, it’s as important to keep your most-used variables in the stack’s sweet spot in 386 native mode as it was on the 8088.

In assembly, it’s easy to control the organization of your stack frame. In C, however, you’ll have to figure out the allocation scheme your compiler uses to allocate automatic variables, and declare automatics appropriately to produce the desired effect. It can be done: I did it in Turbo C some years back, and trimmed the size of a program (admittedly, a large one) by several K—not bad, when you consider that the “sweet spot” optimization is essentially free, with no code reorganization, change in logic, or heavy thinking involved.

Table of Contents