Michael Abrash's Graphics Programming Black Book Special Edition: The Best Optimizer Is between Your Ears

LISTING 1.7 L1-7.ASM

; Assembler subroutine to perform a 16-bit checksum on a block of
; bytes 1 to 64K in size. Adds checksum for block into passed-in
; checksum.
;
; Call as:
;     void ChecksumChunk(unsigned char *Buffer,
;     unsigned int BufferLength, unsigned int *Checksum);
;
; where:
;     Buffer = pointer to start of block of bytes to checksum
;     BufferLength = # of bytes to checksum (0 means 64K, not 0)
;     Checksum = pointer to unsigned int variable checksum is
;stored in
;
; Parameter structure:
;
Parms struc
                    dw    ?    ;pushed BP
                    dw    ?    ;return address
Buffer              dw    ?
BufferLength        dw    ?
Checksum            dw    ?
Parmsends
;
     .model small
     .code
     public _ChecksumChunk
_ChecksumChunkprocnear
     push  bp
     mov   bp,sp
     push  si                        ;save C’s register variable
;
     cld                             ;make LODSB increment SI
      mov  si,[bp+Buffer]            ;point to buffer
      mov  cx,[bp+BufferLength]      ;get buffer length
      mov  bx,[bp+Checksum]          ;point to checksum variable
      mov  dx,[bx]                   ;get the current checksum
      sub  ah,ah                     ;so AX will be a 16-bit value after LODSB
ChecksumLoop:
      lodsb                  ;get the next byte
      add  dx,ax             ;add it into the checksum total
      loop ChecksumLoop      ;continue for all bytes in block
      mov  [bx],dx           ;save the new checksum
;
      pop  si                ;restore C’s register variable
      pop  bp
      ret
_ChecksumChunkendp
      end

Note that in Table 1.1, optimization makes little difference except in the case of Listing 1.5, where the design has been refined considerably. Execution time in the other cases is dominated by time spent in DOS and/or the C library, so optimization of the code you write is pretty much irrelevant. What’s more, while the approximately two-times improvement we got by optimizing is not to be sneezed at, it pales against the up-to-50-times improvement we got by redesigning.

By the way, the execution times even of Listings 1.6 and 1.7 are dominated by DOS disk access times. If a disk cache is enabled and the file to be checksummed is already in the cache, the assembly version is three times as fast as the C version. In other words, the inherent nature of this application limits the performance improvement that can be obtained via assembly. In applications that are more CPU-intensive and less disk-bound, particularly those applications in which string instructions and/or unrolled loops can be used effectively, assembly tends to be considerably faster relative to C than it is in this very specific case.

Don’t get hung up on optimizing compilers or assembly language—the best optimizer is between your ears.

All this is basically a way of saying: Know where you’re going, know the territory, and know when it matters.

Where We’ve Been, What We’ve Seen

What have we learned? Don’t let other people’s code—even DOS—do the work for you when speed matters, at least not without knowing what that code does and how well it performs.

Optimization only matters after you’ve done your part on the program design end. Consider the ratios on the vertical axis of Table 1.1, which show that optimization is almost totally wasted in the checksumming application without an efficient design. Optimization is no panacea. Table 1.1 shows a two-times improvement from optimization—and a 50-times-plus improvement from redesign. The longstanding debate about which C compiler optimizes code best doesn’t matter quite so much in light of Table 1.1, does it? Your organic optimizer matters much more than your compiler’s optimizer, and there’s always assembly for those usually small sections of code where performance really matters.

Where We’re Going

This chapter has presented a quick step-by-step overview of the design process. I’m not claiming that this is the only way to create high-performance code; it’s just an approach that works for me. Create code however you want, but never forget that design matters more than detailed optimization. Never stop looking for inventive ways to boost performance—and never waste time speeding up code that doesn’t need to be sped up.

I’m going to focus on specific ways to create high-performance code from now on. In Chapter 5, we’ll continue to look at restartable blocks and internal buffering, in the form of a program that searches files for text strings.

Table of Contents