|Previous||Table of Contents||Next|
LISTING 1.7 L1-7.ASM
; Assembler subroutine to perform a 16-bit checksum on a block of ; bytes 1 to 64K in size. Adds checksum for block into passed-in ; checksum. ; ; Call as: ; void ChecksumChunk(unsigned char *Buffer, ; unsigned int BufferLength, unsigned int *Checksum); ; ; where: ; Buffer = pointer to start of block of bytes to checksum ; BufferLength = # of bytes to checksum (0 means 64K, not 0) ; Checksum = pointer to unsigned int variable checksum is ;stored in ; ; Parameter structure: ; Parms struc dw ? ;pushed BP dw ? ;return address Buffer dw ? BufferLength dw ? Checksum dw ? Parmsends ; .model small .code public _ChecksumChunk _ChecksumChunkprocnear push bp mov bp,sp push si ;save Cs register variable ; cld ;make LODSB increment SI mov si,[bp+Buffer] ;point to buffer mov cx,[bp+BufferLength] ;get buffer length mov bx,[bp+Checksum] ;point to checksum variable mov dx,[bx] ;get the current checksum sub ah,ah ;so AX will be a 16-bit value after LODSB ChecksumLoop: lodsb ;get the next byte add dx,ax ;add it into the checksum total loop ChecksumLoop ;continue for all bytes in block mov [bx],dx ;save the new checksum ; pop si ;restore Cs register variable pop bp ret _ChecksumChunkendp end
Note that in Table 1.1, optimization makes little difference except in the case of Listing 1.5, where the design has been refined considerably. Execution time in the other cases is dominated by time spent in DOS and/or the C library, so optimization of the code you write is pretty much irrelevant. Whats more, while the approximately two-times improvement we got by optimizing is not to be sneezed at, it pales against the up-to-50-times improvement we got by redesigning.
By the way, the execution times even of Listings 1.6 and 1.7 are dominated by DOS disk access times. If a disk cache is enabled and the file to be checksummed is already in the cache, the assembly version is three times as fast as the C version. In other words, the inherent nature of this application limits the performance improvement that can be obtained via assembly. In applications that are more CPU-intensive and less disk-bound, particularly those applications in which string instructions and/or unrolled loops can be used effectively, assembly tends to be considerably faster relative to C than it is in this very specific case.
|Dont get hung up on optimizing compilers or assembly languagethe best optimizer is between your ears.|
All this is basically a way of saying: Know where youre going, know the territory, and know when it matters.
What have we learned? Dont let other peoples codeeven DOSdo the work for you when speed matters, at least not without knowing what that code does and how well it performs.
Optimization only matters after youve done your part on the program design end. Consider the ratios on the vertical axis of Table 1.1, which show that optimization is almost totally wasted in the checksumming application without an efficient design. Optimization is no panacea. Table 1.1 shows a two-times improvement from optimizationand a 50-times-plus improvement from redesign. The longstanding debate about which C compiler optimizes code best doesnt matter quite so much in light of Table 1.1, does it? Your organic optimizer matters much more than your compilers optimizer, and theres always assembly for those usually small sections of code where performance really matters.
This chapter has presented a quick step-by-step overview of the design process. Im not claiming that this is the only way to create high-performance code; its just an approach that works for me. Create code however you want, but never forget that design matters more than detailed optimization. Never stop looking for inventive ways to boost performanceand never waste time speeding up code that doesnt need to be sped up.
Im going to focus on specific ways to create high-performance code from now on. In Chapter 5, well continue to look at restartable blocks and internal buffering, in the form of a program that searches files for text strings.
|Previous||Table of Contents||Next|