Michael Abrash's Graphics Programming Black Book Special Edition: Crossing the Border

Interpreting Where the Cycles Go

To boost the overall performance of Listing 5.1, I would normally convert SearchForString() to assembly language at this point. However, I’m not going to do that, and the reason is as important a lesson as any discussion of optimized assembly code is likely to be. Take a moment to examine some interesting performance aspects of the C implementation, and all should become much clearer.

As you’ll recall from Chapter 1, one of the important rules for optimization involves knowing when optimization is worth bothering with at all. Another rule involves understanding where most of a program’s execution time is going. That’s more true for Listing 5.1 than you might think.

When Listing 5.1 is run on a 1 MB assembly source file, it takes about three seconds to find the string “xxxend” (which is at the end of the file) on a 20 MHz 386 machine, with the entire file in a disk cache. If BLOCK_SIZE is trimmed from 16K to 4K, execution time does not increase perceptibly! At 2K, the program slows slightly; it’s not until the block size shrinks to 64 bytes that execution time becomes approximately double that of the 16K buffer.

So the first thing we’ve discovered is that, while bigger blocks do make for the best performance, the increment in performance may not be very large, and might not justify the extra memory required for those larger blocks. Our next discovery is that, even though we read the file in large chunks, most of the execution time of Listing 5.1 is nonetheless spent in executing the read() function.

When I replaced the read() function call in Listing 5.1 with code that simply fools the program into thinking that a 1 MB file is being read, the program ran almost instantaneously—in less than 1/2 second, even when the searched-for string wasn’t anywhere to be found. By contrast, Listing 5.1 requires three seconds to run even when searching for a single character that isn’t found anywhere in the file, the case in which a single call to memchr() (and thus a single REPNZ SCASB) can eliminate an entire block at a time.

All in all, the time required for DOS disk access calls is taking up at least 80 percent of execution time, and search time is less than 20 percent of overall execution time. In fact, search time is probably a good deal less than 20 percent of the total, given that the overhead of loading the program, running through the C startup code, opening the file, executing printf(), and exiting the program and returning to the DOS shell are also included in my timings. Given which, it should be apparent why converting to assembly language isn’t worth the trouble—the best we could do by speeding up the search is a 10 percent or so improvement, and that would require more than doubling the performance of code that already uses repeated string instructions to do most of the work.

Not likely.

Knowing When Assembly Is Pointless

So that’s why we’re not going to go to assembly language in this example—which is not to say it would never be worth converting the search engine in Listing 5.1 to assembly.

If, for example, your application will typically search buffers in which the first character of the search string occurs frequently as might be the case when searching a text buffer for a string starting with the space character an assembly implementation might be several times faster. Why? Because assembly code can switch from REPNZ SCASB to match the first character to REPZ CMPS to check the remaining characters in just a few instructions.

In contrast, Listing 5.1 must return from memchr(), set up parameters, and call memcmp() in order to do the same thing. Likewise, assembly can switch back to REPNZ SCASB after a non-match much more quickly than Listing 5.1. The switching overhead is high; when searching a file completely filled with the character z for the string “zy,” Listing 5.1 takes almost 1/2 minute, or nearly an order of magnitude longer than when searching a file filled with normal text.

Table of Contents