Michael Abrash's Graphics Programming Black Book Special Edition: The Best Optimizer Is between Your Ears

Make sure you understand what really goes on when you insert a seemingly-innocuous function call into the time-critical portions of your code.

In this case that means knowing how DOS and the C/C++ file-access libraries do their work. In other words, know the territory!

LISTING 1.4 L1-4.C

/*
* Program to calculate the 16-bit checksum of the stream of bytes
* from the specified file. Obtains the bytes one at a time via
* getc(), allowing C to perform data buffering.
*/
#include <stdio.h>

main(int argc, char *argv[]) {
      FILE *CheckFile;
      int Byte;
      unsigned int Checksum;

      if ( argc != 2 ) {
            printf(“usage: checksum filename\n”);
            exit(1);
      }
      if ( (CheckFile = fopen(argv[1], “rb”)) == NULL ) {
            printf(“Can’t open file: %s\n”, argv[1]);
            exit(1);
      }

      /* Initialize the checksum accumulator */
      Checksum = 0;

      /* Add each byte in turn into the checksum accumulator */
      while ( (Byte = getc(CheckFile)) != EOF ) {
            Checksum += (unsigned int) Byte;
      }

      /* Report the result */
      printf(“The checksum is: %u\n”, Checksum);
      exit(0);
}

Know When It Matters

The last section contained a particularly interesting phrase: the time-critical portions of your code. Time-critical portions of your code are those portions in which the speed of the code makes a significant difference in the overall performance of your program—and by “significant,” I don’t mean that it makes the code 100 percent faster, or 200 percent, or any particular amount at all, but rather that it makes the program more responsive and/or usable from the user’s perspective.

Don’t waste time optimizing non-time-critical code: set-up code, initialization code, and the like. Spend your time improving the performance of the code inside heavily-used loops and in the portions of your programs that directly affect response time. Notice, for example, that I haven’t bothered to implement a version of the checksum program entirely in assembly; Listings 1.2 and 1.6 call assembly subroutines that handle the time-critical operations, but C is still used for checking command-line parameters, operning files, printing, and the like.

If you were to implement any of the listings in this chapter entirely in hand-optimized assembly, I suppose you might get a performance improvement of a few percent—but I rather doubt you’d get even that much, and you’d sure as heck spend an awful lot of time for whatever meager improvement does result. Let C do what it does well, and use assembly only when it makes a perceptible difference.

Besides, we don’t want to optimize until the design is refined to our satisfaction, and that won’t be the case until we’ve thought about other approaches.

Always Consider the Alternatives

Listing 1.4 is good, but let’s see if there are other—perhaps less obvious—ways to get the same results faster. Let’s start by considering why Listing 1.4 is so much better than Listing 1.1. Like read(), getc() calls DOS to read from the file; the speed improvement of Listing 1.4 over Listing 1.1 occurs because getc() eads many bytes at once via DOS, then manages those bytes for us. That’s faster than reading them one at a time using read()—but there’s no reason to think that it’s faster than having our program read and manage blocks itself. Easier, yes, but not faster.

Consider this: Every invocation of getc() involves pushing a parameter, executing a call to the C library function, getting the parameter (in the C library code), looking up information about the desired stream, unbuffering the next byte from the stream, and returning to the calling code. That takes a considerable amount of time, especially by contrast with simply maintaining a pointer to a buffer and whizzing through the data in the buffer inside a single loop.

There are four reasons that many programmers would give for not trying to improve on Listing 1.4:

1. The code is already fast enough.

2. The code works, and some people are content with code that works, even when it’s slow enough to be annoying.

3. The C library is written in optimized assembly, and it’s likely to be faster than any code that the average programmer could write to perform essentially the same function.

4. The C library conveniently handles the buffering of file data, and it would be a nuisance to have to implement that capability.

I’ll ignore the first reason, both because performance is no longer an issue if the code is fast enough and because the current application does not run fast enough—13 seconds is a long time. (Stop and wait for 13 seconds while you’re doing something intense, and you’ll see just how long it is.)

The second reason is the hallmark of the mediocre programmer. Know when optimization matters—and then optimize when it does!

Table of Contents