Previous Table of Contents Next

Where Does the Time Go?

How slow is Listing 17.1? Table 17.1 shows that even on a 486, Listing 17.1 does fewer than three 96×96 generations per second. (The times in Table 17.1 are for 1,000 generations of a 96×96 cell map with seed=1, LIMIT_18_HZ=0, WRAP_EDGES=1, and magnifier=2, running on a 33 MHz 486.) Since my target is 18 generations per second with a 200×200 cellmap on a 20 MHz 386, Listing 17.1 is too slow by a rather wide margin—75 times too slow, in fact. You might say we have a little optimizing to do.

The first rule of optimization is: Only optimize where it matters. Use a profiler, or risk making a fool of yourself. Consider Listings 17.1 and 17.2. Where do you think the potential for significant speed-up lies? I’ll tell you one place where I thought there was considerable potential—in draw_pixel(). As a programmer of high-speed graphics, I figured any drawing function that was not only written in C/C++ but also recalculated the target address from scratch for each pixel would be among the first optimization targets. I also expected to get major gains out of going to a Ping-Pong arrangement so that I didn’t have to copy the new cellmap back to current_map after calculating the next generation.

Listing 17.1 Listing 17.3 Listing 17.4

Total execution time 340 secs 94 secs 45 secs
cell_state() 275 21
next_generation() 60 14 40
count_neighbors() 54
draw_pixel() 2 2 2
set_cell() <1 <1 <1
clear_cell() <1 <1 <1
copy_cells() <1 <1 <1

Table 17.1 Execution times for the game of life.

I was wrong. Wrong, wrong, wrong. (But at least I was smart enough to use a profiler before actually writing any new code.) Table 17.1 shows where the time actually goes in Listings 17.1 and 17.2. As you can see, the time taken by draw_pixel(), copy_cells(), and everything other than calculating the next generation is nothing more than noise. We could optimize these routines right down to executing instantaneously, and you know what? It wouldn’t make the slightest perceptible difference in how fast the program runs. Given the present state of our Game of Life implementation, the only areas worth looking at for possible optimizations are cell_state() and next_generation().

It’s worth noting, though, that one reason draw_pixel() doesn’t much affect performance is that in Listing 17.1, we’re smart enough to redraw pixels only when their states change, rather than during every generation. Detecting and eliminating redundant operations is part of knowing the nature of your data, and is a potent optimization technique that will be extremely useful a little later in this chapter.

The Hazards and Advantages of Abstraction

How can we speed up cell_state() and next_generation()? I’ll tell you how not to do it: By writing those member functions in assembly. It’s tempting to say that cell_state() is taking all the time, so we need to speed it up with assembly, but what we really need to do is figure out why cell_state() is taking all the time, then address that aspect of the program directly.

Once you know where you need to optimize, the one word to keep in mind isn’t assembly, it’s...plastics. No, actually, it’s abstraction. Well-written C and especially C++ programs are highly abstract models. For example, Listing 17.1 essentially creates a new programming language in which cells are tangible things, with built-in manipulation instructions. Given the cellmap member functions, you don’t even need to know the cell storage format! This is a wonderful thing, in general; it saves programming time and bugs, and frees you to work on the application’s needs, rather than implementation details.

However, if you never look beneath the surface of the abstract model at the implementation details, you have no idea of what the true performance cost of various operations is, and, without that, you have largely surrendered control over performance.

Having said that, let me hasten to add that algorithmic improvements can make a big difference even when working at a purely abstract level. For a large unordered data set, a high-level Quicksort will beat the pants off the best-implemented insertion sort you can imagine. Still, you can optimize your algorithm from here ’til doomsday, and if you have a fast algorithm running on top of a highly abstract programming model, you’ll almost certainly end up with a slow program. In Listing 17.1, the abstraction that’s killing us is that of looking at the eight neighbors with eight completely independent operations, requiring eight calls to cell_state() and eight calculations of cell address and cell mask. In fact, given the nature of cell storage, the eight neighbors are in a fixed relationship to one another, and the addresses and masks of all eight can generally be found very easily via hard-wired offsets and shifts once the address and mask of any one is known.

Previous Table of Contents Next

Graphics Programming Black Book © 2001 Michael Abrash