Previous Table of Contents Next

The large model is actually not necessary for the 96×96 cellmap in Listing 17.5. However, I was actually more interested in seeing a fast 200×200 cellmap, and two 200×200 cellmaps can’t fit in a single segment. (This can easily be worked around in assembly language for cellmaps up to a segment in size; beyond that size, cellmap scanning becomes pretty complex, although it can still be efficiently implemented with some clever programming.)

Anyway, using the large model helps illustrate that it’s the data representation and the data processing approach you choose that matter most. Optimization details like memory models and segments and in-line functions and assembly language are important but secondary. Let your mind roam creatively before you start coding. Otherwise, you may find you’re writing well-tuned slow code, which is by no means the same thing as fast code.

Take a close look at Listing 17.5. You will see that it’s quite a bit simpler than Listing 17.4. To some extent, that’s because I decided to hard-wire the program to wrap around from one edge of the cellmap to the other (it’s much more interesting that way), but the main reason is that it’s a lot easier to work with the neighbor-count model. There’s no complex mask and pointer management, and the only thing that really needs to be optimized is scanning for zero bytes. (And, in fact, I haven’t optimized even that because it’s done in a C++ loop; it should really be REPZ SCASB.)

In truth, none of the code in Listing 17.5 is particularly well-optimized, and, as I noted, the program must be compiled with the large model for large cellmaps. Also, of course, the entire program is still in C++; note well that there’s not a whit of assembly here.

We’ve gotten more than a 30-times speedup simply by removing a little of the abstraction that C++ encourages, and by storing and processing the data in a manner appropriate for the typical nature of the data itself. In other words, we’ve done some linear, left-brained optimization (using pointers and reducing calls) and some non-linear, right-brained optimization (understanding the real problem and listening for the creative whisper of non-obvious solutions).

No doubt we could get another two to five times improvement with good assembly code—but that’s dwarfed by a 30-times improvement, so optimization at a conceptual level must come first.

The Challenge That Ate My Life

The most recent optimization challenge I laid my community of readers was to write the fastest possible Game of Life generation engine. By “engine” I meant that I didn’t care about time spent in input or output, only time consumed by the call to next-generation. The time spent updating the cellmap was what I wanted people to concentrate on.

Here are the rules I laid down for the challenge:

  Readers could modify any code in Listing 17.5, except the main loop, as well as change the cell map representation any way they liked. However, the code had to produce exactly the same output as Listing 17.5 under all circumstances in order to be eligible to win.
  Engine code had to be less than 400 lines long in total, excluding the video-related code shown in Listing 17.2.
  Submissions had to compile/assemble with Borland C++ (in either C++ or C mode, as desired) and/or TASM.
  All submissions had to handle cellmaps at least 200×200 in size.
  Assembly language could of course be used to speed up any part of the program. C rather than C++ was legal as well, so long as entered implementations produced the same results as Listing 17.5 and 17.2 together and were less than 400 lines long.
  All entries would be timed on the same 33 MHz 486 with a 256K external cache.

That was the challenge I put to the readers. Little did I realize the challenge it would lay on me: Entries poured in from the four corners of the globe. Some were plain, some were brilliant, some were, well, berserk. Many didn’t even work. But all had to be gone through, examined for adherence to the rules, read, compiled, linked, run, and judged. I learned a lot—about a lot of things, not the least of which was the process (or maybe the wisdom) of laying down challenges to readers.

Who won? What did I learn? To find out, read on.

Previous Table of Contents Next

Graphics Programming Black Book © 2001 Michael Abrash