|Previous||Table of Contents||Next|
The basic idea is to maintain a change list. This is an array of pointers into the cell array. Each change list element points to a word which changes in the next generation. This way we dont have to waste time scanning every cell since most of them do not change. Two passes are made through the change list. The first pass updates the cell display on the screen, sets the life/death status of each cell for this new generation, and updates the neighbor counts for the adjacent cells. There are some efficiencies gained by using cell triplets rather than individual cells since we usually dont need to set all eight neighbors. [Again, the neighbor counts for cells in the same word are implied by the states of those cells.] The second pass sets the next-generation states for the cells and their neighbors, and in the process builds the change list for the next generation.
Processing each word is a little complex but very fast. A 64K block of code exists with routines on each 256-byte boundary. Generally speaking, the entry point corresponds to the high byte of the cell word. This byte contains the life/death values and a bit to indicate if this is an edge condition. During the first pass we take the cell triplet word, AND it with 0XFE00, and jump to that address. During the second pass we take the cell triplet word, AND it with 0xFE00, OR it with 0x0100, and jump to that address. [Therefore, there are 128 possible jump targets on the first pass, and 128 more on the second, all on 256-byte boundaries and all keyed off the high 7 bits of the cell triplet state; because bit 8 of the jump index is 0 on the first pass and 1 on the second, there is no conflict. The lower bit isnt needed for other purposes because only the edge flag bit and the six life/death state bits matter for jumping into Davids state machine. The other nine bits, the bits used for the neighbor counts, are used only in the next step.]
Determining which changes must be made to a cell triplet is easy and surprisingly quick. Theres no counting! Instead, I use a 64K lookup table indexed by the cell triplet itself. The value of the lookup table entry is equal to what the high byte should be in the next generation. If this value is equal to the current high byte, then no changes are necessary to the cell. Otherwise it is placed in the change list. Look at the code in the Test() and Fix() functions to see how this is done. [This step is as important as it is obscure. David has a 64K table organized so that if you use a word describing a cell triplet as a lookup index, the byte you will read will be the state of the high byte for the next generation. In other words, Davids table is constructed so that the edge flag bit, the life/death states, and the three neighbor count fields form an index to a byte describing the next generation state for that triplet. In practice, only the next generation field of the cell changes. Then, if another change to a nearby cell tries to nudge that cell into changing again, Davids code sees that the desired state is already set, and does not add that cell to the change list again.]
Segment usage in Davids assembly code is summarized in Listing 18.6.
LISTING 18.6 QLIFE Assembly Segment Usage
CS : 64K code (table of routines on 256 byte boundaries) DS : DGROUP (1st pass) / 64K cell life/death classification table (second pass) ES : Change list SS : DGROUP; the life cell grid and row/column table FS : Video segment GS : Unused
Most likely, youre scratching your head right now in bemusement. I dont blame you; I felt the same way myself at first. Its actually pretty simple, though, once you have the hang of it. Basically, David runs down the change list, visiting every cell thats due to change in this generation, setting it to the new state, drawing it in the new state, and adjusting the counts of all its neighbors. David has a separate assembly routine for every possible change of state for a cell triplet, and he jumps to the proper routine by taking the cell triplet word, masking off the lower 9 bits, and jumping to the address where the appropriate code to perform that particular change of state resides. He does this for every entry in the change list. When this is completed, the current generation has been drawn and updated.
Now David runs down the change list again to generate the change list for the next generation. In this case, for every changed cell triplet, David looks at that triplet and all affected neighbors to see which will change in the next generation. He tests for this condition by using each potentially changed cell triplet word as an index into the aforementioned lookup table of new states. If the current state matches the appropriate state for the next generation, then theres nothing to do and the cell is not added to the change list. If the states dont match, then the cell is added to the change list, and the appropriate state for the next generation is set in the cell triplet. David checks the minimum possible number of cells for change by branching to code that checks only the relevant cells around each cell triplet in the current change list; that branching is accomplished by taking the cell triplet word, masking off the lower 9 bits, setting bit 8 to a 1-bit, and branching to the routine at that address. As with everything in this amazing program, this represents the least possible work to accomplish the desired resultjust three instructions:
mov dh,[bp+1] or dh,1 jmp dx
These suffice to select the proper, minimum-work code to process the next cell triplet that has changed, and all potentially affected neighbors. For all the size of Davids code, it has an astonishing economy of effort, as execution glides through the change list without a wasted instruction.
Alas, I dont have the room to discuss Peter Klerings equally remarkable Life implementation here. Ill close this chapter with a quote from Terje Mathisen, one of the finest optimizers it has ever been my pleasure to meet, who, after looking over Davids and Peters entries, said, This has been an eye-opening experience for me. I honestly thought I had the fastest possible approach. TANSTATFC.
There Aint No Such Thing As the Fastest Code.
|Previous||Table of Contents||Next|