|Previous||Table of Contents||Next|
The other day, my daughter suggested that we each draw the prettiest picture we could, then see whose was prettier. I wont comment on who won, except to note that apparently a bolt of lightning zipping toward a moose with antlers that bear an unfortunate resemblance to a propeller beanie isnt going to win me any scholarships to art school, if you catch my drift. Anyway, my drawing happened to feature the word chartreuse (because it rhymed with moose and Zeushence the lightning; more than that I am not at liberty to divulge), and she wanted to know if the moose was actually chartreuse. I had to admit that I didnt know, so we went to the dictionary, whereupon we learned that chartreuse is a pale apple-green color. Then she brought up the Windows Control Panel, pointed to the selection of predefined colors, and asked, Which of those is chartreuse?and I realized that I still didnt know.
Some things can be described perfectly with words, but others just have to be experienced. Color is one such category, and Pentium optimization is another. Ive spent the last two chapters detailing the rules for Pentium optimization, and Ill spend half of this one doing so, as well. Thats good; without understanding the fundamentals, we have no chance of optimizing well. Its not enough, though. We also need to look at a real-world example of Pentium optimization in action, and well do that later in this chapter; after which, you should go out and do some Pentium optimization on your own. Optimization is one of those things that you can learn a lot about from reading, but ultimately it has to sink into your pores as you do itespecially Pentium optimization because the Pentium is perhaps the most complex (and rewarding) chip to optimize for that Ive ever seen.
In the last chapter, we explored the dual-execution-pipe nature of the Pentium, and learned which instructions could pair (execute simultaneously) in which pipes. Now were ready to look at AGIs and register contentiontwo hazards that can prevent otherwise properly written code from taking full advantage of the Pentiums two pipes, and can thereby keep your code from pushing the Pentium to maximum performance.
The Pentium is advertised as having a five-stage pipeline for each of its execution units. All this means is that at any given time, up to five instructions are in various stages of execution in each pipe; this overlapping of execution is done for speed, so each instruction doesnt have to wait until the previous one has finished. The only way that the Pentiums pipelining directly affects the way you program is in the areas of AGIs and register dependencies.
AGIs are Address Generation Interlocks, a fancy way of saying that if a register is used to address memory, as is EBX in this instruction
and the value of the register is not set far enough ahead for the Pentium to perform the addressing calculations before the instruction needs the address, then the Pentium will stall the pipe in which the instruction is executing until the value becomes available and the addressing calculations have been performed. Remember, also, that instructions execute in lockstep on the Pentium, so if one pipe stalls for a cycle, making its instruction take one cycle longer, that extends by one cycle the time until the other pipe can begin its next instruction, as well.
The rule for AGIs is simple: If you modify any part of a register during a cycle, you cannot use that register to address memory during either that cycle or the next cycle. If you try to do this, the Pentium will simply stall the instruction that tries to use that register to address memory until two cycles after the register was modified. This was true on the 486 as well, but the Pentiums new twist is that since more than one instruction can execute in a single cycle, an AGI can stall an instruction thats as many as three instructions away from the changing of the addressing register, as shown in Figure 21.1, and an AGI can also cause a stall that costs as many as three instructions, as shown in Figure 21.2. This means that AGIs are both much easier to cause and potentially more expensive than on the 486, and you must keep a sharp eye out for them. It also means that its often worth calculating a memory pointer several instructions ahead of its actual use. Unfortunately, this tends to extend the lifetimes of pointer registers to span a greater number of instructions, making the Pentiums relatively small register set seem even smaller.
Figure 21.1 An AGI can stall up to three instructions later.
As an example of a sort of AGI thats new to the Pentium, consider the following test for a NULL pointer, followed by the use of the pointer if its not NULL:
push ebx ;U-pipe cycle 1 mov ebx,[Ptr] ;V-pipe cycle 1 and ebx,ebx ;U-pipe cycle 2 jz short IsNull ;V-pipe cycle 2 mov eax,[ebx] ;U-pipe cycle 3 AGI stall mov edx,[ebp-8] ;V-pipe cycle 3 lockstep idle ;U-pipe cycle 4 mov eax,[ebx] ;V-pipe cycle 4 mov edx,[ebp-8]
This commonplace code loses a U-pipe cycle to the AGI caused by AND EBX,EBX, followed by the attempt two instructions later to use EBX to point to memory. The code loses a V-pipe cycle as well, because lockstep execution wont let the next V-pipe instruction execute until the paired U-pipe instruction that suffered the AGI finishes. The solution is to use TEST EBX,EBX instead of AND; TEST cant modify EBX, so no AGI occurs. Sure, AND EBX,EBX doesnt modify EBX either, but the Pentium doesnt know that, so it has to insert the AGI.
Figure 21.2 An AGI can cost as many as 3 cycles.
|Previous||Table of Contents||Next|