Previous | Table of Contents | Next |

Hints My Readers Gave Me

Back in high school, I took a pre-calculus class from Mr. Bourgeis, whose most notable characteristics were incessant pacing and truly enormous feet. My friend Barry, who sat in the back row, right behind me, claimed that it was because of his large feet that Mr. Bourgeis was so restless. Those feet were *so* heavy, Barry hypothesized, that if Mr. Bourgeis remained in any one place for too long, the floor would give way under the strain, plunging the unfortunate teacher deep into the mantle of the Earth and possibly all the way through to China. Many amusing cartoons were drawn to this effect.

Unfortunately, Barry was too busy drawing cartoons, or, alternatively, sleeping, to actually learn any math. In the long run, that didn’t turn out to be a handicap for Barry, who went on to become vice-president of sales for a ham-packing company, where presumably he was rarely called upon to derive the quadratic equation. Barry’s lack of scholarship caused some problems back then, though. On one memorable occasion, Barry was half-asleep, with his eyes open but unfocused and his chin balanced on his hand in the classic “if I fall asleep my head will fall off my hand and I’ll wake up” posture, when Mr. Bourgeis popped a killer problem:

“Barry, solve this for X, please.” On the blackboard lay the equation:

X - 1 = 0

“Minus 1,” Barry said promptly.

Mr. Bourgeis shook his head mournfully. “Try again.” Barry thought hard. He knew the fundamental rule that the answer to most mathematical questions is either 0, 1, infinity, -1, or minus infinity (do not apply this rule to balancing your checkbook, however); unfortunately, that gave him only a 25 percent chance of guessing right.

“One,” I whispered surreptitiously.

“Zero,” Barry announced. Mr. Bourgeis shook his head even more sadly.

“One,” I whispered louder. Barry looked still more thoughtful—a bad sign—so I whispered “one” again, even louder. Barry looked so thoughtful that his eyes nearly rolled up into his head, and I realized that he was just doing his best to convince Mr. Bourgeis that Barry had solved this one by himself.

As Barry neared the climax of his stirring performance and opened his mouth to speak, Mr. Bourgeis looked at him with great concern. “Barry, can you hear me all right?”

“Yes, sir,” Barry replied. “Why?”

“Well, I could hear the answer all the way up here. Surely you could hear it just one row away?”

The class went wild. They might as well have sent us home early for all we accomplished the rest of the day.

I like to think I know more about performance programming than Barry knew about math. Nonetheless, I always welcome good ideas and comments, and many readers have sent me a slew of those over the years. So in this chapter, I think I’ll return the favor by devoting a chapter to reader feedback.

Several people have pointed out that while **LEA** is great for performing certain additions (see Chapter 6), it isn’t a perfect replacement for **ADD**. What’s the difference? **LEA**, an addressing instruction by trade, doesn’t affect the flags, while the arithmetic **ADD** instruction most certainly does. This is no problem when performing additions that involve only quantities that fit in one machine word (32 bits in 386 protected mode, 16 bits otherwise), but it renders **LEA** useless for multiword operations, which use the Carry flag to tie together partial results. For example, these instructions

ADD EAX,EBX ADC EDX,ECX

could *not* be replaced

LEA EAX,[EAX+EBX] ADC EDX,ECX

because **LEA** doesn’t affect the Carry flag.

The no-carry characteristic of **LEA** becomes a distinct advantage when performing pointer arithmetic, however. For instance, the following code uses **LEA** to advance the pointers while adding one 128-bit memory variable to another such variable:

MOV ECX,4 ;# of 32-bit words to add CLC ;no carry into the initial ADC ADDLOOP: MOV EAX,[ESI] ;get the next element of one array ADC [EDI],EAX ;add it to the other array, with carry LEA ESI,[ESI+4] ;advance one array’s pointer LEA EDI,[EDI+4] ;advance the other array’s pointer LOOP ADDLOOP

(Yes, I could use **LODSD** instead of **MOV/LEA**; I’m just illustrating a point here. Besides, **LODS** is only 1 cycle faster than **MOV/LEA** on the 386, and is actually more than twice as slow on the 486.) If we used **ADD** rather than **LEA** to advance the pointers, the carry from one **ADC** to the next would have to be preserved with either **PUSHF/POPF** or **LAHF/SAHF**. (Alternatively, we could use multiple **INC**s, since **INC** doesn’t affect the Carry flag.)

In short, **LEA** is indeed different from **ADD**. Sometimes it’s better. Sometimes not; that’s the nature of the various instruction substitutions and optimizations that will occur to you over time. There’s no such thing as “best” instructions on the x86; it all depends on what you’re trying to do.

But there sure are a lot of interesting options, aren’t there?

Reader John Kennedy regularly passes along intriguing assembly programming tricks, many of which I’ve never seen mentioned anywhere else. John likes to optimize for size, whereas I lean more toward speed, but many of his optimizations are good for both purposes. Here are a few of my favorites:

John’s code for setting AX to its absolute value is:

CWD XOR AX,DX SUB AX,DX

This does nothing when bit 15 of AX is 0 (that is, if AX is positive). When AX is negative, the code “nots” it and adds 1, which is exactly how you perform a two’s complement negate. For the case where AX is not negative, this trick usually beats the stuffing out of the standard absolute value code:

AND AX,AX ;negative? JNS IsPositive ;no NEG AX ;yes,negate it IsPositive:

However, John’s code is slower on a 486; as you’re no doubt coming to realize (and as I’ll explain in Chapters 12 and 13), the 486 is an optimization world unto itself.

Here’s how John copies a block of bytes from DS:SI to ES:DI, moving as much data as possible a word at a time:

SHR CX,1 ;word count REP MOVSW ;copy as many words as possible ADC CX,CX ;CX=1 if copy length was odd, ;0 else REP MOVSB ;copy any odd byte

(**ADC CX,CX** can be replaced with **RCL CX,1**; which is faster depends on the processor type.) It might be hard to believe that the above is faster than this:

SHR CX,1 ;word count REP MOVSW ;copy as many words as ;possible JNC CopyDone ;done if even copy length MOVSB ;copy the odd byte CopyDone:

Previous | Table of Contents | Next |

Graphics Programming Black Book © 2001 Michael Abrash