Previous | Table of Contents | Next |
Back in high school, I took a pre-calculus class from Mr. Bourgeis, whose most notable characteristics were incessant pacing and truly enormous feet. My friend Barry, who sat in the back row, right behind me, claimed that it was because of his large feet that Mr. Bourgeis was so restless. Those feet were so heavy, Barry hypothesized, that if Mr. Bourgeis remained in any one place for too long, the floor would give way under the strain, plunging the unfortunate teacher deep into the mantle of the Earth and possibly all the way through to China. Many amusing cartoons were drawn to this effect.
Unfortunately, Barry was too busy drawing cartoons, or, alternatively, sleeping, to actually learn any math. In the long run, that didnt turn out to be a handicap for Barry, who went on to become vice-president of sales for a ham-packing company, where presumably he was rarely called upon to derive the quadratic equation. Barrys lack of scholarship caused some problems back then, though. On one memorable occasion, Barry was half-asleep, with his eyes open but unfocused and his chin balanced on his hand in the classic if I fall asleep my head will fall off my hand and Ill wake up posture, when Mr. Bourgeis popped a killer problem:
Barry, solve this for X, please. On the blackboard lay the equation:
X - 1 = 0
Minus 1, Barry said promptly.
Mr. Bourgeis shook his head mournfully. Try again. Barry thought hard. He knew the fundamental rule that the answer to most mathematical questions is either 0, 1, infinity, -1, or minus infinity (do not apply this rule to balancing your checkbook, however); unfortunately, that gave him only a 25 percent chance of guessing right.
One, I whispered surreptitiously.
Zero, Barry announced. Mr. Bourgeis shook his head even more sadly.
One, I whispered louder. Barry looked still more thoughtfula bad signso I whispered one again, even louder. Barry looked so thoughtful that his eyes nearly rolled up into his head, and I realized that he was just doing his best to convince Mr. Bourgeis that Barry had solved this one by himself.
As Barry neared the climax of his stirring performance and opened his mouth to speak, Mr. Bourgeis looked at him with great concern. Barry, can you hear me all right?
Yes, sir, Barry replied. Why?
Well, I could hear the answer all the way up here. Surely you could hear it just one row away?
The class went wild. They might as well have sent us home early for all we accomplished the rest of the day.
I like to think I know more about performance programming than Barry knew about math. Nonetheless, I always welcome good ideas and comments, and many readers have sent me a slew of those over the years. So in this chapter, I think Ill return the favor by devoting a chapter to reader feedback.
Several people have pointed out that while LEA is great for performing certain additions (see Chapter 6), it isnt a perfect replacement for ADD. Whats the difference? LEA, an addressing instruction by trade, doesnt affect the flags, while the arithmetic ADD instruction most certainly does. This is no problem when performing additions that involve only quantities that fit in one machine word (32 bits in 386 protected mode, 16 bits otherwise), but it renders LEA useless for multiword operations, which use the Carry flag to tie together partial results. For example, these instructions
ADD EAX,EBX ADC EDX,ECX
could not be replaced
LEA EAX,[EAX+EBX] ADC EDX,ECX
because LEA doesnt affect the Carry flag.
The no-carry characteristic of LEA becomes a distinct advantage when performing pointer arithmetic, however. For instance, the following code uses LEA to advance the pointers while adding one 128-bit memory variable to another such variable:
MOV ECX,4 ;# of 32-bit words to add CLC ;no carry into the initial ADC ADDLOOP: MOV EAX,[ESI] ;get the next element of one array ADC [EDI],EAX ;add it to the other array, with carry LEA ESI,[ESI+4] ;advance one arrays pointer LEA EDI,[EDI+4] ;advance the other arrays pointer LOOP ADDLOOP
(Yes, I could use LODSD instead of MOV/LEA; Im just illustrating a point here. Besides, LODS is only 1 cycle faster than MOV/LEA on the 386, and is actually more than twice as slow on the 486.) If we used ADD rather than LEA to advance the pointers, the carry from one ADC to the next would have to be preserved with either PUSHF/POPF or LAHF/SAHF. (Alternatively, we could use multiple INCs, since INC doesnt affect the Carry flag.)
In short, LEA is indeed different from ADD. Sometimes its better. Sometimes not; thats the nature of the various instruction substitutions and optimizations that will occur to you over time. Theres no such thing as best instructions on the x86; it all depends on what youre trying to do.
But there sure are a lot of interesting options, arent there?
Reader John Kennedy regularly passes along intriguing assembly programming tricks, many of which Ive never seen mentioned anywhere else. John likes to optimize for size, whereas I lean more toward speed, but many of his optimizations are good for both purposes. Here are a few of my favorites:
Johns code for setting AX to its absolute value is:
CWD XOR AX,DX SUB AX,DX
This does nothing when bit 15 of AX is 0 (that is, if AX is positive). When AX is negative, the code nots it and adds 1, which is exactly how you perform a twos complement negate. For the case where AX is not negative, this trick usually beats the stuffing out of the standard absolute value code:
AND AX,AX ;negative? JNS IsPositive ;no NEG AX ;yes,negate it IsPositive:
However, Johns code is slower on a 486; as youre no doubt coming to realize (and as Ill explain in Chapters 12 and 13), the 486 is an optimization world unto itself.
Heres how John copies a block of bytes from DS:SI to ES:DI, moving as much data as possible a word at a time:
SHR CX,1 ;word count REP MOVSW ;copy as many words as possible ADC CX,CX ;CX=1 if copy length was odd, ;0 else REP MOVSB ;copy any odd byte
(ADC CX,CX can be replaced with RCL CX,1; which is faster depends on the processor type.) It might be hard to believe that the above is faster than this:
SHR CX,1 ;word count REP MOVSW ;copy as many words as ;possible JNC CopyDone ;done if even copy length MOVSB ;copy the odd byte CopyDone:
Previous | Table of Contents | Next |