|Previous||Table of Contents||Next|
C compilers work within the stack frame model, whereby variables reside in a block of stack memory and are accessed via offsets from BP. Compilers may store a couple of variables in registers and may briefly keep other variables in registers when theyre used repeatedly, but the stack frame is the underlying architecture. Its a nice architecture; its flexible, convenient, easy to program, and makes for fairly compact code. However, stack frames have a few drawbacks. They must be constructed and destroyed, which takes both time and code. They are so easy to use that they tend to bias the assembly language programmer in favor of accessing memory variables more often than might be necessary. Finally, you cannot use BP as a general-purpose register if you intend to access a stack frame, and having that seventh register available is sometimes useful indeed.
That doesnt mean you shouldnt use stack frames, which are useful and often necessary. Just dont fall victim to their undeniable charms.
C compilers are not terrific at handling segments. Some compilers can efficiently handle a single far pointer used in a loop by leaving ES set for the duration of the loop. But two far pointers used in the same loop confuse every compiler Ive seen, causing the full segment:offset address to be reloaded each time either pointer is used.
|This particularly affects performance in 286 protected mode (under OS/2 1.X or the Rational DOS Extender, for example) because segment loads in protected mode take a minimum of 17 cycles, versus a mere 2 cycles in real mode.|
In assembly language you have full control over segments. Use it, and, if necessary, reorganize your code to minimize segment loading.
You might think that the most obvious advantage assembly language has over C is that it allows the use of all forms of instructions and all registers in all ways, whereas C compilers tend to use a subset of registers and instructions in a limited number of ways. Yes and no. Its true that C compilers typically dont generate instructions such as XLAT, rotates, or the string instructions. On the other hand, XLAT and rotates are useful in a limited set of circumstances, and string instructions are used in the C library functions. In fact, C library code is likely to be carefully optimized by experts, and may be much better than equivalent code youd produce yourself.
Am I saying that C compilers produce better code than you do? No, Im saying that they can, unless you use assembly language properly. Writing code in assembly language rather than C guarantees nothing.
|You can write good assembly, bad assembly, or assembly that is virtually indistinguishable from compiled code; you are more likely than not to write the latter if you think that optimization consists of tweaking compiled C code.|
Sure, you can probably use the registers more efficiently and take advantage of an instruction or two that the compiler missed, but the code isnt going to get a whole lot faster that way.
True optimization requires rethinking your code to take advantage of assembly language. A C loop that searches through an integer array for matches might compile
Figure 8.1 Tweaked compiler output for a loop.
to something like Figure 8.1A. You might look at that and tweak it to the code shown in Figure 8.1B.
Congratulations! Youve successfully eliminated all stack frame access, youve used LOOP (although DEC SI/JNZ is actually faster on 386 and later machines, as I explained in the last chapter), and youve used a string instruction. Unfortunately, the new code isnt going to run very much faster. Maybe 25 percent faster, maybe a little more. Big deal. Youve eliminated the trappings of the compilerthe stack frame and the restricted register usagebut youre still thinking like the compiler. Try this:
repnz scasw jz Match
Its a simple examplebut, I hope, a convincing one. Stretch your brain when you optimize.
The ultimate in assembly language optimization comes when you change the rules; that is, when you reorganize the entire program to allow the use of better assembly language code in the small section of code that most affects overall performance. For example, consider that the data searched in the last example is stored in an array of structures, with each structure in the array containing other information as well. In this situation, REP SCASW couldnt be used because the data searched through wouldnt be contiguous.
However, if the need for performance in searching the array is urgent enough, theres no reason why you cant reorganize the data. This might mean removing the array elements from the structures and storing them in their own array so that REP SCASW could be used.
|Organizing a programs data so that the performance of the critical sections can be optimized is a key part of design, and one thats easily shortchanged unless, during the design stage, you thoroughly understand and work to bring together your data needs, the critical sections of your program, and potential assembly language optimizations.|
More on this shortly.
To recap, here are some things to look for when striving to convert C code into optimized assembly language:
That said, let me show some of these precepts in action.
Listing 8.1 is the sample C application Im going to use to examine optimization in action. Listing 8.1 isnt really completeit doesnt handle the no-matches case well, and it assumes that the sum of all matches will fit into an intbut it will do just fine as an optimization example.
|Previous||Table of Contents||Next|