Optimize your algorithm first before you try these optimizations! That's usually where you get the biggest speed increases.
Before:
After:
Once you've optimized the algorithm all that you can, you can start looking at algebraic and loop optimizations.
The classic one is to use DEFINT A-Z. This forces you to use as many integer variables as possible.
Use integer variables to index FOR loops. This may require substitution and algebraic simplification.
Before:
After:
If your code has a lot of floating point calculations that need high accuracy, compile with QB 4.0.
e.g. a raytracer
If your code has a lot of floating point calculations that don't need more than 8 bits of accuracy, then definitely convert it to fixed point. Even if it needs up to 16 bits of accuracy, it might be worth converting to fixed point, if it is being used in the main loop.
e.g. a rotozoomer or voxel terrain.
Don't use IFs (conditional branches). Some comparison results can be directly be used in a calculation. Note that in QB, a TRUE boolean expression equals -1, and a FALSE one equals 0.
Before:
After:
Actually, the above example is too simple for the After: version to be faster. But for more complicated expressions involving multiplication and division, it can make a difference.
Buffer your reads from a file. This is especially useful in a non-disk-cached environment like DOS 4.0.
Before:
After:
Use an assembler keyboard handler or INP(&H60) plus keyboard buffer clearing routines instead of INKEY$.
e.g. For a user controlled floormapper routine, this made a huge difference in rending fps.
For a straight QB multikey handler, don't bother to clear the keyboard buffer every vertical retrace. Instead, slow down the keyboard repeat rate, and check every few frames.
e.g. This made a huge difference in QBMKEY.BAS
Use integer division for integers.
Before:
After:
Make an integer division lookup table if there is a division slowing down the inner loop.
Store the results of complicated expressions in look-up tables.
Before:
After:
If several complicated expressions in a loop has common subexpressions, move the common subexpressions out of the loop.
Before:
After:
Make constants CONST. Unfortunately, you can't use transcendental functions like ATN on the right side anymore.
Before:
After:
Unroll short loops.
Before:
After:
Partially unroll long loops.
Before:
After:
Move junk outside of the inner loops (code movement).
Before:
After:
Use cache sensitive programming. This means, try to access your arrays in a sequential manner if possible. If not, access them in small blocks that are adjacent to each other. For example, QB arrays are usually stored in a column major order, so dimension your arrays as vscreen(xmax,ymax) if you are doing scanline-based algorithms, and only change move in the x (scanline) direction in the inner loop.
Before:
After:
Use a precalculated (canned) pseudo-random number sequence.
Before:
After:
Prefer array indexing over user defined TYPEs. (1)
Warning: This makes code unreadable unless it is well commented.
Cache often-used array elements in scalar variables. (2)
Cache intermediate values into temporary variables. (3)
Example of both optimizations being used.
Before:
After:
Use REDIM to clear a large array instead of using a FOR loop to set each element to zero.
Before:
After:
Avoid multidimensional arrays. Use array head lookup tables like in the POKE vs. PSET example for faster access of single dimension arrays as multidimensional ones.
Before:
After:
Don't waste an extra element. Unlike C arrays, the declaration of QB arrays specify the first and last element indicies rather than the size of the array. This matters when you want to make a 64KB array without using '$DYNAMIC.
Before:
After:
or
Use incremental calculation instead of evaluating the entire equation every loop. This usually means multiplies will be replaced by addition. It's very important that you do this in any linear interpolation function you use for Gouraud Shading, Texture Mapping, etc. Most line DDAs (digital difference analyzers) use this method.
Before:
After:
Use POKE instead of PSET. This is a simple way to get 2x performance in graphics intensive apps.
Before:
After:
Use INTEGER variables instead of LONGs for unsigned integers in the range 0 to 65535. This will only work when the program is compiled.
PEEKing from video memory is slower than PEEKing from system memory. Therefore, use double buffering when you need to do feedback effects.
Use DEF SEG sparingly. You don't need to DEF SEG back to the default segment when you are accessing arrays in the default segment. DEF SEG only applies to PEEK and POKE and SETMEM.
Don't use '$DYNAMIC. QB arrays in the default segment are accessed at blazing speed, because there is no segment switching. However, '$DYNAMIC puts them in different segments, which need extra instructions to accessed, slowing them down. This makes a big difference in programs that use large lookup tables in their inner loop. It seems that huge arrays (allowed using the QB/AH command) are the slowest to access.
Before:
After:
Use SELECT CASE instead of a bunch of ELSEIFs. The only exception is when one case executes much more often than the others.
Before:
After:
Use AND instead of MOD for MODing by a power of 2.
Before:
After:
Simplify compares against zero.
Before:
After:
Use -x to find the negative of a number instead of -1*x. This is an obvious optimization if you know that the CPU has a NEG instruction, which is faster than IMUL.
Don't put the main loop in the main code-- put it in a SUB. This makes a difference in the IDE, probably because the p-code interpreter has less variables to wade through when you are in a SUB.
Before:
After:
Pass dummy parameters to functions that take an odd number of arguments in order to improve data alignment. The dummy parameter is not used by the function, but is there to encourage burst memory writes. This only makes a minimal difference in speed.
Before:
After:
Don't initialize QB array elements to zero. Warning: this is a dangerous habit to get into, if you plan to use C or C++ later on. This is because C does not initialize variables by default.
Before:
After:
For floating point, multiply by the reciprocal of a number instead of dividing by a number.
Before:
After:
Simplify comparisons using simpler monotonic functions. Monotonic functions are functions that always grow upwards or always grow downwards. For example, x^2 is a monotonic function of x, so is 2*x. In the example, an expensive square root was removed by squaring both sides, since squaring is a monotonic function.
Before:
After:
Advanced optimizations for C++
Thanks for Qasir, entropy, Pasco, and Eclipzer for their critiques and suggestions.
Author: | Toshi (Toshihiro Horie) |
Email: | horie@ocf.berkeley.edu |
Website: | http://www.ocf.berkeley.edu/~horie/project.html |
Released: | Mar 26 2001 |