|Previous||Table of Contents||Next|
LISTING 58.4 L58-4.ASM
; Inner loop to draw a single texture-mapped vertical column, ; rather than a horizontal scanline. Maxed-out 32-bit version. ; ; At this point: ; EAX = sum of integral X & Y source pointer advances ; ECX = source pointer increment to advance one in Y ; EDX = fractional source texture Y coordinate in lower ; 15 bits of DX, fractional source texture X coord ; in high word of EDX, bit 15 set to 0 ; ESI = initial source texture pointer ; EDI = initial destination pointer ; EBP = fractional Y advance in lower 15 bits of BP, ; fractional X advance in high word of EBP, bit ; 15 set to 0 SCANOFFSET=0 REPT LOOP_UNROLL mov bl,[esi] ;get image pixel add edx,ebp ;advance frac Y in DX, ; frac X in high word of EDX adc esi,eax ;advance source pointer by integral ; X & Y amount, also accounting for ; carry from X fractional addition mov [edi+SCANOFFSET],bl ;set screen pixel ; (located here to avoid 486 ; AGI from previous byte op) test dh,80h ;carry from Y fractional addition? jz short @F ;no add esi,ecx ;yes, advance Y by one ; (produces Pentium AGI for MOV BL,[ESI]) and dh,not 80h ;reset the Y fractional carry bit @@: SCANOFFSET = SCANOFFSET + SCANWIDTH ENDM
And there you have it: A five to 10-times speedup of a decent assembly language texture mapper. All it took was some help from my friends, a good, stiff jolt of right-brain thinking, and some solid left-brain polishingplus the knowledge that such a speedup was possible. Treat every optimization task as if John Miles has just written to inform you that hes made it faster than your wildest dreams, and youll be amazed at what you can do!
Listing 58.3 contains no 486 pipeline stalls; it has Pentium stalls, but not much can be done for them because of the size prefix on ADD EDX,ECX, which takes 1 cycle to go through the U-pipe, and shuts down the V-pipe for that cycle. Listing 58.4, on the other hand, has been rearranged to eliminate all Pentium stalls save one. When the Y coordinate fractional part carries and ESI advances, the code executes as follows:
ADD ESI,ECX ;cycle 1 U-pipe AND DH,NOT 80H ;cycle 1 V-pipe ;cycle 2 idle AGI on ESI MOV BL,[ESI] ;cycle 3 U-pipe ADD EDX,EBP ;cycle 3 V-pipe
However, I dont see any way to eliminate this last AGI, which happens about half the time; even with it, the Pentium execution time for Listing 58.4 is 5.5 cycles. Thats 61 nanosecondsa highly respectable 16 million texture-mapped pixels per secondon a 90 MHz Pentium.
The type of texture mapping discussed in both this and earlier chapters doesnt do perspective correction when mapping textures. Why that is and how to handle perspective correction is a topic for a whole separate book, but be aware that the textures on some large polygons (not the polygon edges themselves) drawn with the code in this chapter will appear to be unnaturally bowed, although small polygons should look fine.
Finally, we never did get rid of the last jump in the texture mapper, yet John Miles claimed no jumps at all. How did he do it? Im not sure, but Id guess that he used a two-entry look-up table, based on the Y carry, to decide how much to advance the source pointer in Y. However, I couldnt come up with any implementation of this approach that didnt take 0.5 to 1 cycle more than the test-and-jump approach, so either I didnt come up with an adequately efficient implementation of the table, John saved a cycle somewhere else, or perhaps John implemented his code in a 32-bit segment, but used the less-efficient table in his fervor to get rid of the final jump. The knowledge that I apparently came up with a different solution than John highlights that the technical aspects of Johns implementation were, in truth, totally irrelevant to my optimization efforts; the only actual effect Johns code had on me was to make me believe a texture mapper could run that fast.
Believe it! And while youre at it, give both halves of your brain equal timeand watch out for aliens in short skirts, 60s bouffant hairdos, and an undue interest in either half.
|Previous||Table of Contents||Next|