And indeed it does. When the test program is modified to draw to a local buffer, both the C and assembly language versions get 0.29 seconds faster, that being a measure of the time taken by display memory wait states. With those wait states factored out, the assembly language version of DrawHorizontalLineList becomes almost three times as fast as the C code.
There is a lesson here. An optimization has no fixed payoff; its value fluctuates according to the context in which it is used. There's relatively little benefit to further optimizing code that already spends half its time waiting for display memory; no matter how good your optimizations, you'll get only a two-times speedup at best, and generally much less than that. There is, on the other hand, potential for tremendous improvement when drawing to system memory, so if that's where most of your drawing will occur, optimizations such as Listing 39.3 are well worth the effort.
Know the environments in which your code will run, and know where the cycles go in those environments.
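The two-times ceiling is simple arithmetic: if half the run time is wait states that no code change can touch, the total time can never drop below that half. A minimal sketch in C follows; the function names are hypothetical helpers for illustration, not part of the chapter's listings.

```c
#include <assert.h>

/* Upper bound on overall speedup when FixedFraction of the run time
   (for example, display memory wait states) cannot be optimized.
   Hypothetical helper, not part of the chapter's listings. */
double MaxSpeedup(double FixedFraction)
{
    assert(FixedFraction > 0.0 && FixedFraction <= 1.0);
    /* Even if all other code took zero time, the fixed part remains. */
    return 1.0 / FixedFraction;
}

/* Overall speedup when only the optimizable part of the run time
   gets Factor times faster and the fixed part stays as it is. */
double OverallSpeedup(double FixedFraction, double Factor)
{
    assert(FixedFraction >= 0.0 && FixedFraction <= 1.0);
    assert(Factor > 0.0);
    return 1.0 / (FixedFraction + (1.0 - FixedFraction) / Factor);
}
```

With FixedFraction at 0.5, MaxSpeedup returns 2.0, and tripling the speed of the remaining code yields an overall speedup of only about 1.5.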
LISTING 39.3 L39-3.ASM
; Draws all pixels in the list of horizontal lines passed in, in
; mode 13h, the VGA's 320×200 256-color mode. Uses REP STOS to fill
; each line.
; C near-callable as:
;     void DrawHorizontalLineList(struct HLineList * HLineListPtr,
;          int Color);
; All assembly code tested with TASM and MASM

SCREEN_WIDTH    equ     320
SCREEN_SEGMENT  equ     0a000h

HLine   struc
XStart  dw      ?       ;X coordinate of leftmost pixel in line
XEnd    dw      ?       ;X coordinate of rightmost pixel in line
HLine   ends

HLineList struc
Lngth   dw      ?       ;# of horizontal lines
YStart  dw      ?       ;Y coordinate of topmost line
HLinePtr dw     ?       ;pointer to list of horz lines
HLineList ends

Parms   struc
                dw      2 dup(?) ;return address & pushed BP
HLineListPtr    dw      ?       ;pointer to HLineList structure
Color           dw      ?       ;color with which to fill
Parms   ends

        .model small
        .code
        public _DrawHorizontalLineList
        align   2
_DrawHorizontalLineList proc
        push    bp              ;preserve caller's stack frame
        mov     bp,sp           ;point to our stack frame
        push    si              ;preserve caller's register variables
        push    di
        cld                     ;make string instructions inc pointers
        mov     ax,SCREEN_SEGMENT
        mov     es,ax           ;point ES to display memory for REP STOS
        mov     si,[bp+HLineListPtr] ;point to the line list
        mov     ax,SCREEN_WIDTH ;point to the start of the first scan
        mul     [si+YStart]     ; line in which to draw
        mov     dx,ax           ;ES:DX points to first scan line to
                                ; draw
        mov     bx,[si+HLinePtr] ;point to the XStart/XEnd descriptor
                                ; for the first (top) horizontal line
        mov     si,[si+Lngth]   ;# of scan lines to draw
        and     si,si           ;are there any lines to draw?
        jz      FillDone        ;no, so we're done
        mov     al,byte ptr [bp+Color] ;color with which to fill
        mov     ah,al           ;duplicate color for STOSW
FillLoop:
        mov     di,[bx+XStart]  ;left edge of fill on this line
        mov     cx,[bx+XEnd]    ;right edge of fill
        sub     cx,di
        js      LineFillDone    ;skip if negative width
        inc     cx              ;width of fill on this line
        add     di,dx           ;offset of left edge of fill
        test    di,1            ;does fill start at an odd address?
        jz      MainFill        ;no
        stosb                   ;yes, draw the odd leading byte to
                                ; word-align the rest of the fill
        dec     cx              ;count off the odd leading byte
        jz      LineFillDone    ;done if that was the only byte
MainFill:
        shr     cx,1            ;# of words in fill
        rep     stosw           ;fill as many words as possible
        adc     cx,cx           ;1 if there's an odd trailing byte to
                                ; do, 0 otherwise
        rep     stosb           ;fill any odd trailing byte
LineFillDone:
        add     bx,size HLine   ;point to the next line descriptor
        add     dx,SCREEN_WIDTH ;point to the next scan line
        dec     si              ;count off lines to fill
        jnz     FillLoop
FillDone:
        pop     di              ;restore caller's register variables
        pop     si
        pop     bp              ;restore caller's stack frame
        ret
_DrawHorizontalLineList endp
        end
Listing 39.3 doesn't take the easy way out and use REP STOSB to fill each scan line; instead, it uses REP STOSW to fill as many pixel pairs as possible via word-sized accesses, using STOSB only to do odd bytes. Word accesses to odd addresses are always split by the processor into two byte-sized accesses. Such word accesses take twice as long as word accesses to even addresses, so Listing 39.3 makes sure that all word accesses occur at even addresses, by performing a leading STOSB first if necessary.
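The same three-step pattern (odd leading byte, aligned words, odd trailing byte) can be sketched in C. This is an illustration of the idea only, not a translation of Listing 39.3; FillRunAligned is a hypothetical name, and whether the word stores actually beat a plain byte fill depends on the compiler and the machine.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Fills Count bytes starting at Dest with Color, using word-sized
   stores only at even addresses. Hypothetical helper for illustration. */
void FillRunAligned(uint8_t *Dest, size_t Count, uint8_t Color)
{
    uint16_t ColorWord = (uint16_t)(((unsigned)Color << 8) | Color);

    assert(Dest != NULL || Count == 0);

    /* Draw the odd leading byte, if any, to word-align the rest. */
    if (Count > 0 && ((uintptr_t)Dest & 1) != 0) {
        *Dest++ = Color;
        Count--;
    }
    /* Fill as many aligned words as possible. memcpy of a fixed
       two-byte size is the portable way to express a word store. */
    while (Count >= 2) {
        memcpy(Dest, &ColorWord, sizeof ColorWord);
        Dest += 2;
        Count -= 2;
    }
    /* Fill the odd trailing byte, if any. */
    if (Count > 0) {
        *Dest = Color;
    }
}
```

Both bytes of ColorWord hold the same color, just as AH duplicates AL in the listing, so byte order doesn't matter here.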
Listing 39.3 is another case in which it's worth knowing the environment in which your code will run. Extra code is required to perform aligned word-at-a-time filling, resulting in extra overhead. For very small or narrow polygons, that overhead might overwhelm the advantage of drawing a word at a time, making plain old REP STOSB faster.
Finally, Listing 39.4 is an assembly language version of ScanEdge. Listing 39.4 is a relatively straightforward translation from C to assembly, but is nonetheless about twice as fast as Listing 39.2.
The version of ScanEdge in Listing 39.4 could certainly be sped up still further by unrolling the loops. FillConvexPolygon, the overall coordination routine, hasn't even been converted to assembly language, so that could be sped up as well. I haven't bothered with these optimizations because all code other than DrawHorizontalLineList takes only 14 percent of the overall polygon filling time when drawing to display memory; the potential return on optimizing nondrawing code simply isn't great enough to justify the effort. Part of the value of a profiler is being able to tell when to stop optimizing; with Listings 39.3 and 39.4 in use, more than two-thirds of the time taken to draw polygons is spent waiting for display memory, so optimization is pretty much maxed out. However, further optimization might be worthwhile when drawing to system memory, where wait states are out of the picture and the nondrawing code takes a significant portion (46 percent) of the overall time.
Again, know where the cycles go.
By the way, note that all the versions of ScanEdge and FillConvexPolygon that we've looked at are adapter-independent, and that the C code is also machine-independent; all adapter-specific code is isolated in DrawHorizontalLineList. This makes it easy to add support for other graphics systems, such as the 8514/A, the XGA, or, for that matter, a completely non-PC system.
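To give a feel for how little has to change, here is a rough sketch of a line-list drawer that targets a buffer in system memory rather than the VGA. The structure fields mirror those in Listing 39.3, with Length spelled out; the function name and the Buffer parameter are assumptions made for this illustration, and like the listing, the sketch does no clipping.

```c
#include <assert.h>
#include <string.h>

#define SCREEN_WIDTH 320

struct HLine {
    int XStart;              /* X coordinate of leftmost pixel in line */
    int XEnd;                /* X coordinate of rightmost pixel in line */
};

struct HLineList {
    int Length;              /* # of horizontal lines */
    int YStart;              /* Y coordinate of topmost line */
    struct HLine *HLinePtr;  /* pointer to list of horz lines */
};

/* Hypothetical variant that fills into a caller-supplied buffer laid
   out as SCREEN_WIDTH bytes per scan line. Only this routine knows
   where the pixels live; the scanning code is unaffected. */
void DrawHorizontalLineListToBuffer(unsigned char *Buffer,
    const struct HLineList *HLineListPtr, int Color)
{
    int i;

    assert(Buffer != NULL && HLineListPtr != NULL);
    for (i = 0; i < HLineListPtr->Length; i++) {
        const struct HLine *Line = &HLineListPtr->HLinePtr[i];
        size_t Row = (size_t)(HLineListPtr->YStart + i);
        int Width = Line->XEnd - Line->XStart + 1;

        /* Skip lines with zero or negative width. */
        if (Width > 0) {
            memset(Buffer + Row * SCREEN_WIDTH + Line->XStart,
                   Color, (size_t)Width);
        }
    }
}
```

Supporting another adapter would mean replacing the memset with whatever that adapter needs, while ScanEdge and FillConvexPolygon stay as they are.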