Michael Abrash's Graphics Programming Black Book Special Edition: Pushing the 286 and 386

LISTING 11.5 L11-5.ASM

;
; *** Listing 11.5 ***
;
; Measures the performance of adding an immediate value
; to a memory variable, for comparison with Listing 11.4,
; which adds an immediate value to a register.
;
        jmp     Skip
;
        even            ;always make sure word-sized memory
                        ; variables are word-aligned!
WordVar dw      0
;
Skip:
        call    ZTimerOn
        rept    1000
        add     [WordVar]100h
        endm
        call    ZTimerOff

What’s going on? Simply this: Instruction fetching is controlling overall execution time on both processors. Both the 8088 in a PC and the 286 in an AT can execute the bytes of the instructions in Listings 11.4 and 11.5 faster than they can be fetched. Since the instructions are exactly the same lengths on both processors, it stands to reason that the ratio of the overall execution times of the instructions should be the same on both processors as well. Instruction length controls execution time, and the instruction lengths are the same—therefore the ratios of the execution times are the same. The 286 can both fetch and execute instruction bytes faster than the 8088 can, so code executes much faster on the 286; nonetheless, because the 286 can also execute those instruction bytes much faster than it can fetch them, overall performance is still largely determined by the size of the instructions.

Is this always the case? No. When the prefetch queue is full, memory-accessing instructions on the 286 and 386 are much faster (relative to register-only instructions) than they are on the 8088. Given the system wait states prevalent on 286 and 386 computers, however, the prefetch queue is likely to be empty quite a bit, especially when code consisting of instructions with short EU execution times is executed. Of course, that’s just the sort of code we’re likely to write when we’re optimizing, so the performance of high-speed code is more likely to be controlled by instruction size than by EU execution time on most 286 and 386 computers, just as it is on the PC.

All of which is just a way of saying that faster memory access and EA calculation notwithstanding, it’s just as desirable to keep instructions short and memory accesses to a minimum on the 286 and 386 as it is on the 8088. And the way to do that is to use the registers as heavily as possible, use string instructions, use short forms of instructions, and the like.

The more things change, the more they remain the same....

POPF and the 286

We’ve one final 286-related item to discuss: the hardware malfunction of POPF under certain circumstances on the 286.

The problem is this: Sometimes POPF permits interrupts to occur when interrupts are initially off and the setting popped into the Interrupt flag from the stack keeps interrupts off. In other words, an interrupt can happen even though the Interrupt flag is never set to 1. Now, I don’t want to blow this particular bug out of proportion. It only causes problems in code that cannot tolerate interrupts under any circumstances, and that’s a rare sort of code, especially in user programs. However, some code really does need to have interrupts absolutely disabled, with no chance of an interrupt sneaking through. For example, a critical portion of a disk BIOS might need to retrieve data from the disk controller the instant it becomes available; even a few hundred microseconds of delay could result in a sector’s worth of data misread. In this case, one misplaced interrupt during a POPF could result in a trashed hard disk if that interrupt occurs while the disk BIOS is reading a sector of the File Allocation Table.

There is a workaround for the POPF bug. While the workaround is easy to use, it’s considerably slower than POPF, and costs a few bytes as well, so you won’t want to use it in code that can tolerate interrupts. On the other hand, in code that truly cannot be interrupted, you should view those extra cycles and bytes as cheap insurance against mysterious and erratic program crashes.

One obvious reason to discuss the POPF workaround is that it’s useful. Another reason is that the workaround is an excellent example of Zen-level assembly coding, in that there’s a well-defined goal to be achieved but no obvious way to do so. The goal is to reproduce the functionality of the POPF instruction without using POPF, and the place to start is by asking exactly what POPF does.

All POPF does is pop the word on top of the stack into the FLAGS register, as shown in Figure 11.4. How can we do that without POPF? Of course, the 286’s designers intended us to use POPF for this purpose, and didn’t intentionally provide any alternative approach, so we’ll have to devise an alternative approach of our own. To do that, we’ll have to search for instructions that contain some of the same functionality as POPF, in the hope that one of those instructions can be used in some way to replace POPF.

Table of Contents