Michael Abrash's Graphics Programming Black Book Special Edition

About the Author

Part I
Chapter 1—The Best Optimizer Is between Your Ears
The Human Element of Code Optimization
Understanding High Performance
When Fast Isn’t Fast
Rules for Building High-Performance Code
Know Where You’re Going
Make a Big Map
Make Lots of Little Maps
Know the Territory
Know When It Matters
Always Consider the Alternatives
Know How to Turn On the Juice
Where We’ve Been, What We’ve Seen
Where We’re Going
Chapter 2—A World Apart
The Unique Nature of Assembly Language Optimization
Instructions: The Individual versus the Collective
Assembly Is Fundamentally Different
Transformation Inefficiencies
The Flexible Mind
Where to Begin?
Chapter 3—Assume Nothing
Understanding and Using the Zen Timer
The Costs of Ignorance
The Zen Timer
The Zen Timer Is a Means, Not an End
Starting the Zen Timer
Time and the PC
Stopping the Zen Timer
Reporting Timing Results
Notes on the Zen Timer
A Sample Use of the Zen Timer
The Long-Period Zen Timer
Stopping the Clock
Example Use of the Long-Period Zen Timer
Using the Zen Timer from C
Watch Out for Optimizing Assemblers!
Further Reading
Armed with the Zen Timer, Onward and Upward
Chapter 4—In the Lair of the Cycle-Eaters
How the PC Hardware Devours Code Performance
The Nature of Cycle-Eaters
The 8088’s Ancestral Cycle-Eaters
The 8-Bit Bus Cycle-Eater
The Impact of the 8-Bit Bus Cycle-Eater
What to Do about the 8-Bit Bus Cycle-Eater?
The Prefetch Queue Cycle-Eater
Official Execution Times Are Only Part of the Story
There Is No Such Beast as a True Instruction Execution Time
Approximating Overall Execution Times
What to Do about the Prefetch Queue Cycle-Eater?
Holding Up the 8088
Dynamic RAM Refresh: The Invisible Hand
How DRAM Refresh Works in the PC
The Impact of DRAM Refresh
What to Do About the DRAM Refresh Cycle-Eater?
Wait States
The Display Adapter Cycle-Eater
The Impact of the Display Adapter Cycle-Eater
What to Do about the Display Adapter Cycle-Eater?
Cycle-Eaters: A Summary
What Does It All Mean?
Chapter 5—Crossing the Border
Searching Files with Restartable Blocks
Searching for Text
Avoiding the String Trap
Brute-Force Techniques
Using memchr()
Making a Search Restartable
Interpreting Where the Cycles Go
Knowing When Assembly Is Pointless
Always Look Where Execution Is Going
Chapter 6—Looking Past Face Value
How Machine Instructions May Do More Than You Think
Memory Addressing and Arithmetic
Math via Memory Addressing
The Wonders of LEA on the 386
Multiplication with LEA Using Non-Powers of Two
Chapter 7—Local Optimization
Optimizing Halfway between Algorithms and Cycle Counting
When LOOP Is a Bad Idea
The Lessons of LOOP and JCXZ
Avoiding LOOPS of Any Stripe
Local Optimization
Unrolling Loops
Rotating and Shifting with Tables
NOT Flips Bits—Not Flags
Incrementing with and without Carry
Chapter 8—Speeding Up C with Assembly Language
Jumping Languages When You Know It’ll Help
Billy, Don’t Be a Compiler
Don’t Call Your Functions on Me, Baby
Stack Frames Slow So Much
Torn Between Two Segments
Why Speeding Up Is Hard to Do
Taking It to the Limit
A C-to-Assembly Case Study
Chapter 9—Hints My Readers Gave Me
Optimization Odds and Ends from the Field
Another Look at LEA
The Kennedy Portfolio
Speeding Up Multiplication
Optimizing Optimized Searching
Short Sorts
Full 32-Bit Division
Sweet Spot Revisited
Hard-Core Cycle Counting
Hardwired Far Jumps
Setting 32-Bit Registers: Time versus Space
Chapter 10—Patient Coding, Faster Code
How Working Quickly Can Bring Execution to a Crawl
The Case for Delayed Gratification
The Brute-Force Syndrome
Wasted Breakthroughs
Patient Optimization
Chapter 11—Pushing the 286 and 386
New Registers, New Instructions, New Timings, New Complications
Family Matters
Crossing the Gulf to the 286 and the 386
In the Lair of the Cycle-Eaters, Part II
System Wait States
Data Alignment
Code Alignment
Alignment and the 386
Alignment and the Stack
The DRAM Refresh Cycle-Eater: Still an Act of God
The Display Adapter Cycle-Eater
New Instructions and Features: The 286
New Instructions and Features: The 386
Optimization Rules: The More Things Change...
Detailed Optimization
POPF and the 286
Chapter 12—Pushing the 486
It’s Not Just a Bigger 386
Enter the 486
Rules to Optimize By
The Hazards of Indexed Addressing
Calculate Memory Pointers Ahead of Time
Caveat Programmor
Stack Addressing and Address Pipelining
Problems with Byte Registers
More Fun with Byte Registers
Timing Your Own 486 Code
The Story Continues
Chapter 13—Aiming the 486
Pipelines and Other Hazards of the High End
486 Pipeline Optimization
BSWAP: More Useful Than You Might Think
Pushing and Popping Memory
Optimal 1-Bit Shifts and Rotates
32-Bit Addressing Modes
Chapter 14—Boyer-Moore String Searching
Optimizing a Pretty Optimum Search Algorithm
String Searching Refresher
The Boyer-Moore Algorithm
Boyer-Moore: The Good and the Bad
Further Optimization of Boyer-Moore
Know What You Know
Chapter 15—Linked Lists and plain Unintended Challenges
Unfamiliar Problems with Familiar Data Structures
Linked Lists
Dummies and Sentinels
Circular Lists
Hi/Lo in 24 Bytes
Chapter 16—There Ain’t No Such Thing as the Fastest Code
Lessons Learned in the Pursuit of the Ultimate Word Counter
Counting Words in a Hurry
Which Way to Go from Here?
Challenges and Hazards
Blinding Yourself to a Better Approach
Watch Out for Luggable Assumptions!
The Astonishment of Right-Brain Optimization
Levels of Optimization
Optimization Level 1: Good Code
Level 2: A New Perspective
Level 3: Breakthrough
Enough Word Counting Already!
Chapter 17—The Game of Life
The Triumph of Algorithmic Optimization in a Cellular Automata Game
Conway’s Game
The Rules of the Game
Where Does the Time Go?
The Hazards and Advantages of Abstraction
Heavy-Duty C++ Optimization
Bringing In the Right Brain
Re-Examining the Task
Acting on What We Know
The Challenge That Ate My Life
Chapter 18—It’s a plain Wonderful Life
Optimization beyond the Pale
Breaking the Rules
Table-Driven Magic
Keeping Track of Change with a Change List
A Layperson’s Overview of QLIFE
Chapter 19—Pentium: Not the Same Old Song
Learning a Whole Different Set of Optimization Rules
The Return of Optimization as Art
The Pentium: An Overview
Crossing Cache Lines
Cache Organization
Faster Addressing and More
Branch Prediction
Miscellaneous Pentium Topics
486 versus Pentium Optimization
Going Superscalar
Chapter 20—Pentium Rules
How Your Carbon-Based Optimizer Can Put the “Super” in Superscalar
An Instruction in Every Pipe
V-Pipe-Capable Instructions
Lockstep Execution
Superscalar Notes
Register Starvation
Chapter 21—Unleashing the Pentium’s V-Pipe
Focusing on Keeping Both Pentium Pipes Full
Address Generation Interlocks
Register Contention
Exceptions to Register Contention
Who’s in First?
Pentium Optimization in Action
A Quick Note on the 386 and 486
Chapter 22—Zenning and the Flexible Mind
Taking a Spin through What You’ve Learned

Part II
Chapter 23—Bones and Sinew
At the Very Heart of Standard PC Graphics
An Introduction to VGA Programming
At the Core
Linear Planes and True VGA Modes
Smooth Panning
Color Plane Manipulation
Page Flipping
The Hazards of VGA Clones
Just the Beginning
The Macro Assembler
Chapter 24—Parallel Processing with the VGA
Taking on Graphics Memory Four Bytes at a Time
VGA Programming: ALUs and Latches
Notes on the ALU/Latch Demo Program
Chapter 25—VGA Data Machinery
The Barrel Shifter, Bit Mask, and Set/Reset Mechanisms
VGA Data Rotation
The Bit Mask
The VGA’s Set/Reset Circuitry
Setting All Planes to a Single Color
Manipulating Planes Individually
Notes on Set/Reset
A Brief Note on Word OUTs
Chapter 26—VGA Write Mode 3
The Write Mode That Grows on You
A Mode Born in Strangeness
A Note on Preserving Register Bits
Chapter 27—Yet Another VGA Write Mode
Write Mode 2, Chunky Bitmaps,and Text-Graphics Coexistence
Write Mode 2 and Set/Reset
A Byte’s Progress in Write Mode 2
Copying Chunky Bitmaps to VGA Memory Using Write Mode 2
Drawing Color-Patterned Lines Using Write Mode 2
When to Use Write Mode 2 and When to Use Set/Reset
Mode 13H—320×200 with 256 Colors
Flipping Pages from Text to Graphics and Back
Chapter 28—Reading VGA Memory
Read Modes 0 and 1, and the Color Don’t Care Register
Read Mode 0
Read Mode 1
When all Planes “Don’t Care”
Chapter 29—Saving Screens and Other VGA Mysteries
Useful Nuggets from the VGA Zen File
Saving and Restoring EGA and VGA Screens
16 Colors out of 64
A Bonus Blanker
Modifying VGA Registers
Chapter 30—Video Est Omnis Divisa
The Joys and Galling Problems of Using Split Screens on the EGA and VGA
How the Split Screen Works
The Split Screen in Action
VGA and EGA Split-Screen Operation Don’t Mix
Setting the Split-Screen-Related Registers
The Problem with the EGA Split Screen
Split Screen and Panning
The Split Screen and Horizontal Panning: An Example
Notes on Setting and Reading Registers
Split Screens in Other Modes
How Safe?
Chapter 31—Higher 256-Color Resolution on the VGA
When Is 320×200 Really 320×400?
Why 320x200? Only IBM Knows for Sure
320x400 256-Color Mode
Display Memory Organization in 320x400 Mode
Reading and Writing Pixels
Two 256-Color Pages
Something to Think About
Chapter 32—Be It Resolved: 360×480
Taking 256-Color Modes About as Far as the Standard VGA Can Take Them
Extended 256-Color Modes: What’s Not to Like?
360x480 256-Color Mode
How 360×480 256-Color Mode Works
480 Scan Lines per Screen: A Little Slower, But No Big Deal
360 Pixels per Scan Line: No Mean Feat
Accessing Display Memory in 360×480 256-Color Mode
Chapter 33—Yogi Bear and Eurythmics Confront VGA Colors
The Basics of VGA Color Generation
VGA Color Basics
The Palette RAM
Color Paging with the Color Select Register
256-Color Mode
Setting the Palette RAM
Setting the DAC
If You Can’t Call the BIOS, Who Ya Gonna Call?
An Example of Setting the DAC
Chapter 34—Changing Colors without Writing Pixels
Special Effects through Realtime Manipulation of DAC Colors
Color Cycling
The Heart of the Problem
Loading the DAC via the BIOS
Loading the DAC Directly
A Test Program for Color Cycling
Color Cycling Approaches that Work
Odds and Ends
The DAC Mask
Reading the DAC
Cycling Down
Chapter 35—Bresenham Is Fast, and Fast Is Good
Implementing and Optimizing Bresenham’s Line-Drawing Algorithm
The Task at Hand
Bresenham’s Line-Drawing Algorithm
Strengths and Weaknesses
An Implementation in C
Looking at EVGALine
Drawing Each Line
Drawing Each Pixel
Comments on the C Implementation
Bresenham’s Algorithm in Assembly
Chapter 36—The Good, the Bad, and the Run-Sliced
Faster Bresenham Lines with Run-Length Slice Line Drawing
Run-Length Slice Fundamentals
Run-Length Slice Implementation
Run-Length Slice Details
Chapter 37—Dead Cats and Lightning Lines
Optimizing Run-Length Slice Line Drawing in a Major Way
Fast Run-Length Slice Line Drawing
How Fast Is Fast?
Further Optimizations
Chapter 38—The Polygon Primeval
Drawing Polygons Efficiently and Quickly
Filled Polygons
Which Side Is Inside?
How Do You Fit Polygons Together?
Filling Non-Overlapping Convex Polygons
Oddball Cases
Chapter 39—Fast Convex Polygons
Filling Polygons in a Hurry
Fast Convex Polygon Filling
Fast Drawing
Fast Edge Tracing
The Finishing Touch: Assembly Language
Maximizing REP STOS
Faster Edge Tracing
Chapter 40—Of Songs, Taxes, and the Simplicity of Complex Polygons
Dealing with Irregular Polygonal Areas
Filling Arbitrary Polygons
Active Edges
Complex Polygon Filling: An Implementation
More on Active Edges
Performance Considerations
Nonconvex Polygons
Details, Details
Chapter 41—Those Way-Down Polygon Nomenclature Blues
Names Do Matter when You Conceptualize a Data Structure
Nomenclature in Action
Chapter 42—Wu’ed in Haste; Fried, Stewed at Leisure
Fast Antialiased Lines Using Wu’s Algorithm
Wu Antialiasing
Tracing and Intensity in One
Sample Wu Antialiasing
Notes on Wu Antialiasing
Chapter 43—Bit-Plane Animation
A Simple and Extremely Fast Animation Method for Limited Color
Bit-Planes: The Basics
Stacking the Palette Registers
Bit-Plane Animation in Action
Limitations of Bit-Plane Animation
Shearing and Page Flipping
Beating the Odds in the Jaw-Dropping Contest
Chapter 44—Split Screens Save the Page Flipped Day
640x480 Page Flipped Animation in 64K...Almost
A Plethora of Challenges
A Page Flipping Animation Demonstration
Write Mode 3
Drawing Text
Page Flipping
Knowing When to Flip
Enter the Split Screen
Chapter 45—Dog Hair and Dirty Rectangles
Different Angles on Animation
Plus ça Change
VGA Access Times
Dirty-Rectangle Animation
So Why Not Use Page Flipping?
Dirty Rectangles in Action
Hi-Res VGA Page Flipping
Another Interesting Twist on Page Flipping
Chapter 46—Who Was that Masked Image?
Optimizing Dirty-Rectangle Animation
Dirty-Rectangle Animation, Continued
Masked Images
Internal Animation
Dirty-Rectangle Management
Drawing Order and Visual Quality
Chapter 47—Mode X: 256-Color VGA Magic
Introducing the VGA’s Undocumented “Animation-Optimal” Mode
What Makes Mode X Special?
Selecting 320x240 256-Color Mode
Designing from a Mode X Perspective
Hardware Assist from an Unexpected Quarter
Chapter 48—Mode X Marks the Latch
The Internals of Animation’s Best Video Display Mode
Allocating Memory in Mode X
Copying Pixel Blocks within Display Memory
Copying to Display Memory
Who Was that Masked Image Copier?
Chapter 49—Mode X 256-Color Animation
How to Make the VGA Really Get up and Dance
Masked Copying
Faster Masked Copying
Notes on Masked Copying
Mode X Animation in Action
Works Fast, Looks Great
Chapter 50—Adding a Dimension
3-D Animation Using Mode X
References on 3-D Drawing
The 3-D Drawing Pipeline
A Simple 3-D Example
Notes on the 3-D Animation Example
An Ongoing Journey
Chapter 51—Sneakers in Space
Using Backface Removal to Eliminate Hidden Surfaces
One-sided Polygons: Backface Removal
Backface Removal in Action
Incremental Transformation
A Note on Rounding Negative Numbers
Object Representation
Chapter 52—Fast 3-D Animation: Meet X-Sharp
The First Iteration of a Generalized 3-D Animation Package
This Chapter’s Demo Program
A New Animation Framework: X-Sharp
Three Keys to Realtime Animation Performance
Where the Time Goes
Chapter 53—Raw Speed and More
The Naked Truth About Speed in 3-D Animation
Raw Speed, Part 1: Assembly Language
Raw Speed, Part II: Look it Up
Hidden Surfaces
Chapter 54—3-D Shading
Putting Realistic Surfaces on Animated 3-D Objects
Support for Older Processors
Ambient Shading
Diffuse Shading
Shading: Implementation Details
Chapter 55—Color Modeling in 256-Color Mode
Pondering X-Sharp’s Color Model in an RGB State of Mind
A Color Model
A Bonus from the BitMan
Chapter 56—Pooh and the Space Station
Using Fast Texture Mapping to Place Pooh on a Polygon
Principles of Quick-and-Dirty Texture Mapping
Mapping Textures Made Easy
Fast Texture Mapping: An Implementation
Chapter 57—10,000 Freshly Sheared Sheep on the Screen
The Critical Role of Experience in Implementing Fast, Smooth Texture Mapping
Visual Quality: A Black Hole ... Er, Art
Fixed-Point Arithmetic, Redux
Texture Mapping: Orientation Independence
Mapping Textures across Multiple Polygons
Fast Texture Mapping
Chapter 58—Heinlein’s Crystal Ball, Spock’s Brain, and the 9-Cycle Dare
Using the Whole-Brain Approach to Accelerate Texture Mapping
Texture Mapping Redux
Left-Brain Optimization
A 90-Degree Shift in Perspective
That’s Nice—But it Sure as Heck Ain’t 9 Cycles
Don’t Stop Thinking about Those Cycles
Texture Mapping Notes
Chapter 59—The Idea of BSP Trees
What BSP Trees Are and How to Walk Them
BSP Trees
Visibility Determination
Limitations of BSP Trees
Building a BSP Tree
Visibility Ordering
Inorder Walks of BSP Trees
Know It Cold
Measure and Learn
Surfing Amidst the Trees
Related Reading
Chapter 60—Compiling BSP Trees
Taking BSP Trees from Concept to Reality
Compiling BSP Trees
Parametric Lines
Parametric Line Clipping
The BSP Compiler
Optimizing the BSP Tree
BSP Optimization: an Undiscovered Country
Chapter 61—Frames of Reference
The Fundamentals of the Math behind 3-D Graphics
3-D Math
Foundation Definitions
The Dot Product
Dot Products of Unit Vectors
Cross Products and the Generation of Polygon Normals
Using the Sign of the Dot Product
Using the Dot Product for Projection
Rotation by Projection
Chapter 62—One Story, Two Rules, and a BSP Renderer
Taking a Compiled BSP Tree from Logical to Visual Reality
BSP-based Rendering
The Rendering Pipeline
Moving the Viewer
Transformation into Viewspace
Projection to Screenspace
Walking the Tree, Backface Culling and Drawing
Notes on the BSP Renderer
Chapter 63—Floating-Point for Real-Time 3-D
Knowing When to Hurl Conventional Math Wisdom Out the Window
Not Your Father’s Floating-Point
Pentium Floating-Point Optimization
Pipelining, Latency, and Throughput
The Dot Product
The Cross Product
Rounding Control
A Farewell to 3-D Fixed-Point
Chapter 64—Quake’s Visible-Surface Determination
The Challenge of Separating All Things Seen from All Things Unseen
VSD: The Toughest 3-D Challenge of All
The Structure of Quake Levels
Culling and Visible Surface Determination
Nodes Inside and Outside the View Frustum
The Beam Tree
3-D Engine du Jour
Subdividing Raycast
Vertex-Free Surfaces
The Draw-Buffer
Span-Based Drawing
Simplify, and Keep on Trying New Things
Learn Now, Pay Forward
Chapter 65—3-D Clipping and Other Thoughts
Determining What’s Inside Your Field of View
3-D Clipping Basics
Intersecting a Line Segment with a Plane
Polygon Clipping
Clipping to the Frustum
The Lessons of Listing 65.3
Advantages of Viewspace Clipping
Further Reading
Chapter 66—Quake’s Hidden-Surface Removal
Struggling with Z-Order Solutions to the Hidden Surface Problem
Creative Flux and Hidden Surfaces
Drawing Moving Objects
Performance Impact
Leveling and Improving Performance
Sorted Spans
Edges versus Spans
Edge-Sorting Keys
Where That 1/Z Equation Comes From
Quake and Z-Sorting
Decisions Deferred
Chapter 67—Sorted Spans in Action
Implementing Independent Span Sorting for Rendering without Overdraw
Quake and Sorted Spans
Types of 1/z Span Sorting
Intersecting Span Sorting
Abutting Span Sorting
Independent Span Sorting
1/z Span Sorting in Action
Implementation Notes
Chapter 68—Quake’s Lighting Model
A Radically Different Approach to Lighting Polygons
The Lighting Conundrum
Gouraud Shading
Problems with Gouraud Shading
Perspective Correctness
The Quest for Alternative Lighting
Decoupling Lighting from Rasterization
Size and Speed
Surface Caching
Mipmapping To The Rescue
Two Final Notes on Surface Caching
Chapter 69—Surface Caching and Quake’s Triangle Models
Probing Hardware-Assisted Surfaces and Fast Model Animation Without Sprites
Surface Caching with Hardware Assistance
Letting the Graphics Card Build the Textures
The Light Map as Alpha Texture
Drawing Triangle Models
Drawing Triangle Models Fast
Trading Subpixel Precision for Speed
An Idea that Didn’t Work
An Idea that Did Work
More Ideas that Might Work
Chapter 70—Quake: A Post-Mortem and a Glimpse into the Future
Preprocessing the World
The Potentially Visible Set (PVS)
Passages: The Last-Minute Change that Didn’t Happen
Drawing the World
Dynamic Lighting
BSP Models
Polygon Models and Z-Buffering
The Subdivision Rasterizer
How We Spent Our Summer Vacation: After Shipping Quake
Verite Quake
Quake 2
Looking Forward
Appendix A
Graphics Programming Black Book © 2001 Michael Abrash