Once the vertices are drawn, the triangles are processed one at a time. Each triangle that makes it through backface culling is then drawn with recursive subdivision. If any of the triangle’s sides is more than one pixel long in either x or y—that is, if the triangle contains any pixels that aren’t at vertices—then that side is split in half as nearly as possible at given integer coordinates, and a new vertex is created at the split, with texture and screen coordinates that are halfway between those of the vertices at the endpoints. (The same splitting could be done for lighting, but we found that for small triangles—the sort that subdivision works well on—it was adequate to flat-shade each triangle at the light level of the first vertex, so we didn’t bother with Gouraud shading.) The halfway values can be calculated very quickly with shifts. This vertex is drawn, and then each of the two resulting triangles is then processed recursively in the same way, as shown in Figure 69.2. There are some additional details, such as the fill rule that ensures that each pixel is drawn only once (except for backside vertices, as noted above), but basically subdivision rasterization boils down to taking a triangle, splitting a side that has at least one undrawn pixel and drawing the vertex at the split, and repeating the process for each of the two new triangles. The code to do this, shown in Listing 69.1, is very simple and easily optimized, especially by comparison with a generalized triangle rasterizer.

Subdivision rasterization introduces considerably more error than affine texture mapping, and doesn’t draw exactly the right triangle shape, but the difference is very hard to detect for triangles that contain only a few pixels. We found that the point at which the difference between the two rasterizers becomes noticeable was surprisingly close: 30 or 40 feet for the Ogres, and about 12 feet for the Zombies. This means that most of the triangle models that are visible in a typical Quake scene are drawn with subdivision rasterization, not affine texture mapping.

How much does subdivision rasterization help performance? When John originally implemented it, it more than doubled triangle-model drawing speed, because the affine texture mapper was not yet optimized. However, I took it upon myself to see how fast I could make the mapper, so now affine texture mapping is only about 20 percent slower than subdivision rasterization. While 20 percent may not sound impressive, it includes clipping, transform, projection, and backface-culling time, so the rasterization difference alone is more than 50 percent. Besides, 20 percent overall means that we can have 12 monsters now where we could only have had 10 before, so we count subdivision rasterization as a clear success.

LISTING 69.1 L69-1.C

// Quake’s recursive subdivision triangle rasterizer; draws all
// pixels in a triangle other than the vertices by splitting an
// edge to form a new vertex, drawing the vertex, and recursively
// processing each of the two new triangles formed by using the
// new vertex. Results are less accurate than from a precise
// affine or perspective texture mapper, and drawing boundaries
// are not identical to those of a precise polygon drawer, although
// they are consistent between adjacent polygons drawn with this
// technique.
//
// Invented and implemented by John Carmack of id Software.

void D_PolysetRecursiveTriangle (int *lp1, int *lp2, int *lp3)
{
int    *temp;
int    d;

int    new;
int    z;
short  *zbuf;

// try to find an edge that’s more than one pixel long in x or y
d = lp2 - lp1;
if (d < -1 || d > 1)
goto split;
d = lp2 - lp1;
if (d < -1 || d > 1)
goto split;
d = lp3 - lp2;
if (d < -1 || d > 1)
goto split2;
d = lp3 - lp2;
if (d < -1 || d > 1)
goto split2;
d = lp1 - lp3;
if (d < -1 || d > 1)
goto split3;
d = lp1 - lp3;
if (d < -1 || d > 1)
{
split3:
// shuffle points so first edge is edge to split
temp = lp1;
lp1 = lp3;
lp3 = lp2;
lp2 = temp;
goto split;
}

return;         // no pixels left to fill in triangle

split2:
// shuffle points so first edge is edge to split
temp = lp1;
lp1 = lp2;
lp2 = lp3;
lp3 = temp;

split:
// split first edge screen x, screen y, texture s, texture t, and z
// to form a new vertex.  Lighting (index 4) is ignored; the
// difference between interpolating lighting and using the same
// shading for the entire triangle is unnoticeable for small
// triangles, so we just use the lighting for the first vertex of
// the original triangle (which was used during set-up to set
// d_colormap, used below to look up lit texels)
new = (lp1 + lp2) >> 1;        // split screen x
new = (lp1 + lp2) >> 1;        // split screen y
new = (lp1 + lp2) >> 1;        // split texture s
new = (lp1 + lp2) >> 1;        // split texture t
new = (lp1 + lp2) >> 1;        // split z

// draw the point if splitting a leading edge
if (lp2 > lp1)
goto nodraw;
if ((lp2 == lp1) && (lp2 < lp1))
goto nodraw;

z = new>>16;

// point to the pixel’s z-buffer entry, looking up the scanline start
// address based on screen y and adding in the screen x coordinate
zbuf = zspantable[new] + new;

// draw the split vertex if it’s not obscured by something nearer, as
// indicated by the z-buffer
if (z >= *zbuf)
{
int     pix;

// set the z-buffer to the new pixel’s distance
*zbuf = z;

// get the texel from the model’s skin bitmap, according to
// the s and t texture coordinates, and translate it through
// the lighting look-up table set according to the first
// vertex for the original (top-level) triangle.  Both s and
// t are in 16.16 format
pix = d_pcolormap[skintable[new>>16][new>>16]];

// draw the pixel, looking up the scanline start address
// based on screen y and adding in the screen x coordinate
d_viewbuffer[d_scantable[new] + new] = pix;
}

nodraw:
// recursively draw the two new triangles we created by adding the
// split vertex
D_PolysetRecursiveTriangle (lp3, lp1, new);
D_PolysetRecursiveTriangle (lp3, new, lp2);
} Figure 69.2
One recursive subdivision triangle-drawing step.

More Ideas that Might Work

Useful as subdivision rasterization proved to be, we by no means think that we’ve maxed out triangle-model drawing, if only because we spent far less design and development time on subdivision than on the affine rasterizer, so it’s likely that there’s quite a bit more performance to be found for drawing small triangles. For example, it could be faster to precalculate drawing masks or even precompile drawing code for all possible small triangles (say, up to 4×4 or 5×5), and the memory footprint looks reasonable. (It’s worth noting that both precalculated drawing and subdivision rasterization are only possible because we snap to integer coordinates; none of this stuff works with fixed-point vertices.)

More interesting still is the stack-based rendering described in the article “Time/Space Tradeoffs for Polygon Mesh Rendering,” by Bar-Yehuda and Gotsman, in the April, 1996 ACM Transactions on Graphics. Unfortunately, the article is highly abstract and slow going, but the bottom line is that it’s possible to represent a triangle mesh as a stream of commands that place vertices in a stack, remove them from the stack, and draw triangles using the vertices in the stack. This results in excellent CPU cache coherency, because rather than indirecting all over a vertex pool to retrieve vertex data, all vertices reside in a tiny stack that’s guaranteed to be in the cache. Local variables used while drawing can be stored in a small block next to the stack, and the stream of commands representing the model is accessed sequentially from start to finish, so cache utilization should be very high. As processors speed up at a much faster rate than main memory access, cache optimizations of this sort will become steadily more important in improving drawing performance.

As with so many aspects of 3-D, there is no one best approach to drawing triangle models, and no such thing as the fastest code. In a way, that’s frustrating, but the truth is, it’s these nearly infinite possibilities that make 3-D so interesting; not only is it an endless, varied challenge, but there’s almost always a better solution waiting to be found.

Graphics Programming Black Book © 2001 Michael Abrash