Why Divide by Z?

by Toshi (Toshihiro Horie)

Unraveling the Geometry Behind Perspective Projection

The first step toward 3D graphics in QB is to find out how to convert 3D points to screen coordinates. Graphics people call this process "perspective projection." In online tutorials, we see formulas like xs = x/z, ys = y/z without explanation. Why divide by z? Is it just an approximation? Or is there really a geometric reason behind it?

The first thing to do to figure out the answers to these questions is to draw a nice diagram. Imagine yourself looking down from the ceiling at your monitor and where you usually sit. Here is my little ASCII diagram to help you.

The 3D object, say a baseball, is at point P, and it is displayed on the screen of the monitor at point S. The eye is at E, and the center of the screen is at point C. All points are defined so that the top left corner of the screen is the origin (0,0,0), +y is down, +x is to the right, and +z is into the monitor (up in the following diagram). The units for position are SCREEN 13 pixels, since that's the screen mode the sample code will be working in.

[top-down view of screen, sliced at y=100]

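In this figure, E is the eye, C is the center of the screen (x=160), S is where the object appears on the screen (x=xs), P is the object itself (x=xp, z=zp), and Q is the point straight ahead of the eye, on the line through E and C, at the same depth as P.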

Now you have to notice that we have two similar triangles:
Triangle ECS and Triangle EQP are similar triangles.

(In case you don't know what similar triangles are, they are triangles with the same shape but of different sizes**. They have the property that their corresponding sides are proportional, meaning they are magnified by the same amount, and thus the ratio between the corresponding sides is the same.)

** There's a special case when similar triangles have the same size as well, but those are usually called "congruent triangles."

This means that the ratio of the corresponding sides of the triangles is the same for ECS and EQP! Which means:

 EC         CS
----   =  ------        .... Eq. 1
 EQ         QP

Notice that the sides have these lengths:

EC = 640
EQ = (zp-zs)+640
CS = xs-160
QP = xp-160

Now we want to find out what xs is, because that's the x coordinate of the point we want to plot with PSET.

Substituting the values above into equation 1, we get:
(remember the distance between the eye and center of the screen is 640)

    640           xs-160
-----------  =  -----------   ... Eq. 2
(zp-zs)+640       xp-160

Now, if we assume the screen is at z=0, then zs drops out and things get easy.

    640           xs-160
-----------  =  -----------   ... Eq. 3
  zp+640          xp-160

[first figure with more numbers filled in]

We want to solve this for xs, so here goes:
multiplying both sides by (xp-160), we get

              640*(xp-160)
xs-160 = -----------------------        ... Eq. 3b
                 640+zp

adding 160 to both sides of the equation, we get

            640*(xp-160)
xs  = -----------------------  + 160    ... Eq. 4 (origin at top left corner of screen)
              640+zp
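
As a quick numeric check of Eq. 4, here is a minimal QBasic sketch. The sample point (xp = 260, zp = 640) is made up for illustration; only the constants 640 and 160 come from the derivation above.

```basic
' Eq. 4 with the origin at the top left corner of the screen.
' The sample point is hypothetical: 100 pixels right of center,
' and as far behind the screen as the eye is in front of it.
xp! = 260
zp! = 640
xs! = 640 * (xp! - 160) / (640 + zp!) + 160
PRINT "xs ="; xs!    ' xs comes out as 210
```

The point is twice as far from the eye as the screen is, so its 100-pixel offset from the center shrinks to 50 pixels on the screen.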

Next, we will find the formula for ys; then we can plot 3D points on the screen using PSET (xs, ys), colour.

How Come We Can Assume Y=100?

Okay, we got the formula for xs when y=100, but this same formula actually works for y<>100. Why is this? Here is an intuitive explanation:

<tek'> if i was standing on a cliff ...
<tek'> looking into oblivion
<tek'> and there's this giant orb that just floats
<tek'> say it's "30 units to the right of the center of my FOV"
<tek'> and it moves along the (vertical) y-axis
<tek'> no matter how far up or down it goes that x-coord is staying the same

A more difficult explanation:

The mathematical reason behind it has to do with projection again. Say y=120 (the 3D point is at xp,120,zp). The similar triangles formed by this point and the eye will match the one with y=100 if you project it to the y=100 plane.

Because y does not have to be 100, the formula for xs given in equation 4 can be used any time we need to project 3D points to the screen.
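
One way to convince yourself of this is to follow the straight line from the eye to the point and see where it crosses the screen. The sketch below does that for three different heights. It assumes the eye sits at (160, 100, -640) and the screen at z=0, as in the derivation above; the sample values of xp, zp and yp are hypothetical.

```basic
' Follow the straight line from the eye E = (160, 100, -640) to the
' point P = (xp, yp, zp) and see where it crosses the screen at z = 0.
' The sample values of xp, zp and yp are hypothetical.
xp! = 260: zp! = 640
FOR yp! = 20 TO 180 STEP 80
    t! = 640 / (zp! + 640)          ' fraction of the way from E to P
    xs! = 160 + t! * (xp! - 160)    ' x where the line hits the screen
    PRINT "yp ="; yp!; " xs ="; xs!
NEXT yp!
```

Every line reports the same xs (210 for these numbers), because yp never enters the calculation of t! or xs!. The expression for xs! is just Eq. 4 with the terms in a different order.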

This gives us a formula for xs. But what about ys? It turns out that ys can be found in almost the exact same way!

Now you can get off the ceiling :) Sit back in your seat, and rotate the monitor sideways so you can't see what's on the screen. Before you do that, you might want to copy the diagram below, so you can compare how the monitor looks to the diagram. Okay, since the screen is sideways, the +z axis points to the right in the diagram, and the +y axis points down. The baseball is now at point P' (pronounced "pee-prime") this time.

[side view of monitor and eye]

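In this figure, E is the eye, C is the center of the screen (y=100), S' is where the object appears on the screen (y=ys'), P' is the object itself (y=yp, z=zp), and Q' is the point straight ahead of the eye, on the line through E and C, at the same depth as P'.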

Now you have to notice that we have two similar triangles:
Triangle ECS' and Triangle EQ'P' are similar.

This means that the ratio of the corresponding sides of the triangles is the same for ECS' and EQ'P'! So we have:

 EC         CS'
----   =  ------          ... Eq. 5
 EQ'       Q'P'

Looks just like equation 1, huh? I told you that the x and y's can be solved in the same way!

The rest of the derivation looks similar too! Just keep the numbers straight, and you'll be fine. Plugging in the lengths of the sides of the triangle into equation 5, we get something that looks a lot like equation 2: (remember the distance between the eye and center of the screen is 640 pixels for SCREEN 13.)

    640             ys'-100
-------------  =  -----------          ... Eq. 6
 640+(zp-zs')       yp-100

Again, the screen is at z=0, so zs'=0 and things get easier.

    640           ys'-100
-----------  =  -----------          ... Eq. 7
   640+zp         yp-100

We want to solve this for ys', so here goes:
multiplying both sides by (yp-100), we get

              640*(yp-100)
ys'-100 = -----------------------        ... Eq. 7b
                 640+zp

adding 100 to both sides, we get

           640*(yp-100)
ys' =   ----------------  + 100         ... Eq. 8 (origin at top left corner of screen)
            640+zp

How Come We Can Assume X=160?

When we solve for ys, why can we forget about the x coordinate and assume it is 160? I could say it works by analogy, but that's not a proof. Here is a physics-based explanation:

If I was standing on the side of a flat street looking toward the other side, while the cars were passing by in the x direction (horizontally), I wouldn't see the cars moving up and down, would I? [Now if this was a sloped street, cars going horizontally would be either taking off or crashing into the ground, like in "Back to the Future," but that's another story.]

Because of the above reasoning, once again, we can generalize our equation to one that projects any 3D point to the screen, without doing any extra work! So ys = ys' if point P' is at the same position as point P above.

                  640*(yp-100)
ys =  ys' =   -------------------  + 100      ... Eq. 8a (origin at top left corner of screen)
                    640+zp

Together, Equation 4 and equation 8a give us the complete formula for plotting 3D points (which have their origin on the top left corner of the screen, with +x axis going to the right, the +y axis pointing down, and the +z axis pointing into the monitor) onto the screen.

Here they are again.

         640*(xp-160)
xs  = -------------------  + 160      ... Eq. 4
           640+zp

          640*(yp-100)
ys  = -------------------  + 100      ... Eq. 8a
             640+zp
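
To see the two formulas working together, here is a small QBasic sketch that pushes a few points through Eq. 4 and Eq. 8a. The three sample points in the DATA lines are hypothetical, chosen so the results are easy to check by hand.

```basic
' Eq. 4 and Eq. 8a together (origin at the top left corner of the
' screen, +y down, +z into the monitor). The three sample points
' are hypothetical.
FOR i = 1 TO 3
    READ xp!, yp!, zp!
    xs! = 640 * (xp! - 160) / (640 + zp!) + 160
    ys! = 640 * (yp! - 100) / (640 + zp!) + 100
    PRINT "("; xp!; ","; yp!; ","; zp!; ") -> ("; xs!; ","; ys!; ")"
NEXT i
DATA 0, 0, 0
DATA 0, 0, 640
DATA 160, 100, 640
```

The corner (0,0,0) lies on the screen itself, so it stays at (0,0). Pushed back to zp=640 it moves halfway toward the center, to (80,50). A point straight ahead of the eye, like (160,100,640), lands on (160,100) at any depth.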

Wait! "Top left corner of the screen?" That means (1,-1,1) will be plotted off the screen! Ok, we'll fix this, but there's another problem for people used to y axis pointing up. The y-axis on our coordinate system points down!

To correct this, we have to return to equation 7b. (don't worry, it's only a small change!)

                640*(yp-100)
-(ys'-100)   = -------------        ... Eq. 7b [+y axis is up in 3D point, down on screen]
                  640+zp

Look, all we had to do was add a minus sign! This makes a small change in equations 8 and 8a. (With +y pointing up, yp=0 now lands on the bottom edge of the screen rather than the top.) Here it is:

             640*(yp-100)
ys  = 100 - ---------------      ... Eq. 8a [y axis fix]
                640+zp

We didn't have to change equation 4 because the screen x coordinate (screen coordinate is abbreviated "screen coord" below) already points the same way as the x axis of the Cartesian coordinate system (defined by the x, y and z axes) we used: both increase to the right.

[to make the origin of points at center of screen]
(Note: These xp and yp variables have values different from the xp and yp in Eq. 4 and 8a.)

               640*(xp+160-160)
xs  = 160 + ----------------------      ... Eq. 4c (origin at C)
                 640+zp                       [y axis fix, origin at C]

               640*(yp-100+100)
ys  =  100 - ---------------------     ... Eq. 8c (origin at C)
                    640+zp                    [y axis fix, origin at C]

Simplifying, we get a formula that works pretty well for plotting 3D points in SCREEN 13.

               640*xp
xs  =  160 + -----------   ... Eq. 4c' (origin at C)
               640+zp                [y axis fix, units in pixels]
                   
               640*yp
ys  =  100 - ----------    ... Eq. 8c' (origin at C)
               640+zp                [y axis fix]
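
Here is a minimal QBasic sketch of Eq. 4c' and 8c' in action. It draws the same square at three depths, so you can watch it shrink toward the center of the screen. The size of the square, the depths, and the variable names half! and f! are assumptions made for this illustration.

```basic
' Draw the same square at three depths using Eq. 4c' and 8c'
' (origin at the center of the screen, +y up). The square's size
' and the depths are hypothetical values chosen for illustration.
SCREEN 13
half! = 80                             ' half the side of the square
FOR zp! = 0 TO 640 STEP 320
    f! = 640 / (640 + zp!)             ' shared factor in 4c' and 8c'
    x1! = 160 - f! * half!             ' left edge   (xp = -half!)
    y1! = 100 - f! * half!             ' top edge    (yp = +half!)
    x2! = 160 + f! * half!             ' right edge  (xp = +half!)
    y2! = 100 + f! * half!             ' bottom edge (yp = -half!)
    LINE (x1!, y1!)-(x2!, y2!), 15, B  ' B draws a box outline
NEXT zp!
```

The result should look like three nested squares, a bit like looking down a corridor. Both formulas share the factor 640/(640+zp), so it is computed only once per depth.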

[How things look with the origin at C (orthogonal projection)]

Likewise, we can move the origin to the eye if you want, although this isn't always the best choice, because a point at the origin will crash your 3D engine with a division by zero (it's equivalent to poking yourself in the eye) unless you write an IF statement to handle the special case! (In fact, no point with a z coordinate on or behind the eye should be displayed!) But this is actually what most 3D engines (including OpenGL) do when performing the perspective transform.

(Note: LET xp3d = xp from Eq. 4c', yp3d = yp from Eq. 8c', zp3d = zp+640)

              640*xp3d
xs  = 160 + --------------   ... Eq. 4e' (origin at E, y-axis fix)
                zp3d
                   
              640*yp3d
ys  = 100 - --------------   ... Eq. 8e' (origin at E, y-axis fix)
                zp3d
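
The IF statement mentioned above could look something like the following sketch. It applies Eq. 4e' and 8e' only to points in front of the eye and skips the rest. The three sample points in the DATA lines are hypothetical.

```basic
' Eq. 4e' and 8e' with a guard: skip any point on or behind the eye.
' The three sample points are hypothetical.
FOR i = 1 TO 3
    READ xp3d!, yp3d!, zp3d!
    IF zp3d! > 0 THEN
        xs! = 160 + 640 * xp3d! / zp3d!
        ys! = 100 - 640 * yp3d! / zp3d!
        PRINT "plot at"; xs!; ys!
    ELSE
        PRINT "skipped: z ="; zp3d!
    END IF
NEXT i
DATA 100, 50, 1280
DATA 0, 0, 0
DATA 100, 50, -640
```

Only the first point is plotted (at xs = 210, ys = 75). The point at the eye would otherwise cause a division by zero, and the point behind the eye would otherwise be drawn mirrored through the center of the screen.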

Well, if we take a quick look at the xs = x/z, ys = y/z formulas from the introduction, you'll see that 4e' and 8e' are very close (just take off the centering addition and the *640, which multiplies x and y by the eye-to-screen distance). To really get x/z and y/z, you have to measure everything in special units so that the distance from the eye to the screen is defined to be 1, use the coordinate system with the origin (0,0,0) at the eye, and do something like WINDOW SCREEN (-160,-100)-(160,100) to center the screen at (0,0,zs). Although that is nice in theory, when you write a game engine you don't want to be doing extra divide operations, so the forms presented in equations 4e'+8e' or 4c'+8c' work best. I suggest that you work out the math to prove to yourself that this is true.
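
If you would rather check this numerically first, here is a small QBasic sketch. It rescales a point so the eye-to-screen distance is 1, forms x/z and y/z, and then converts back to SCREEN 13 pixels. The sample point is hypothetical.

```basic
' Measure the eye-origin point in units where the eye-to-screen
' distance is 1 (that is, divide every coordinate by 640).
' The sample point is hypothetical.
xp3d! = 100: yp3d! = 50: zp3d! = 1280
x! = xp3d! / 640: y! = yp3d! / 640: z! = zp3d! / 640
PRINT "x/z ="; x! / z!; " y/z ="; y! / z!
' Converting back to SCREEN 13 pixels recovers Eq. 4e' and 8e'.
PRINT "xs ="; 160 + 640 * (x! / z!); " ys ="; 100 - 640 * (y! / z!)
```

The second line reports xs = 210 and ys = 75, the same values Eq. 4e' and 8e' give directly for this point. The rescaling cancels out in x/z and y/z, which is why it only adds extra divisions.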

Well, we have derived several formulas for perspective projection in SCREEN 13, and we found out that the x/z and y/z are accurate ways to do perspective projection when we use the correct coordinate system and units. We will finish this time by writing a simple 3D parametric function plotter.

QBasic code (finally!)

DEFINT A-Z
SCREEN 13: CLS
'=====================================
'  3D Perspective Projection Test
'=====================================

'set grayscale palette (start writing at colour index 0)
OUT &H3C8, 0
FOR i = 0 TO 255: OUT &H3C9, i \ 4: OUT &H3C9, i \ 4: OUT &H3C9, i \ 4: NEXT

'draw wavy thing around zp=100 axis
FOR t! = 0 TO 6 STEP .001
    xp = INT(100 * COS(t!))
    yp = INT(100 * SIN(8 * t!))
    zp = INT(99 * SIN(t!) + 100)

    zdenom = (zp + 640)
    'perspective projection (world space to screen space)
    IF zdenom > 0 THEN
        xs = (160 + xp * 640& \ zdenom) 'using equation 4c'.
        ys = (100 - yp * 640& \ zdenom) 'using equation 8c'.
        r = (640 \ zdenom)              'find size of point
        CIRCLE (xs, ys), r, 200 - zp    'plot it on the screen!
    END IF
NEXT t!

'draw helix around the y axis
FOR t! = 0 TO 60 STEP .001
    xp = INT(100 * COS(t!))
    yp = INT(t! + .5)
    zp = INT(100 * SIN(t!) + 100)
    xp3d = xp
    yp3d = yp
    zp3d = zp + 640

    'perspective projection (world space to screen space)
    'note how zdenom = zp3d
    IF zp3d > 0 THEN 'if point is in front of eye, then
        'project the 3D point to the screen
        xs = (160 + xp3d * 640& \ zp3d) 'using equation 4e'.
        ys = (100 - yp3d * 640& \ zp3d) 'using equation 8e'.
        r = (640 \ zp3d)                'find size of point
        CIRCLE (xs, ys), r, 200 - zp    'plot it on the screen!
    END IF
NEXT t!

Next time, I'll talk about how to change the field of view, so you can get panoramic scenes or binocular zoom vision in your perspective code.

Author: Toshi (Toshihiro Horie)
Email: horie@ocf.berkeley.edu
Website: http://www.ocf.berkeley.edu/~horie/project.html
Released: Unknown