World Class 3D Clipping
Introduction
I'm sure you're all aware of the advantages of clipping in 3D space. That was not the case a few years ago but gradually people have begun to realise that clipping in 3D space is both faster and easier. However, as with all things in this world, nothing is clear cut. There are a few decisions to make when attemping to clip in the most efficient way. Let's get some terms out of the way first so you all know what I'm talking about.
When we clip in 3D space I am talking about clipping to the view volume or view frustum. That represents a volume the eye or camera can see. It's defined by several clipping planes. Typically this is about 6 planes, one for the near plane (z-clipping), left, right, top, bottom planes and optionally a far plane. To define these planes we use a parameter known as field of vision (FOV). This angle represents the area we see. A typical field of vision is about 60 degrees.
After we have clipped our polygons to this volume we can project them to screen space without fear of any invalid z coordinates (behind viewer) and without any worries about our polygon spanning off screen. Yes folks, clipping in 3D means we don't have to clip in 2D!
With that said, let's discuss some methods in a pretty easy to read way.
Know your spaces
As mentioned, things are not clear cut. 3D engines can gain many speed boosts by performing operations early on, say in object space. After speaking to a couple of people on IRC it's clear that people get confused about spaces. It's understandable but let's help them along a bit.
Object Space
The space objects are usually defined in and has its own
local origin at (0,0,0).
World Space
The space object positions, camera positions, targets, Light
positions, etc, are defined in. Origin at (0,0,0).
Camera Space
The space that represents what the eye can see. The camera
position is the origin and represents (0,0,0).
Screen Space
The projected version of camera space to produce 2D
coordinates.
To get from one space to another we need to perform a transformation. So, to get from object space to world space we usually would transform the objects vertices by the objects tranformation matrix. Following on from that, to get from world space to camera space we would transform these transformed vertices by the camera matrix. Finally, we perform our projection on these vertices, be it perspective or parallel to get our 2D screen coordinates. Now we can plot the points! It follows that if we can move from one space to another via a single transformation we can go in the opposite direction using the inverse of that transformation. Also, we don't have to move through just one space at a time (object space to world space for example). We can move through several spaces. We just need a transformation that represents all the transforms put together. You all know and love matrix multiplication.
Hopefully that is a simple description but what this means is if we perform operations in object space we can maybe reduce the amount of vertices that need to be transformed. For example, if we perform back face culling in object space we could eliminate on average 50% of the polygons before it even gets to the clipping stage. Sounds goood huh? Well, what about doing 3D clipping in object space? That would mean we can clip polygons to the view volume and only transform those visible faces from object space which will give us a nice speed boost. Let's look at how we do it:
3D clipping in object space
We first need to define our view volume planes. Clipping in 3D means we clip polygons to planes. We have our field of vision angle we now define our view volume planes. The equation of a plane is given by:
Ax + By + Cz + D = 0
[A,B,C] is the normal vector to the plane.
[x,y,z] is any point in 3D space.
br> D is the distance the plane is from the origin.
To define a plane then we can use a normal vector and a distance from the origin. The equation tells us that any point that falls on this plane will then make the equation above evaluate to zero.
With that said, let's generate our planes' normal vectors with some simple trig.
t = FOV * 0.5; Y
| Z
Plane ³ Normal | /
--------Å---------------------------- | /
left ³ [cos(t), 0, sin(t)] | /
right ³ [-cos(t), 0, sin(t)] | /
top ³ [0, -cos(t), sin(t)] |/
bottom ³ [0, cos(t), sin(t)] +---------X
near ³ [0, 0, 1]
I hope you can see that these normals are defined in eye or camera space. We now need to transform them into object space. To do this we need a transformation that represents a move from object space through world space to camera space. This matrix is then inversed (transpose it). Now, transform the planes normal by this matrix. Now the planes are orientated in object space and we need a distance value the planes are from the eye. This is simple to calculate, take the dot product of the camera's position with the planes normal vector:
Plane Distance = Plane Normal DOT Camera Position
Do this for each view volume plane and we have a set of planes defining the view volume that are orientated in object space. We can now clip faces to these planes and we will have only what is visible to the eye. Remember, perform back face culling before clipping to eliminate back facing polygons to save some work.
Now we need to know how to clip a polygon to a plane.
Clipping a polygon to a plane
Remembering our plane equation from earlier we can easily determine if a point is on a plane. This is step one. The next step is to realise that a polygon is just a series of edges. An edge is a line. So, to clip a polygon to a plane we can take each edge of the polygon and calculate the intersection point that the line makes with the plane. Take a look at this diagram:
PLANE
| A Here a triangle is cutting a plane. The intersection point
| + is marked. The lines AB and BC intersect the plane and the
| /| line AC is fully inside and requires no clipping.
| / |
| / |
|/ |
x | +
/ | /|
/ | / |
B+ poly | / |
\ | ---> / |
\ | + |
x | | |
|\ | | |
| \ | | | The final clipped triangle. Results in an n-gon.
| \ | | |
| \| | |
| + | |
| C + |
| \ |
| \ |
\ |
\|
+
Okay, let's start with edge AB. We first must decide if it does intersect with the plane. To do this we simply check point A and point B against the plane to see what sides they fall on. To do this we perform a dot product with the point and the planes normal and check it against the distance the plane is from the origin.
Let point A be [Ax, Ay, Az]
Let point B be [Bx, By, Bz]
Let the plane normal be [Nx,Ny,Nz]
Let the planes distance from origin be PD
dA = Ax*Nx + Ay*Ny + Az*Nz
dB = Bx*Nx + By*Ny + Bz*Nz
Our conditions are:
dA_inside = (dA >= PD);
dB_inside = (dB >= PD);
Intersection = (dA_inside != dB_inside);
If dA_inside is true then point A is inside the plane.
If dB_inside is true then point B is inside the plane.
If Intersection is true then it shows point A and point B lie on opposite
sides of the plane and hence intersect it. Thus, it requires clipping.
In our case point A would be inside the plane and point B would be outside the plane.
Now, we have decided AB intersects the plane. We need to calculate the intersection point. What do these values 'dA' and 'dB' represent? Well, it's the distance along the planes normal the point is from the plane. Did you understand that? It's the DISTANCE the point is from the plane.
Since we have found the distances the points A and B are from the plane we can use the ratio to calculate a scaler that will indicate where along the line AB the plane equation equals zero.
Our scaler then can be calculated like this:
Scaler = (PD - dA) / (dB - dA)
Remember, we know the planes distance value, PD, because we calculated it when we transformed our view volume planes from camera space to object space.
Finally, to use the scaler value to calculate the intersection point we do:
Ix = Ax + (Bx - Ax)*Scaler
Iy = Ay + (By - Ay)*Scaler
Iz = Az + (Bz - Az)*Scaler
Well done, we have clipped a line to a plane. Putting it all together means we need to loop through our polygons vertices. Check each edge against the plane. If it is fully within the plane we keep the edge, if it intersects the plane we store the new edge given by the point inside the plane and the intersection point for this edge and if the edge is fully outside the plane we throw the edge away. Implementation of this means we need a source polygon and a place to store the new clipped polygon. We read edges from the source polygon and based on what I just said add edges to the destination polygon. As soon as we have done all edges, (AB, BC, CA), in the source polygon our destination polygon will result in a perfectly clipped polygon. Horray.
The pro's and con's
It's all been pretty simple. We now have fully clipped polygons and we haven't had to transform them at all to get there. The trouble however is the dot product is required to test if a point is inside or outside the plane. On newer processors this won't be much of a problem if they have instructions dedicated for this. However, currently, dot products are not that cheap. They are quite quick but the amount of times we use it soon starts to have impact on our performance. So on one hand we have saved ourselves some time transforming non-visible faces and on the other we have expensive tests to perform clipping. Let me point out that most faces won't require clipping. Only a few faces in your object will require clipping. From this you can see that the bottleneck is the test for clipping, not the actual clipping itself. A way to improve this would be pretty neat. Let's now take a look at the possability of using a different space in which to clip our faces. Let's move to camera space.
BOOOOOOOOOOOM!
We are in camera space. You've transformed your objects into this space and now we wish to clip. Well, we could do the same process as above but save transforming our planes normals since we defined them in camera space to begin with. That's not saved much and rather pointless. So, what can we do? One trick is to use a fixed field of vision of 90 degrees. How can this help? Well, making the field of vision 90 degrees makes our planes' orientation rather special. Look at this diagram.
left plane right plane
\ / Z
\ / |
\ | / |
\ | / |
\ | / |
\ | / |
\ |t / |
\ | / +--------->x
\|/ t = 45degrees
eye
This is a 90 degree field of view. It might not look like it with my ascii art but trust me. The eye is at the origin, (0,0,0). If we move forward in the z-axis direction by 1 the x value that falls on the right plane is 1 and similary, it's -1 on the left plane. Move forward 10 on the z-axis and on the right plane, x = 10 and -10 for the left plane. You can think of it as the planes span from the origin at 45 degrees to the eye and the gradient of the plane is 1 or -1. Hence a step of 1 in Z results in a step of 1 or -1 on the right and left plane respectively.
So, if we know our Z coordinate (which we always do obviously) we know the X coordinate where the intersection occurs at. This is the same for the top and bottom planes. Except it's Y and not X in that case. Okay. This gives us some clipping limits:
Plane ³ Normal
--------Å----------------------------
left ³ X ==-Z
right ³ X == Z
top ³ Y == Z
bottom ³ Y ==-Z
How do we check if a point is in the view volume? We use a simple compare! That's a damn site faster than a dot product.
X < -Z == Point outside left plane
X > Z == Point outside right plane
Y < -Z == Point outside bottom plane
Y > Z == Point outside top plane
Ahhh very nice. So, replace the dot product check with a compare and we have now got a very fast way to determine if points fall in or out of view. Now, you're saying great. We can apply this to our clipping example from earlier and have determined edge AB requires clipping. How do we calculate the intersection point. Previously, we used the result of the dot product as distances and generated a scaler value. Let's take a look at how we calculate the scaler value now:
Let's take the left plane for example. We know that the X value on the plane is equal to -Z value. Our equation for a line is:
x = x1 + t(x2-x1)
y = y1 + t(y2-y1)
z = z1 + t(z2-z1)
t is a value in the range 0..1 and tells us where about on the line we are, i.e., it's our scaler value from earlier.
Let's find t:
x1 + t(dx) = -(z1 + t(dz)) ( x == -z on plane remember)
= -z1 - t(dz)
x1 + z1 = -t(dz) - t(dx)
x1 + z1 = t(-dz - dx)
t = (x1 + z1) / (-dz - dx)
So, our clipping function for the left plane is:
dx = x2 - x1
dz = z2 - z1
scalar = (x1 + z1) / (-dz - dx)
So, to calculate the intersection point for the left plane we have:
dx = Bx - Ax
dz = Bz - Az
Scaler = (Ax + Az) / (-dz - dx)
Ix = Ax + (Bx - Ax)*Scaler
Iy = Ay + (By - Ay)*Scaler
Iz = Az + (Bz - Az)*Scaler
Tada!! Simple or what. Now, do the same thing for right, top and bottom and you have these formulas:
dx = x2 - x1
dy = y2 - y1
dz = z2 - z1
Plane ³ Scaler
--------Å----------------------------
left ³ (Ax + Az) / (-dz - dx)
right ³ (Ax - Az) / (-dx + dz)
top ³ (Ay - Az) / (-dy + dz)
bottom ³ (Ay + Az) / (-dz - dy)
You can see that the equations are pretty simple. So, let's sum up. Using a 90 degree FOV and clipping in camera space means we can quickly determine what points are within the view volume with just compares. We can perform intersection calculations very easily using the equations above for each of the planes and we can put this together with our basic clipping algorithm mentioned previously to perform very fast clipping.
No more 2D clipping
We have transformed objects to camera space, clipped them, now we project them and render them. What's wrong with this? Screen boundaries. We don't want to render outside our screen limits and we don't want to put any 2D clipping in. We want to perform it with the 3D clipping function. It's very simple to do, you just need to look a little at the perspective projection formula. Our perspective projection formula looks typically like this:
x' = xd / z
y' = yd / z
(x',y') are screen coordinates
(x,y,z) are 3d camera space coordinates
d is the distance the screen is from viewer
The important value here ladies and gentlemen is the value 'd'. Let me draw that all-singing, all-dancing 90 degree FOV view volume for you once more:
left plane right plane
\ /
\ /
\ | /
+-----+-----+ <- width of screen
\ | /
\ |d /
\ | /
\ | /
\|/
eye
To automatically clip to the screen we simply make it so that d is the correct distance that the screen limits fit exactly on the view planes.
So, to calculate d we need a little bit of trig as follows:
d = (screen_x_resolution / 2) / tan(fov / 2)
Great. One snag, since when is the screen's x resolution the same as its y resolution? No problem, let's calculate a d value for the y resolution also:
dx = (screen_x_resolution / 2) / tan(fov / 2)
dy = (screen_y_resolution / 2) / tan(fov / 2)
Have you ever heard of an aspect ratio? Most video modes are not 1:1 aspect ratio, which means you don't get nice square pixels. So, a circle drawn on a 1:1 aspect ratio will look like an elipse on a non 1:1 aspect ratio, i.e., a 320x200 screen. We all know and love xmode and remember the days of the most famous video mode 320x240. People used that because it provided this magic 1:1 aspect ratio. How can we get a 1:1 aspect ratio on any resolution is a good question. Simple, we scale the y component during projection by some value. The value is calculated like this:
240 / 320 = 0.75 (ratio of y with x for a known 1:1 aspect ratio)
0.75 * (screen_x_resolution / screen_y_resolution) = aspect ratio
From this we now have:
dx = (screen_x_resolution / 2) / tan(fov / 2)
dy = ((screen_y_resolution * aspect ratio) / screen_x_resolution) * dx
Now everything is correct and our 3D clipper will be clipping to screen boundaries also.
NB: Calculate the dx,dy and aspect ratios at initalisation or whenever the field of view changes.
Axial-aligned bounding boxes
We all know what a bounding box is. It's a box that fits as tightly as a box can around an object. It's for a rough estimation of the object for use as a crude early out method of culling and other such things. You can use spheres or boxes or both. Let's concern ourselves with just boxes here. An axial-aligned bounding box is basically a box aligned with your x, y and z axis vectors. To calculate an axial-aligned bounding box you take the objects vertices in whatever space it's defined (normally object space) and find the maximum and minimum x, y and z values of its vertices. With this you have 6 values.
MinX, MaxX
MinY, MaxY
MinZ, MaxZ
You can construct the 8 vertices of the box from these. We can help speed up our 3D engine by saying, if this box is totally outside view volume ignore the object completely, if this box is totally inside view volume then accept the object and don't even perform a clipping test on it. Just the normal backface culling and that's all. If this box is partially inside and outside then the object will require testing for clipping. The thing is, an axial aligned bounding box can be tested against the view volume when we use our fast 90 degree FOV very easily. All you need to do is this:
// Is object totally outside view?
if(MaxX < -MaxZ)
return OUTSIDE_VIEW;
if(MinX > MaxZ)
return OUTSIDE_VIEW;
if(MaxY < -MaxZ)
return OUTSIDE_VIEW;
if(MinY > MaxZ)
return OUTSIDE_VIEW;
// Is object partially inside view?
if(MinX < -MinZ)
return CLIP_TIME_FOLKS;
if(maxx > MinZ)
return CLIP_TIME_FOLKS;
if(miny < -MinZ)
return CLIP_TIME_FOLKS;
if(maxy > MinZ)
return CLIP_TIME_FOLKS;
However, our axial-aligned bounding box is defined in object space, we are in camera space. If we transform the bounding box by the transform matrix to get it in camera space it's no longer axial-aligned. We need then to regenerate our Min and Max values. No problem. Here is a simple way to do it:
TransformVector(bounding_box_vertices[0], xform_matrix);
MinX = MaxX = bounding_box_vertices[0].X;
MinY = MaxY = bounding_box_vertices[0].Y;
MinZ = MaxZ = bounding_box_vertices[0].Z;
for(i=1; i<8; i++) {
TransformVector(bounding_box_vertices[i], xform_matrix);
if(bounding_box_vertices[i].X > MaxX) MaxX = bounding_box_vertices[i].X;
if(bounding_box_vertices[i].X < MinX) MinX = bounding_box_vertices[i].X;
if(bounding_box_vertices[i].Y > MaxY) MaxY = bounding_box_vertices[i].Y;
if(bounding_box_vertices[i].Y < MinY) MinY = bounding_box_vertices[i].Y;
if(bounding_box_vertices[i].Z > MaxZ) MaxZ = bounding_box_vertices[i].Z;
if(bounding_box_vertices[i].Z < MinZ) MinZ = bounding_box_vertices[i].Z;
}
Now we can perform the tests above. Another way to regenerate an axial-aligned bounding box is given in Graphics Gems I.
Use spheres also, it's just a single check. A non axial-aligned box may fit better, just play around till you have what suits you.
Any FOV will do
Hrm, 90 degrees for a field of view is fine for most things. However, maybe for some effects or because you have a nice 3DS mesh being imported and it contains the infamous FOV track or whatever, a 90 degree FOV is not available to you. Can you perform all this fast clipping with any old FOV? The answer is quite simply YES! The idea is to warp your space to an effective 90 degree FOV. Many people have difficulties to understand this warping but all it boils down to is a simple scale of the X and Y components of your vertices.
Okay, you know I just said that warping your FOV so it becomes effectivly 90 degrees, yet in reality it is still the normal FOV just that everything is scaled occoringly. This is done by scaling the X and Y components of your points. Well, let's look at the matrix for performing this:
³ d/h 0 0 0 ³
³ 0 d/h 0 0 ³ = Perspective Matrix
³ 0 0 1 0 ³
³ 0 0 0 1 ³
d is the distance from the screen, we calculated to compensate for aspect ratio and x and y resolutions. We called them dx and dy. The value h is the height of the view plane. Now, we use our screen as the view plane so h is the screen dimensions. Let's rewrite this then:
hx = screen_x_resolution / 2
hy = screen_y_resolution / 2
³ dx/hx 0 0 0 ³
³ 0 dy/hy 0 0 ³ = Perspective Matrix
³ 0 0 1 0 ³
³ 0 0 0 1 ³
This is the warp matrix. If you take your transformation matrix that takes an object from object space through to camera space then multiply it with this perspective matrix then now the transform will include the warping.
That's it!
Now you have easy 3D clipping with no problems using any FOV. No sign of homogeneous coordinates at all.
Optimizations
A few obvious optimises for you:
- Use a bounding box at an early stage to eliminate totally hidden objects.
- If object is totally within view, flag object and don't bother with a 3D clipping test.
- Perform backface culling in object space before clipping and only transform visible faces.
- Flag any vertices/faces that require clipping and what planes they need clipping too. This will avoid any unnessecary checks, xforms and projections later on.
- Oh yes, code it in assembly language. Not that this is really needed but hell, I would.
- Don't bother using znear clipping. The left, right, top, bottom planes will handle it for you.
Need I say anymore? The rest is left to you.
Final words
Hope you enjoyed our visit to the world of clipping. A few greets are in order.
Altair (da kinga)
Absent
Masterboy
Faktorii
Kodebotti
RocketMagnet, Semtex, Grimace, JellyBaby, the leicester crew! :)
#coders
#3dcoders
#ukscene
and to all the people I trash at tetrinet (hahaha).
Bye!
frenzy
p.adams@wlv.ac.uk
"...code is art..."