3D Visibility Pipeline for dynamic worlds

Preface

The purpose of a visibility pipeline is to compute the visible part of your 3d world and to discard the non-visible polygons.

The purpose of a good visibility pipeline is to compute this visible set before doing any other thing (transform, lighting, etc.) and maybe to throw in some z-sorting, material sorting and other things to speed up the rasterization process...

The purpose of this tutorial is to explain an imho good visibility pipeline implementation, but it's better if U already know the basics of 3d math (vectors, matrices, normals), triangle rasterization (z/w-buffering), world-object-camera-screen spaces, frustum culling and clipping (I'll explain frustum planes briefly).

In static worlds (like Quake...) we can play with some specific data structures that hold polygons in some spatial ordered fashion, and use the properties of that structures to do visibility computations (BSP trees, octrees, KD-trees, portals, etc.). As a plus, with some of those structs we can easily compute a front-to-back ordering; that's always nice to avoid overdrawing with a s-, c-, w- or z-buffer (aargh! Well, there are other more nicely named methods too by the way... :P). But this approach isn't doable in dynamic scenes, as those kind of structures are often pre-computed and they take a lot of time to be rebuilt.

I'm not going to say anything new, and I don't claim I would be doing so... I'm just going to explain some well-known algorithms and their weak/strong sides.

Don't flame me too much for errors, this document was written in a long time due to university and idleness, and I'm too lame to proofread it.

Hierarchical bounding volumes culling

In this first step we will discard all the objects that aren't in our view frustum at lightning speed. The view frustum is a truncated pyramid defined by 6 planes that bounds the visible geometry.

Warning: ascii-art ahead.

         \             /
        --\-----------/-- far plane
           \         /                   *Top view*
left plane  \       / right plane
             \     /
            --\---/-- near plane
               \ /
                * camera
                |

There's also a top and a bottom plane that aren't visible in this top view...

Every plane is defined by a normal (facing inside the view volume) and a distance from the origin. Now to do our culling we first compute a set of bounding volumes around every object. You can use spheres, bounding boxes, axis aligned bounding boxes or other kind of cheap geometry, depending on the object we're bounding (you can even think of using a low-resolution polygonal mesh that bounds your triangle mesh more exactly if you think it could be good for you; it depends on the objects you're using in your engine). I currently use bounding spheres, but it's wise to make an engine that can use every kind of bounding volume. For spheres you only have to keep a position and a radius. Every point of the object MUST be inside the bounding sphere... To find the minimal bounding sphere for a given object use this simple algorithm:

1. Compute the distance between all points in the object and keep only the bigger one.
2. Place the b-sphere center in the middle between the two most-distant points.
3. The b-sphere radius is equal to distance/2.

Well, now let's do it! We won't work in object space this time, we'll use some tricks in camera space instead. Why? Because transforming the bounding sphere in camera space is actually faster than transforming the frustum view in object space, and we can use some other neat tricks later. To keep the bounding volumes and the geometry I use a special hierarchical object structure:

Object structure:
Geometrical data. Points, faces, normals, transformed points, normals, 2d projected points, colours, object position, scale, rotation etc.
Array of pointers to children. Every child inherits the parent's transform.
Array of pointers to 'visual' children. Every child inherits the parent's transform and clipping flags and must stay inside the parent's bounding volume.
Array of bounding volumes (at least one).
Clipping flags.

As you can see I can have multiple bounding volumes for a single object as well as I can do other neat things. For example for a complex object you can use a visual parent with no geom. data and with a big bounding sphere attached. If this big bounding sphere is all outside (resp. all inside) the view frustum we discard it (resp. keep it) and all the visual parents, otherwise we compute the bounding volumes of the visual child. Another method is splitting a big object in many sub-objects and assigning a small bounding sphere to it. This method is better because using multiple bounding volumes in camera space is slow (as you have to transform every b.vol). Using multiple "fake" parents is slow too (you always have to transform every parent's bounding volume) but you have the advantage of discarding some parts of the object (the object is subdivided in many sub-objects). If a bounding volume crosses some view frustum planes, we keep track of what planes it crossed, so we can check object geometry only with those planes in the clipping stage, and we'll also check the visual child's bounding volumes only against the parent's crossed planes! This is cool, and you can save unit planes transformation, too. Clipping is done later in object space, so U have to transform frustum planes. If U think it's better to do everything in object space (transform all view frustum planes and do the bounding volume and the clipping, all with this transformed frustum) think of this:

1. Most objects are all inside or all outside so you won't use clipping at all, and a camera space bounding sphere check is faster than an object space one (so you discard completely out-of-view objects very fast).

2. As said before, you only have to transform (and check) some of the view planes, so this is almost as fast as the object space method if clipping is needed.

3. If you are using 3d hardware, you can let the video card do the clipping.

4. Some coders prefer to do clipping in 2d screen space, just before rasterization (some use a 2d clipping rasterizer, a.k.a. edge span clipping, but that's lame...) because it's more precise.

So first you have to transform the bounding sphere. I also take care of scaling here, but not of non-uniform scaling or of skewing so be prepared... (non-uniform scale is easy, skewing requires a bit of thinking). I dunno if that's the fastest method to do that, but it's the only one I could think of:

1. Transform b-sphere position with the object->camera space matrix.
2. Recompute b-sphere radius:
1) Copy the b-sphere untransformed position to a new vector.

2) Add the b-sphere radius to a component (for example the x-component) of this vector.

3) Transform the vector.

4) The b-sphere transformed radius is equal to the distance between this vector and the transformed b-sphere position vector.

Now we have to check if the transformed b-sphere is inside or outside our bounding volume and set the proper clipping flags. Let me explain the standard frustum plane check method, and then forget it, because we're not using it... First you have to set up the plane normals properly, and I won't explain this here, as it's easy to do and if you can't find out the right equations you can find this info in every frustum clipping tutorial by the way (check out Submissive^Cubic&Seen's one for example). To check if a point is inside or outside a frustum plane we use the following formula:

distance = (point DOT plane_normal) - plane_distance;

If distance is strictly greater than zero, the point is inside the frustum plane. So for a bounding sphere we have to check if the bounding sphere center is inside the frustum planes; if so, the object is either all-inside or crossing some planes. To check for crossing, we have to check if center + radius is inside the right, bottom and far planes and if center - radius is inside the left, top and near planes. If a point falls outside a plane, the object is crossing that plane. All this stuff (and the previous one, too by the way) assumes that we're using the following coordinate system:

Warning: ascii-art ahead.

          |    Z
              /
          |  /
            /
          |/
 - - - - -*--------X
        / |
          |
      /   |
          |
    /     Y

Now the good method... First do a "standard" check with the far and near planes... Now you have to check the top/bottom left/right planes. To do this fast, I use outcode generation with a fixed FOV of 90 degrees, and play with outcode bits. What's that all about? Well, if our camera has a FOV of 90 degrees, the top-bottom-left-right frustum planes check simplifies to:

If point.x < -point.z: The point is outside the left plane.
If point.x > point.z: The point is outside the right plane.
If point.y > -point.z: The point is outside the top plane.
If point.y < point.z: The point is outside the bottom plane.

Those checks can be done with a little integer arithmetic if we're storing vertex coordinates in 32 bit floating point numbers. I won't explain here why, because I think that Frenzy's outcode generation tutorial is jolly good (Frenzy also wrote a good 3d frustum clipping tutorial). I'll give U here only some code not to let you cut & paste without understanding anything, but to let you see how easy this method is:

bs_flag=0;
temp_d=center.z-radius;
temp_z=(*(long int*)&temp_d)&0x7FFFFFFF;

temp_d=center.x-radius;
temp_x=(*(long int*)&temp_d);
bs_flag|=((temp_z-(temp_p&0x7FFFFFFF))>>31) << ((temp_p>>31)+0);
temp_d=center.y-radius;
temp_x=(*(long int*)&temp_d);
bs_flag|=((temp_z-(temp_p&0x7FFFFFFF))>>31) << ((temp_p>>31)+2);

temp_d=center.x+radius;
temp_y=(*(long int*)&temp_d);
bs_flag|=((temp_z-(temp_p&0x7FFFFFFF))>>31) << ((temp_p>>31)+4);
temp_d=center.y+radius;
temp_y=(*(long int*)&temp_d);
bs_flag|=((temp_z-(temp_p&0x7FFFFFFF))>>31) << ((temp_p>>31)+6);

This should generate 8 flags in bs_flag as following:

 01234567    bits
 ----++++    point +/- radius
 RLBTRLBT    frustum planes

If a point is completely inside the view frustum bs_flag is equal to zero. Example:

Warning: ascii-art ahead.

  ------------------ 
 |                  |
 |                  |
 |            a     |
 |            |     |
 |         b--*--d  |
 |            |     |
 |            c     |
 |                  |
  ------------------
  
                    CHECK                   |  FLAGS
  a = point.y - radius is inside BT planes  | xx00xxxx
  b = point.x - radius is inside RL planes  | 00xxxxxx
  c = point.y + radius is inside BT planes  | xxxxxx00
  d = point.x + radius is inside RL planes  | xxxx00xx
                                             ----------
                                              00000000 = all inside

Otherwise we have to generate the correct plane crossing flags:

bs_flag=(bs_flag&15)^(bs_flag>>4);
Now bs_flag contains only 4 flags and indicates if R/L/B/T planes are crossed.
if(bs_flag!=0) the object is completely outside our frustum view.

Warning: ascii-art ahead.

  ------------------ 
 |                  |
 |                  |
 |                  |
 |                a |
 |                | |
 |             b--*--d
 |                | |
 |                c |
  ------------------
  
                    CHECK                   |  FLAGS
  a = point.y - radius is inside BT planes  | xx00xxxx
  b = point.x - radius is inside RL planes  | 00xxxxxx
  c = point.y + radius is inside BT planes  | xxxxxx00
  d = point.x + radius is outside R plane   | xxxx10xx
                                             ----------
                                              00001000 = plane crossing flags
                                                         computation is needed!
  crossing flags=1000 R plane is crossed

Backface culling

Another thing we would like to do before clipping polygons to screen boundaries is to remove the backfaces (in non-transparent objects of course) as those will never be displayed. Doing so we will discard about 50% of the polys in the scene. If you're storing all the polygon normals that's an easy task to do after the transform stage... You simply transform all the normals with your object/camera matrix and check if this normal faces forward or backward (by a simple check on the transformed normal z component sign). But this is after the transform! As we want a good visibility pipeline we have to search for a method that does this before the transform stage. Well, it's very simple, we only have to transform the camera in the object space... To do that we have to find the inverse of the matrix that transforms the object's points from world space to camera space, and apply it to the camera. As we're only interested in the orientation of the camera, we can discard all the translations and only care about rotations (well, if your object->camera matrix also contains some resizing, skewing etc., it won't work, but you should always have a separate 3x3 matrix that only keeps track of rotations to transform an object's normals, so use that one instead). To find the inverse of a rotation matrix simply transpose it (as it's an orthogonal matrix...) et voilà... Apply this matrix to the camera normal and compute the dot-product between the transformed camera normal and the normal of every object face. If this product is > 0 then discard the face. To speed the thing up you can write a function that multiplies a 3x3 matrix with a vector in transposed order... Also, to discard a face, don't use a Boolean is_discarded variable, use a frame counter instead. If the face is visible, update the face's frame counter with the current frame, otherwise keep it unchanged.

A good trick to speed up BF culling is to split the object in groups of triangles with +/- the same normal (you have to choose a threshold, then check if the angle between two normals is smaller than this value). Every group has a normal (it can be the average of all normals or just the normal of one of the triangles in the group), so U can easily discard groups of backfacing triangles in your object by checking the angle between viewpoint and the normal:

1. If the angle is >180-threshold the group is backfacing.
2. If the angle is <180+threshold the group if frontfacing.
3. Otherwise you have to check every triangle in the group.

Frustum culling

Now we can perform a simple frustum culling at triangle-level to discard a few more triangles in the object. Just like bounding-sphere frustum culling, but this time we will test the triangles. If all the vertices of a triangle are inside the frustum then the triangle is all-inside, if they are all outside, then the triangle is all-outside, otherwise the triangle will need clipping later... Of course we check every vertex only against the planes that were intersected by the bounding volume (if any). And of course you don't need this part if you're using hardware rasterizers (as 3d video cards perform 2d culling clipping very quickly)!

The rendering stack and object sorting

To traverse the scene tree you can use a recursive rendering function like this one (pseudocode):

void render(object *obj,world *w,matrix *prevtrans,matrix *prevtnorm,flags prevbflags)
{ matrix current_trans,current_tnorm;

  If (prevtrans==NULL)
  { prevtrans=identity;
    prevtnorm=identity;
  }

  Compute transformation matrix for current object;
  Compose prevtrans with current_trans matrix;
  Compute normal transformation matrix for current object (only rotations, 3x3 matrix);
  Compose prevtnorm with current_tnorm matrix;
  
  Check bounding volumes
  If(visible)
   If(not_clipped)
   { For every point
      Transform points+project;
     For every triangle
     { Compute lighting;
       Rasterize;
     }
   }
   else
   { For every triangle
     { Clip;
       Transform;
       Project;
       Rasterize;
     }
   }

  If object has visual children
   render(obj->childs,w,current_trans,current_tnorm,current_flags);
  If object has children
   render(obj->childs,w,current_trans,current_tnorm,-1);
}

int main()
{ Init system;

  Rendering loop
  { Update world;
    render(w->root,w,NULL,NULL,-1);
    Blit buffer;
  }
}

But this function has some drawbacks... If U don't declare local variables as static, they will be pushed onto the stack and this will waste a lot of memory! If U declare local variables as static (or at least the bigger ones) they will be far from the current stack frame, and this is a little bad cache-wise because the stack usually is already in the cache, but the worst thing is that some compilers won't use registers for those variables and always read/write to memory. The best thing to do imho is to precalculate a rendering stack before rendering traversing the object tree, and make the render function iterative. In the render stack precalc function you can also put the bounding volume calculation. Also remember to only keep a pointer to the previous stack element and not to push trans and tnorm in the render stack precalc function. The drawback is that you'll come up with a big stack, as it holds all current objects while the recursive rendering function will generate a stack with a maximum size of the tree depth. The good thing is that you can later sort the stack very easily (if you implement it with a double-linked list with dumb top and bottom nodes), and have a rough front to back polygon sorting (so you'll reduce overdrawing). If U want a more precise sorting you can also sort triangles in the single objects or you can add a flag to complex objects that have to be polygon-sorted (only meaningful with software rasterizer to reduce overdraw). If you're rendering in hardware you can sort the object material-wise so you'll reduce state changes (which are expensive, especially with older 3d hardware). For example, you could generate the following key for every object and sort with that key (we're rendering in hardware):

             most significant --- less significant
BITS           01234567890123456789012345678901
               ----------*********+++++########
                   T1       T2      MF    ZV

T1, 10bits, first texture index (max. 1024 textures in the world)
T2, 9bits, optional second texture index (max. 512 secondary textures in the world)
MF, 5bits, material flags (smooth/flat, lit/unlit, vertex alpha on/off, texture1 on/off, texture2 on/off)
ZV, 8bits, first 8bits of fixed point representation of the z (or 1/z) value

Remember that sorting textures is ALWAYS useful because you'll reduce memory cache misses...

Also if you want to use beamtrees for fine occlusion culling you can use the first few objects in the sorted stack as occluders (if their bounding sphere is bigger than a threshold value...). So use a rendering stack!

Octrees and KD-trees for dynamic worlds

Have I said that octrees were only for static environments? Well, I was lying... :)

Actually octrees are good for some dynamic environments too (erhm, and for many other things like collision detection, color quantization etc... - they're just a data structure), but they aren't good for every dynamic environment... It depends on what you have to render, and this is true also for the two occlusion culling algorithms below (if U don't have big occluders it isn't useful to do occlusion culling...). Well, let's start with standard octrees and then I'll explain you how to make an octree for a dynamic world. Standard octrees are a simple but powerful space partitioning system that simplifies visibility determination and gives you an easy way to do a rough front-to-back scene traversal. It's really easy to build an octree, simply start with your world and surround it with an axis-aligned cube (some implementations use boxes instead of cubes); this will be the octree root node. Then subdivide it in eight subcubes (children of the root node) by splitting the main cube with the three axis planes (x, y, z) passing from the cube center (again some implementations let you split against planes passing from an arbitrary point in the cube) and reiterate this splitting process until a condition is met. Usually this condition is a threshold on the polygons that are in a node (if it's more than the threshold value, continue splitting) or on the cube size. KD-trees are very similar to octrees but they are binary trees (every node is split in two subnodes by using alternatively the x, y and z planes passing from an arbitrary point... for example you can split a node in two subnodes that contain roughly the same number of polygons), so we will talk only about octrees now. The only problem that you may have occurs when a polygon is in the middle between two octree nodes. Then you have two choices:

1. Clip the polygon against octree nodes.

2. Insert the polygon in both nodes, and add a flag (a frame-counter like the one that we use for backface culling but not the same one!) to polygons so that they are drawn only once per frame.

Now you can use your octree for two things:

1. Front-to-back traversal of octree nodes
2. Easy determination of visible nodes

To do the visibility culling, you have to simply check nodes against frustum volume in object space (just a normal hierarchical bounding volume culling as explained before). The front-to-back traversal of an octree is very easy, just think of an octree node as a binary space partition tree like that:

              Zplane
                / \
          Yplane   Yplane
         / \           / \
    Xplane Xplane Xplane Xplane

Now we "only" have to find a method for using octrees for a dynamic environment... Well, the best one imho is to use "loose" octrees with bounding spheres (or axis-aligned bounding boxes). You have to pre-allocate an equally divided octree (stop at a fixed recursion depth), and it's also wise to add an object counter to every octree node that counts the number of objects in the node and in its subnodes (this is because many nodes will be empty, and if so, we can avoid checking them for visibility). Then for every object that you want to insert in the octree simply calculate the bounding sphere size and you'll know in what level of the tree the object will be inserted (you have to find the deepest level that has nodes big enough to contain the sphere, it's simple because every level has nodes of equal size and the size of nodes in a certain level is equal to half the size of the nodes in the parent level), then simply put the object in the node that contains its bounding sphere center (this is easy to find, too) and update the object counters (of the current node and of its parents). As you can imagine this octree structure is "looser" than standard ones because every node can contain geometry a little outside its bounding box, but it's still an octree. If you move objects around it's quite easy to update the tree by removing and re-adding the object if its center crosses its node's boundaries (and this can be done even faster if you keep pointers to neighbors in every node of the octree).

Occlusion culling via beamtrees

Before starting this paragraph you should think if occlusion culling is actually useful in your engine... In a space combat game there won't be many occlusions, while in first-person game with an outdoor town scene with lots of big houses and small objects like trees and plants we're going to cull out most of the objects with occlusion culling... But for static scenes it may be better to think of some static data structure to perform visibility calculations. Also in hardware overdraw it isn't really a problem, and we're going to perform our occlusion culling in camera space, so if you want occlusion culling you will have to transform points before clipping and perform clipping in camera space, too, while without it we would do clipping in object space and save some other transformation. To summarize the things:

Good/Bad

B: You have to do everything in camera space, so you'll need to transform more points than actually needed in the clipping process.

G: If you are not using 3d hardware, you gain performance by discarding many polygons if you're engine has to render scenes with big occluders.

G: If you are using 3d hardware, polygon scan conversion and w/z-buffering is faaaaaaaast.

G: It's independent from the resolution (not like w/z/s/c buffering).

B: Coverage buffer is usually faster.

B: So you'll transform points of the occluded polygons, too, and that's a shame.

G: You'll save occluded polygons color/lighting/projection calculations.

B: If your scene is static, maybe is better to use some other visibility method. For indoor environments the portals approach is much better.

B: You have to sort your object in a front-to-back fashion (but if it's useful for your scene you can re-sort objects later material-wise).

To say the truth a strict implementation of beamtrees generates a perfectly visible set, by doing all the culling/clipping and occlusion culling/clipping process... This is easy to do, too, but it's really slow (it could be useful for other purposes, like real-time shadows, but for that it's better to use shadowmaps imho) and requires a perfect front-to-back polygon ordering. So we won't do that; instead, we'll do a faster beamtree check by using only some polygons as occluders and by not clipping partially occluded polygons (so we still need some form of fine occlusion culling, usually we'll use the standard z/w-buffer). But before that let me explain the standard beamtree algorithm. The beamtree is like a binary search tree (or you can view it as a special BSP tree). Remember that all the polygons in the scene are z-sorted and that we're in camera space (we have already transformed all the points). The beamtree is a binary tree, every node of the plane holds a plane passing from the origin (so we only have to keep the normal of the plane, as the distance from the origin is always zero, the origin of camera space is the camera of course, so all the planes are passing from it). Let's start with the tree filled with the frustum planes in the following order:

    left
     / \
  null  right
         / \
     null  top
           / \
       null  bottom
              / \
          null   null

It's a degenerated tree. Now pass every polygon (in front-to-back order) through the tree, start from the root node and go to the right child if the triangle is inside it, to the left one if it's outside. If it's crossing a plane. first clip it against the plane, then continue the tree traversal until you reach a leaf. If you've reached a leaf which is the right child of it's parent, render the clipped polygon and add all the planes passing from the origin through every of its edges to the beamtree (actually you can add also only the planes passing from the edges of the original, unclipped polygon; this will make overlapping occluders, but that's not a problem). Adding planes to a beamtree is easy, when you reach a right-leaf just attach the first one to this leaf, then the other ones all as left children (that's because all the other planes will be "outside" the first one in a convex polygon; if your clipping transforms your polygon in a concave one, you have to check the other planes with the parent's normal to see if they're facing forward or backward and then attach them as right or left planes).

In our fast beamtree we won't do many things:

1. We won't clip anything, only trivial accept/reject (culling).

2. We don't need frustum planes in the beamtree (actually you can add those just to perform triangle level frustum culling and optimize the frustum clipping code, only useful with software rasterizers).

3. We don't need strict front-to-back polygon sorting (as said before we will only zsort the object stack to choose the occluders).

4. We will use only a few, pre-selected polygons as occluders.

To remove frustum planes you have to start with this beamtree structure:

        plane
         / \ 
     -plane null
      / \
  null   null

Where plane is an arbitrary plane and -plane is the opposite of that plane (multiply the plane normal by the scalar -1). You need this because if you start with an empty beamtree, the first plane added to it will split the scene in two, rejecting every polygon in one of the halves.

To choose the occluders, you could do as briefly described in the rendering stack paragraph, z-sort the entire object stack, and choose the few first (nearest to the camera) objects as occluders. If U want, you can add a check on the bounding volume size, adding only big objects, but remember to flag the discarded small objects, as they don't have to pass through the beamtree (they will be always rendered, if U pass them in the beamtree, as they are nearer to the camera than the occluders, they could be discarded. Instead of flagging them you could simply render and remove them from the stack). You can also add a flag to objects that never have to be chosen as occluders (for example objects with big bounding volumes but small geometry inside, objects with big bounding volumes but with too high geometrical complexity, very small objects and objects like a portrait on a wall that are simply redundant). Another optimization is checking first (or only if U want a more rough culling) the bounding volumes against the beamtree (remember that for bounding boxes, you can just check the polygons of the box that are facing to the camera).

Little note on sorting:

You have to z-sort the polygons of occluder objects before inserting them, because occluder insertion requires front-to-back sorting (better if you sort them all together and not object by object)!

As DirtyPunk explained to me (thanks!), you will have some degenerated cases in which this method won't work:

Warning: ascii-art ahead.

                 \
 -:polygon    ----\         
 of an object      \        
                    \   \:polygon of an object
                     \  selected as an occluder

                \ /
                 * Camera

Now object ---- is sorted behind the occluder and it's culled by it, while it should be visible! This can be fixed by sorting all polygons in the scene with a BSP tree to get real front-to-back sorting. Another method is to add bounding volumes of occluders to a linked list when choosing them and then checking every other object's bounding volume down in the render stack for intersection against the linked list. If an intersection is found, you have to draw the object and discard it from the stack. The hierarchical occlusion maps algorithm described below has the same problem, so you're warned. Ah, another little thing... You don't really have to check every object for intersection, if you're using bounding spheres for example you can use this method:

1. While creating the occluders list, keep a threshold variable that holds the minimum of the bsphere.centre.z-bsphere.radius value of all selected occluders.

2. While checking the stack you can stop if U reach a node that has bsphere.centre.z-bsphere.radius<=threshold.

Occlusion culling via hierarchical occlusion maps

This method of occlusion culling seems to be faster than beamtrees on scenes with high-depth complexity, but it's only doable with hardware rasterization, as it becomes very slow on software. It has some advantages and some penalties if compared with beamtrees:

Good/Bad

B: You have to do all in screen space while beamtrees are in camera space, so you have to project every polygon.

G: It performs better than beamtrees with 3d hardware (and if you have hardware t&l you don't care about projection overhead, either).

B: It takes much more memory, as you have to use an hierarchy of 2d maps.

G: You can do a more approximate culling, and ignore small holes in objects.

B: It only culls entire objects, not polygons.

G: You won't do occlusion culling on polygons with 3d hardware by the way...

G: You only need a 3d video card that has a CPU-readable framebuffer (uhm, not like older Voodoo cards for example).

B: Reading the video card framebuffer is slow on most cards. So this algorithm has an high setup time, and it's rather good only if you have a very complex geometry with lots of occluded objects.

Well, let's start with it... You have to select some of your objects as occluders, and this can be done exactly with the same method viewed before for beamtrees. Render and remove from the stack every object that you don't select as an occluder but that's nearer to the camera than your current occluders. Now you should have your occluders as the first n-objects on the stack. Render them to the framebuffer but don't pop them out. Set up another framebuffer and render only the occluders in pure white; no lighting, no shading, no z-buffer is needed. Now you have to generate a sort of mip-maps from this framebuffer. You need to generate some smaller maps that are the average of pixels in the bigger map, and this can be done very fast with the hardware bilinear filtered texture mapper that every card has. :) The next step is quite simple, project and check the bounding volume of every object against the coarser occlusion map. If it's all outside (all the pixels in the bounding volume are black), you can draw it; if it's all inside (all the pixels in the bounding volume are white), you can discard it, otherwise you have to check it against the next (sharper) occlusion map. If you reached the last occlusion map level and you're still half inside or half outside, you have to draw the object. The check is very fast for axis-aligned bounding boxes as they are always rectangles in screen space... If you want some approximate culling, you can choose a threshold value instead of pure black when doing your checks.

C-Buffer

C-Buffers are just an easier version of s-buffers that are a neat variant of z-buffering (I assume that everyone here knows z-buffering...). While z-buffer uses a bidimensional array to store the z value (1/z value for w-buffering that is a little better) of every pixel drawn on the screen, the s-buffer use an array of linked lists (or another dynamic structure). We have a linked list for every screen line, and when drawing a triangle, we first generate the triangle spans (storing all the useful info for the span: x start, x end, color start, color end, uv start, uv end etc.) and we add them to the s-buffer performing clipping:

 s-buffer: linked list for line x
  ================.................................................................

             ********************* new span

 s-buffer: linked list for line x after span insertion
          clip the new span
                  |
                  V
  ================****************.................................................

Then we just draw all the spans in every list. Notice that the triangles should be z-sorted front to back, otherwise it's hard to write the insertion routine, and that this method is much better than z-buffer for big polygons, but worse for small ones. A c-Buffer is like an s-buffer, but the spans are rendered while they are added and every span only has x-start and x-end values. So we can just extend the existing span instead of adding a new one. This makes the algorithm faster for small polygons and also suitable for hardware rendering (yes, you have to generate the spans in software and then check if they are all-invisible - cull away the polygon) because the polygons span generation doesn't take much time. It's also pretty resolution-independent and fast!

Frustum clipping, world vs. screen space, and n-sided polygons

Now the last part of our visibility pipeline: the clipping stage. At this stage we cut the triangles that intersect our view volume keeping only the part that is inside it. If you have been clever in the culling stage, now you know which triangles need clipping, and you will test only those. I won't explain Sutherland-Hodgman clipping (or other methods like Liang-Barsky), because I think there are many good tutorials on those and because I'm lazy. I will point out only some things:

1. Use n-sided convex polygons (especially in software rendering). It helps a lot when clipping (remember that a clipped triangle can become a convex polygon so you should triangulate it, and waste a lot of time when rendering, too).

2. 2d Clipping is a good idea. Transform all your polygons, compute whatever you need (lighting info, materials etc.), project all points and then just do all the checks in screen space. Why? Because it's faster (dot product in 2d is faster than in 3d) and because it's easier to do and more precise, too... As always you will only clip every polygon against planes that where intersected by bounding volumes earlier in the visibility stage (or you can refine this by checking only against planes that where intersected during the frustum-culling stage).

3. You still need to do 3d clipping for the Z far-Z near planes!!! And this can be a little pain with n-sided polygons...

Bibliography

Books:

- Foley, Van Dam: 3d Computer Graphics: Principles & Practice in C (2nd Edition)

- Watt: 3d Computer Graphics (3rd Edition)

Articles:

- Spatial partitioning with "Loose" octrees. By Thatcher Ulrich.

- Various articles by Edward Kmett (Harmless) for Flipcode.

- Introduction To Octrees. By Jaap Suter for Flipcode.

- Real-time Visibility Determination (Using An Octree And A C-Buffer). By Jacco Bikker (The Phantom) for Flipcode.

- Thoughts On Visibility Determination In Dynamic, Semi-Static, and Static Scenes. By Jacco Bikker (The Phantom) for Flipcode.

- 3D Clipping for Real-time Graphics. By Nils Pipenbrinck (Submissive/Cubic & $eeN).

- Backface Culling in Object Space. By Nils Pipenbrinck (Submissive/Cubic & $eeN).

- Fast Backface Culling. By Paul Adams (Frenzy).

- World Class 3d Clipping. By Paul Adams (Frenzy).

- Outcode Clipping. By Paul Adams (Frenzy).

- Computer Graphics: Principles and Practice. By Foley J., A. Van Dam, J. Hughes, and S. Feiner.

- Hierarchical Back-Face Culling. By S.Kumar, D.Manocha, B.Garret, M.Lin

- Optimized View Frustum Culling Algorithms. By Ulf Assarsson and Tomas Moller.

- Visibility Culling using Hierarchical Occlusion Maps. By H.Zhang, D.Manocha, T.Hudson, K.E.Hoff III.

For flames etc.: ken@uniserv.uniplan.it