Ray(/segment)-triangle intersection tests for dummies :P

-- warning, kinda big document ahead!

-- warning, this document is written with raytracing in mind, it's better if you know what it is as I will talk of shadow caches (buffer), camera and world space, firsthit etc...

-- thanks to pan^spinningkids,dman^ramjam,turbo^apocalypsedesign for checking it, http://mathworld.wolfram.com/ for some docs...

-- Foreword

There are various ray-tri tests around. I'll describe a few of the most interesting ones because I think that there's no "definitive" ray-tri intersection. Much depends on the type of application you're going to develop. Some methods are faster if most of the tests are positive (hit), some if they are not (early rejection). Some use lots of memory to precalculate as much as possible so they are very fast for scenes with no cache-hit problems (for examples in scenes with not so many, big triangles and spatial subdivision), while others use less memory and no precalc. etc... The precalculation option is often important so I'll talk about it more here. There are two kinds of precalculations that could be useful: per-frame precalculation (something that can be precalculated for the entire scene, for every triangle, and that just needs to be updated if the triangles are animated, if it changes their position in the next frame). The other type is per-ray-bundle precalculation (something that can be precalculated only for rays that have a common origin, for example, camera first-hit rays or shadow-rays). The second option is good only for realtime raytracing with very few triangles because there can be a lot of data to precalc and we have to avoid cache hits (a cache hit usually cost much more than computing the intersection with a slow algorithm, if we have a few big triangles and neat spatial subdivision we'll often test against the same triangle for many pixels in the first-hit and shadow rays so it's a good idea to precalculate something more. A really good idea is to store the precalculated information in a cache (for example, in the shadow-cache) and to keep precalc. info just for a few triangles or to compute precalc info only when needed (update those precalc structures only if a triangle is hit for the first time in the current frame). It's simple to do fast algorithms with lots of precalc. But don't be so happy; usually lots of precalc is a bad idea.

Every algorithm needs at least two things:

(1) the three vertices of the triangle (v1-v2-v3,
counterclockwise ordered if front-facing. This convention is useful for some algorithms).

(2) You also must have the ray. We usually describe the ray using the parametric equation R=r_o+t*r_d where r_o is the ray origin and r_d is the ray direction vector.

Almost every method computes the intersection point of the ray with the triangle plane using barycentric coordinates (u,v), that is:

i_p=(1-u-v)v1+u*v2+v*v3=v1+(v2-v1)*u+(v3-v1)*v

This point is inside the triangle if u,v>=0 and u+v<=1 (if u=0, v=0 or u+v=1 then the intersection is on one of the edges of the triangle). This is really useful as we can use u,v weights to interpolate any vertex-defined property. For example: texturing coordinates, surface color value (Gouraud shading) etc...

-- Moller-Trumbore's method (Journal of Graphic Tools, 1997)

This is the method described in the book "Real time rendering" by those two authors. It's one of the best methods, and is very popular as well. So here we go!

The entire method is based on finding a transformation that allows us to map a vector from our boring old world-space coordinate system to another basis that uses two edges of the triangle and another well-chosen vector. What does this mean? Well, if we have a n-dimensional vector in a n-dimensional vector space we can express it as a linear combination of the space's basis vectors. For every vector space there is at least one basis, but there can be many more! In fact every set of n vectors that are linear independent and span the entire space are a basis. Again, what does this mean? Nothing special really, it's just the way we're used to express vectors: we usually use a 3d space and the basis (1,0,0),(0,1,0),(0,0,1). So, a vector X can be expressed as x*(0,0,1)+y*(0,1,0)+z*(1,0,0) (that is, a linear combination), of course we store only x,y,z and say X=(x,y,z). Now, every nxn matrix express a linear transformation from a vector space into another. Think of it as a basis change. If we find a way to transform our vector X=(x,y,z) into a vector X=(u,v,t) using the basis v2-v1,v3-v1,d then X=u*(v1-v2)+v*(v3-v1)+something else, then we can recognise that those u,v coordinates are barycentric coordinates. Enough said (I hope, because I don't think I'll explain the other algorithms so "in deep", if you don't know linear algebra, read a good book about that as it's the bare minimum to do 3d stuff or you can download my matrix tutorial :P), now let's go on with the algorithm:

1) Translate everything (v1,v2,v3 and r_o) so that v1=(0,0,0) (space origin)

2) Build the transformation matrix and apply it to r_o. The transformation matrix is really easy to "find" as it has to satisfy this equation: M*T_M=I where the basis vectors v1-v2,v1-v3 and -r_d (we'll see later why this is a good choice for our third basis) form the colums of M. This means that the tranformation has to map our basis to (1,0,0),(0,1,0),(0,0,1) (those are the columns of the I, identity matrix) and this is the geometrical interpretation of the multiplication of a matrix by its inverse. So we know that T_M is inv(M), applying it to r_o yields the u,v coordinates of the ray-plane intersection and t, the parameter for the intersection point in the ray equation (now it should be clear why we have choosen -r_d as the third basis vector).

Now let's notice a few things before presenting a more optimized version of the same stuff. First of all, we know that a square matrix can be inverted only if it's non-singular, this means if its column-vectors are linear independent so they form a basis. But what if the matrix is singular (this means, det(T_M)=0) or nearly singular? As in 3d space three vectors are linear independent only if they are not on the same plane, this is as to say that our basis vectors lie on the same plane (or nearly, we have to take care of numerical errors) and so the ray direction is parallel to the triangle plane, thus no intersection can occur.

At the moment this algorithm may seem too complex (we're talking about computing a determinant, inverting a matrix and performing a matrix-vector multiplication) but we'll simplify it adding many optimization and early-rejection tests, so it'll be fast and sexy. Also notice that this algorithm is "minimum storage" as it requires only the bare minimum information about the triangle, you don't have to precompute the triangle normal or anything else for it. Let's go on, now we have only to use Cramer's rule for matrix inversion and perform some steps to simplify the formulas, there's nothing really to be explained here so I'll cut&paste the formulas from Moller and Haines DrDobb's Journal article:

Let's denote e1=v2-v1, e2=v3-v1 (the two edges), s=r_o-v1 (the translated ray origin) and knowing that det(a,b,c)=dot(cross(a,b),c)=-dot(cross(a,c),b)-dot(cross(c,b),a) we have (yeah, ascii-art ahead!):

/ t \ 1 / det(s,e1,e2) \ | u | = ------------- * | det(-d,s,e2) | \ v / det(-d,e1,e2) \ det(-d,e1,s) /

That's to say:

/ t \ 1 / (s X e1)*e2 \ | u | = ------------- * | (d X e2)*s | (where X is the vector cross-product) \ v / (d X e2)*e1 \ (s X e1)*d /

The final algorithm (u,v are the barycentric coords, t is the ray parameter for the intersection point):

MollerTrumbore_IntersectRayTri(vec3d r_o,vec3d r_d,vec3d v1,vec3d v2,vec3d v3) returns (float u,float v,float t,bool intersect,bool frontfacing) { vec3d e2=v3-v1; // second edge vec3d e1=v2-v1; // first edge vec3d r=cross(d,e2); // (d X e2) is used two times in the formula // so we store it in an appropriate vector vec3d s=r_o-v1; // translated ray origin vec3d a=dot(e1,r); // a=(d X e2)*e1 real f=1/a; // slow division* vec3d q=cross(s,e1); real u=dot(s,r); bool frontfacing=true; if (a>eps) // eps is the machine fpu epsilon (precision), // or a very small number :) { // Front facing triangle... if (u<0)||(u>a) return 0,0,0,false,frontfacing; vec3d v=dot(d,q); if (v<0)||(u+v>a) return 0,0,0,false,frontfacing; } else if (a<-eps) { // Back facing triangle... frontfacing=false; if (u>0)||(u<a) return 0,0,0,false,frontfacing; vec3d v=dot(d,q); if (v>0)||(u+v<a) return 0,0,0,false,frontfacing; } else return 0,0,0,false,false; // Ray parallel to triangle plane real t=f*dot(e2,q); u=u*f; v=v*f; return u,v,t,true,frontfacing; }

*) We hope that the host fpu can operate in parallel (multiple pipelines) and thus this will complete before the end of the code, when it's needed. If such parallelism is not possible it's better to place this division after every early rejection test. This optimization becomes less meaningful if we are sure that the ray will hit the triangle most of the times (for example if we use a very fine grained spatial subdivision method or multiple, nested spatial subdivisions).

Advantages:

1) Minimum storage

2) Not so many early rejection tests (before reaching the first one you have done two cross products and two dot products)

3) It can do backface culling if necessary

4) It could be even implemented in mixed floatingpoint/fixedpoint operations to exploit the fpu/alu parallelism of some cpu (for example, the intel pentium series) i.e. compute the division in the fpu and do fixedpoint alu operations meanwhile.

Disadvantages:

1) There is not much to precalculate

-- Dan Sunday's method

This method by Dan Sunday seems to be very similar to the Moller's one but finds another way to compute the barycentric coordinates of the intersection point. It uses a 3d generalization of Hill's perp-dot product: for a plane P with normal P_n and any vector a in the plane (dot(n,a)=0) perp(a)=cross(n,a), that is another vector in the plane P (dot(n,perp(a))=0) perpendicular to a (dot(a,perp(a))=0). This operator is linear on vectors that lie in P, which means that for every x,y scalars and a,b vectors in P perp(x*a+y*b)=x*perp(a)+y*perp(b). If P is the xy plane with n=(0,0,1) then this is the Hill's 2d perp-dot product. Now, we'll use this perp-dot product to solve the ray-plane intersection point using the plane parametric equation (P(u,v)=v1+u*e1+v*e2 where u,v are scalars and e1,e2 vectors: e1=v2-v1 e2=v3-v1. You should recognise that this is equal to computing the barycentric coordinates), if i is the intersection point we can write:

i-v1=u*e1+v*e2

If we take the dot product of both sides with perp(e2) and substitute w=i-v1 we get:

w*perp(e2)=u*e1*perp(e2)+v*e2*perp(e2)=u*e1*perp(e2)

From this we can get u of the intersection point. If we take the dot product of both sides with perp(e1) we can solve for v. So we have (if n is the normal vector of the triangle):

u=(w*perp(e1))/(e2*perp(e1))=(w*(n X e2))/(e2*(n X e2))

v=(w*perp(e2))/(e1*perp(e2))=(w*(n X e1))/(e2*(n X e1))

Notice that if the triangle is degenerate (has zero area) the denominators ((e2*(n X e2)) and (e2*(n X e1))) are nonzero, so this method is applicable (well of course there could be potential problems of numerical stability in those formulas). We can optimize this algorithm a little more if we remember that (a X b) X c=(a*c)b-(b*c)a, this leads to the following:

perp(e1)=n X e1=(e1 X e2)Xe1=(u*u)v-(u*v)u, doing the same for perp(e2) we get:

u=((e1*e2)(w*e2)-(e2*e2)(w*e1))/((e1*e2)^2-(e1*e1)(e2*e2))

v=((e1*e2)(w*e1)-(e1*e1)(w*e2))/((e1*e2)^2-(e1*e1)(e2*e2))

This means that this algorithm requires five dot products and no cross products. It's still slower than the Moller's algorithm but notice how, using those formulas, we can precompute a lot of stuff (only a few stuff is dependant from w, that is the ray-plane instersection). We could store the following information in the triangle structure (or in a precalc. cache circular list):

e1=v2-v1

e2=v3-v1

prec1=e1*e2

prec2=e1*e1

prec3=e2*e2

prec4=1/prec1^2-prec2*prec3

We have:
u=(prec1(w*e1)-prec2(w*e2))*prec4;

v=(prec1(w*e1)-prec3(w*e2))*prec4;

So the algorithm takes only two dot products, two scalar mults and one subtraction. Notice that to compute w we have to pre-compute the ray-plane intersection parameter t:

t=(n*(v1-r_o))/(n*r_d)

Again, if n*r_d=0 then the ray is parallel to the plane, no intersection will occour.

We could arrange the terms in a different way and precalulate something less if needed, if we do all the precalc depicted here the triangle structure has to store 16 floats instead of 9 of muller-trumbore, so it's wise to precalc less or to use a precalc cache instead of precalculating for every triangle.

Advantages:

1) Many thing can be precalculated (very fast if cache misses are not a problem), tunable precalculation

Disadvantages:

1) Not so many early rejection tests (more or less like M-T)

2) Slower than M-T in its base version

3) Memory-inefficient in its precalculated version (this means that it will be slow on complex meshes)

--- Badouel's algorithm (Graphic Gems, 1990)

This is as far as I know the first algorithm to use barycentric coordinates and is as famous as the M-T one (but slower :P). It works by decomposing the barycentric equation for points in a triangle (p-v1=u*(v2-v1)+v*(v3-v1)) into a system of three equations (p is the intersection point, computed as for the previous method by using the ray-plane test):

px-v1x=u*(v2x-v1x)+v*(v3x-v1x)

py-v1y=u*(v2y-v1y)+v*(v3y-v1y)

pz-v1z=u*(v2z-v1z)+v*(v3z-v1z)

Then projecting the triangle on one of the planes xy,xz or yz by stripping one of the coordinates (we'll explain later how) so we get:

pc1-v1c1=u*(v2c1-v1c1)+v*(v3c1-v1c1)

pc2-v1c2=u*(v2c2-v1c2)+v*(v3c2-v1c2)

And we can solve for u,v (c1 and c2 are the remaining two coordinates). The coordinate to strip is the one that has the biggest value in the triangle normal (the normal of the plane on which the triangle lies). This algorithm is straightforward to implement but it's usually slower than for very complex meshes M-T because we have to store the triangle normal too in our triangle intersection structure (more data per triangle means more cachehits, and those are frequent in meshes with very small triangles) or do an extra cross product to get it. Also, we have to do more branches (and branches are expensive on modern pipelined processors! By the way, on some architectures we can replace them by conditional stores or we can play around with the sign bits of floating point numbers) and we have two extra divisions and not so many early rejection tests.

Badouel_IntersectRayTri(vec3d r_o,vec3d r_d,vec3d v1,vec3d v2,vec3d v3) // No precalculated normal version returns (vec3d ip,u,float v,float t,bool intersect) // We return ip too because the algorithm computes it internally, // so it's less expensive than recomputing it via the t parameter { // Compute the triangle normal vector vec3d norm=cross(v1,v2); float tmp=dot(norm,r_d); if (abs(tmp)<eps) return (0,0,0),0,0,0,false; // ray is parallel to tri.plane float t=dot(norm,v1-r_o)/tmp; // Compute the two coordinates that we'll use int i1,i2,i3; if (norm[1]>norm[2]) if (norm[1]>norm[3]) { i1=2; i2=3; } else { i1=1; i2=2; } else if (norm[2]<norm[3]) { i1=1; i2=2; } else { i1=1; i2=3; } u3=6-i1-i2; // Compute the intersection point (only the two needed coordinates) vec3d ip; ip[i1]=r_o[i1]+r_d[i1]*t; ip[i2]=r_o[i2]+r_d[i2]*t; // Tranlsate everything float u0 =ip1-v1[i1]; float v0 =ip2-v1[i2]; float u1 =v2[i1]-v1[i1]; float v1 =v2[i1]-v1[i2]; float u2 =v3[i1]-v1[i1]; float v2 =v3[i1]-v1[i2]; // Do the barycentric stuff... if (abs(u1)<eps) // uncommon { v=u0/u2; if(v<0||v>1) return (0,0,0),0,0,0,false; u=(v0-v*v2)/v1; } else { v=(v0*u1-u0*v1)/(v2*u1-u2*v1); if(v<0||v>1) return (0,0,0),0,0,0,false; u=(u0-v*u2)/u1; } if(u<0||(u+v)>1) return (0,0,0),0,0,0,false; // compute the missing coordinate of the intersection point ip[i3]=r_o[i3]+r_d[i3]*t; return ip,u,v,t,true; }

You should recognize that this method is only a 2d version of the M-T one (well in fact M-T is a 3d version of this one, as it has been developed later). We translate the projected triangle, and then the solution of the system can be seen as a matrix-vector multiplication (as every system can be) i.e. a linear transform that does the same mapping of M-T but in 2d.

Advantages:

1) It's good if you precalc. the normal and/or the projection plane

2) Early rejection

Disadvantages:

1) It's slower than M-T on complex meshes and it's not better on simple ones

2) It doesn't have many early rejection tests

--- Segura-Feito's method and a slightly modified version of S-F method

This method uses a segment, not a ray. The segment will be identified by two points r_s,r_e (ray start-end). The vertices of the triangle should be ordered counterclockwise (as for M-T). In addition, this method will only test whether or not the segment intersects the triangle , but not the intersection point itself. That has to be computed via the standard ray-plane intersection. The ray-plane intersection can be performed before the segment-triangle intersection test or after it. Again this is similar to the problem of early division in M-T. If we perform it before, we have an easy way to compute r_s and r_e for a ray: if t is the ray parameter for the intersection point then we can say tmp=t*r_d r_s=r_o+tmp-r_d and r_e=r_o+tmp+r_d. Otherwise, we can do something like that: r_s=r_o and r_e=r_o+far_value*r_d where far_value is the far clipping distance. The segment-triangle test uses the following theorem (Segura 1998):

Let ABC be a triangle in R^3 (three dimensional space) and q,q' be two point that determine a segment in R^3, placed on different sides of the plane determined by ABC and orders so that the tetrahedron QABC has positive orientation (the signed volume of QABC is positive, i.e. it's front-facing). Then segment qq' cuts the triangle ABC if and only if: sign([q'abq])>=0 and sign([q'cbq])>=0 and sign([q'acq])>=0. That's because qq' will cut ABC if and only if q' is contained in the trihedral of edges qa,qb,qc that's to say that q' sees triangles ABq BCq CAq clockwise and so the tetrahedral q'AqB,q'CBq and q'ACq have positive orientation (the signed volume is positive). If one of the signs is zero q' is coplanar with one of the triangles ABq, BCq, CAq and so the intersection point is on the edge AB, BC or CA respectively. If two signs are zero q' is collinear with one of the edges qA,qB or qC, the intersection will be on one of the triangle vertices. So, the pseudocode for that algorithm so far is:

SeguraFeito_TestSegmTri(vec3d r_s,vec3d r_e,vec3d v1,vec3d v2,vec3d v3) returns (bool intersect) { int i=area_sign(r_s,r_e,v1,v3); if i<=0 return false; i=area_sign(r_s,r_e,v2,v3); if i<=0 return false; i=area_sign(r_s,r_e,v1,v2); if i<=0 return false; return true; } SeguraFeito_IntersectRayTri(vec3d r_o,vec3d r_d,vec3d v1,vec3d v2,vec3d v3) returns (vec3d ip,float t,bool intersect) { vec3d n=cross(v1,v2); // Compute the normal vector float t=(dot(n,(v1-r_o)))/dot(n,r_d); // get the parameter t for the // ray plane intersection vec3d tmp=r_o+t*r_d; vec3d r_s=tmp-r_d; vec3d r_e=tmp+r_d; if SeguraFeito_TestSegmTri(r_s,r_e,v1,v2,v3) return(tmp,t,true); else return(tmp,t,false); }

How to compute area_sign? Well we should know that if A is an nxn matrix with real numbers, then det(A) has the interpretation as the oriented n-dimensional content of the parallelepiped spanned by the column vectors a(i,1),...,a(i,n) in R^n. Here, "oriented" means that, up to a change of sign, the number is the n-dimensional content, but the sign depends on the "orientation" of the column vectors involved. If they agree with the standard orientation, the determinant is positive if not it's negative. The parallelepiped spanned by the n-D vectors v(1) through v(i) is the collection of points t(1)*v(1)+...+t(i)*v(i) where t(j) is a real number in the closed interval [0,1].

For the function area_sign the original Segura-Feito paper used a method proposed by Yamaguchi (1990) to compute a 4x4 determinant. I have a part of a book by Yamaguchi and I can see that work was aimed for high-precision geometric problems and used a variable precision rapresentation of the matrix (as far as I know). So here I'll develop another way to compute it (now anything you find below are my thoughts so I can't promise you that I did not made any error there :P )...

If we translate our world so rs=(0,0,0,1) (camera-space) then we can reduce our 4x4 determinant to a 3x3 one:

area_sign(rs,re,v1,v3)=sign(v3z*(rex*v1y-v1x*rey)+v3y*(v1z*rex-rez*v1x)+ v3x*(v1y*rez-rey*v1z))=sign(v3*(re X v1)); area_sign(rs,re,v1,v2)=sign(v2z*(rex*v1y-v1x*rey)+v2y*(v1z*rex-rez*v1x)+ v2x*(v1y*rez-rey*v1z))=sign(v2*(re X v1)); area_sign(rs,re,v2,v3)=sign(v3z*(rex*v2y-v2x*rey)+v3y*(v2z*rex-rez*v2x)+ v3x*(v2y*rez-rey*v2z))=sign(v3*(re X v2));

You'll notice that this requires the same amount of calculations done by M-T (two cross products and a few dot products) but we don't need a division (well we need it if we want to compute the intersection point, not if we want to check for intersection). The sign function can be implemented with a conditional branch or by stripping the sign bit of the floating point numbers. We can simplify the calculations further if rewrite our formulas as:

area_sign(rs, re,v1,v3)=-area_sign(rs,v3,v1,re)=-sign(re*(v3 X v1))=sign(re*(v1 X v3));

area_sign(rs, re,v1,v2)=-area_sign(rs,v2,v1,re)=-sign(re*(v2 X v1))=sign(re*(v1 X v2));

area_sign(rs, re,v2,v3)=-area_sign(rs,v3,v2,re)=-sign(re*(v3 X v2))=sign(re*(v2 X v3));

(Switching two rows or two columns in the matrix changes the sign of the determinant.) Now we have three cross products instead of one but we can precompute them, with a major speed gain. Now this has an easy geometrical interpretation (in fact I figured out this method by myself before knowing of Segura and Feito's works, and only now I recognize that the S-F method can be reconduced to the following one), if we express a plane with the following equation:

N*p+D=0

Where N is the plane normal, D is distance from origin and p is any point on it. Then we can think that v1 X v3, v1 X v2 and v2 X v3 are the normals of three planes passing from the origin (D=0) and intersecting every triangle's edge. So by doing re*(v1 X v3) we're just checking on which side of the plane the point re is. If re is inside the volume defined by the three clipping planes then the ray will hit the triangle (the same applies to polygons too by the way). In this interpretation you can see that re can be any point on the ray so we can finally write our SeguraFeito test as follows:

SeguraFeito_TestSegmTri(vec3d r_o,vec3d r_d,vec3d v1,vec3d v2,vec3d v3) returns (bool intersect) { v1-=r_o; v2-=r_o; v3-=r_o; vec3d tmp=cross(r_d,v1); // we use r_d as our re int i=sign(dot(v3,tmp)); if i<=0 return false; i=sign(dot(v2,tmp); if i<=0 return false; i=sign(dot(v3,cross(re,v2)); if i<=0 return false; return true; } // TestSegmTriPrecalc (idea...) SeguraFeito_TestSegmTriPrecalc(vec3d r_o,vec3d r_d,triangle tri,frame currframe) returns (bool intersect) { if(tri.updateframe!=currframe) { tri.v1xv3=cross(tri.v1-r_o,tri.v3-r_o); tri.v1xv2=cross(tri.v1-r_o,tri.v2-r_o); tri.v2xv3=cross(tri.v2-r_o,tri.v3-r_o); tri.updateframe=currframe; } int i=sign(dot(r_d,tri.v1xv3)); if i<=0 return false; i=sign(dot(r_d,tri.v1xv2); if i<=0 return false; i=sign(dot(r_d,tri.v2xv3); if i<=0 return false; return true; }

Final note on this method. It's very good for triangle strips as you can cache the per-edge computation and reuse it for the next triangle (you have only to change the sign, as is the edge-plane was pointing inside the original triangle it will point outside the triangle that shares that edge). Knowing this and our latest result you should recognize that the whole thing could be rearranged and you get... a beam tree for shadow check. So is using a beamtree a good idea for shadow tests? I don't think so, shadow hit is an "any query" (any hit will return that the surface point is shadowed) but we should not care much in determining if there is any hit fast (this is done efficiently enough with a small shadow cache imho) but if there isn't any as this is a much more complex problem, and we will expect to "light" our objects most of the time (it's difficult to see a completelly shadowed scene...). So we have to use some form of spatial partitioning and... this is not the topic of this tutorial by the way... :P

- Standard version

Advantages:

1) It should be more or less equivalent to M-T

2) Early rejection

3) It should be more stable than M-T

4) It should be faster than M-T if we need only intersection test, not the intersection point (so for example is good for shadow-rays)

Disadvantages:

1) It does not compute the barycentric coordinates so it's unsuitable for some applications

- Precalculated version

Advantages:

1) It's much faster than M-T

2) Early rejection

3) It should be more stable than M-T

4) It's really fine for shadow-rays as we can store the precalc stuff in the shadow-cache (everyone should use a shadow cache... :P)

Disadvantages:

1) It does not compute the barycentric coordinates so it's unsuitable for some applications

--- Efficient and Reliable Intersection Tests: ray-triangle intersection (Journal of Graphic Tools, 1998)

Now we will approach another 3d to 2d projection. In fact, this algorithm is based (more or less) on the same idea of the S-F one but in 2d. In fact it has been developed prior to the S-F algorithm but as for Badouel and M-T I prefer explaining the 3d case first as the 2d projection is easy when you have understood the general one. Now, this is really like Badouel's, the differences start after you have projected the triangle and the intersection point. Instead of computing the barycentric stuff we check if the 2d point is inside the triangle by checking the sign of the determinants det(v1,v2,ip), det(v2,v3,ip), det(v3,v1,ip). Notice that the orientation of v1,v2,v3 can be deduced >from the normal vector n=cross(v2-v1,v3-v1), if when dropping the i coordinate n(i)>0 then v1,v2,v3 are CCW ordered in the projection plane. The author of this text (Martin Held) claims that this algorithm is superior to the barycentric coordinate one (Badouel's one as I can understand from the text) and to the infinite triangular pyramid method (the modified S-F) but I don't know what tests he has made so... By the way, I don't think it will be better than M-T or S-F methods. Also, it has all the disadvantages of S-F plus it requires computing the intersection point before doing any other test (like Badouel's or Sunday's ones) and this can be expensive in scenes with low ray-hit ratio.

--- Plücker (Grassmann ) coords ray-triangle intersection

Plücker coordinates are a way to represent directed lines in a 3d space. They form a 6d homogeneous space (a homogeneous coordinate of a point (x(1),x(2),x(3),x(4)...x(n)) is (y(1),y(2),...,y(n+1)) such that y(1)/y(n+1)=x(1),y(2)/y(n+1)=x(2) etc... Now if we have our ray R(r_o,r_d) then we can "encode" it into a plucker coordinate (r_d : r_d X r_o) where the : means the "concatenation" of the two vectors (i.e. (A:B)=(a(1),a(2),a(3),b(1),b(2),b(3) if A,B are 3d vectors). Now, we can do many nice things in plucker space, but the only one that really matters to us is that if we have two rays R1=(U1:V1) and R2=(U2:V2) and w=U1*V1+U2*V2 (permutated inner product) then if w=0 they intersect, if w>0 R1 "sees" R2 counterclockwise, if w<0 R1 "sees" R2 passing clockwise. I borrow this ASCII-art from a Flipcode tutorial to explain me better:

----------> o o ----------> counterclockwise clockwise

Now, how to do triangle intersection with that is left to you, it should be very straightforward and it should lead to something that we already explained...

-- Conclusions

Those are only a few algorithms. The most useful one is M-T... but I think that now you should have some more ideas on how to do ray-triangle intersection and you can develop new methods, or customize those ones to better fit your application. For example, you could think of precalculating normal edge vectors v2-v1/|v2-v1| and v3-v1/|v3-v1| and then using the dot product with the translated intersection vector... Well, bye :)