Logical vs. Bitwise

I started a new job not too long ago. All new hires at my company need to go through a couple months of training. That’s fine; I actually see it as a good thing. I’ve been learning how the company works, who is in charge of what and what I need to do when I screw up.

A large part of the training involves teaching newcomers how to code. This is the main focus of the training, as they hire people who have majored in math or economics and don’t have a background in code. Since I have quite a large background in code, the training has been rather dull. But it’s still good for me to know how the company works, and it has made me comfortable enough that I won’t have the pressure of a new job bearing down on me once I actually start doing important work.

But my background also gives me a different perspective on coding and the specifics of languages that I’ve used. When the other trainees are told things about a language, they just have to accept it and move on without really questioning the reasons. For instance: they were taught that the difference between & and && in Java has to do with what exactly is executed when used in an if statement.

So, the following two if statements are executed differently. The first will not execute obj.hasElements() when obj is null, while the second statement will throw a NullPointerException.

if(obj != null && obj.hasElements()) {
    obj.run();
}

if(obj != null & obj.hasElements()) {
    obj.run();
}

Of course, this is correct; however, I have a problem with explaining the difference between the two AND statements this way, for a couple of reasons. Firstly, BitwiseAND and BitwiseOR are my namesake websites, and really are close to my heart. When described so poorly, I feel a need to speak out on their behalf. Secondly, the second if statement really wouldn’t be used this way; at least not in Java (I could see this occurring in C++ quite a bit). Bitwise operators (&, |, ^, …) are for manipulating bits, not booleans (as the name implies). Likewise, logical operators (&&, ||, !, …) are for logic statements. For both efficiency and semantic reasons, there is no reason to write the second if statement.

Now, to be fair, the instructors must have mentioned this to let the students know why a program was crashing with the second version and why they should stick to the first version. I asked one of the students if they gave any scenario in which the bitwise operations would be used, and I was told that they didn’t give any examples.
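For what it’s worth, here is the kind of scenario the students could have been shown: packing several on/off flags into one integer. This is only a minimal sketch, written in C++ (where &, | and ~ work on integers the same way they do in Java); the Permission names and the helper functions are hypothetical and exist purely for illustration.

```cpp
#include <cstdint>

// Hypothetical permission flags, one bit each (names invented for this example).
enum Permission : std::uint8_t {
    kRead    = 1u << 0,  // binary 001
    kWrite   = 1u << 1,  // binary 010
    kExecute = 1u << 2   // binary 100
};

// Bitwise AND masks out one bit; comparing with 0 turns it into a boolean.
bool hasPermission(std::uint8_t flags, Permission p) {
    return (flags & p) != 0;
}

// Bitwise OR sets a bit without disturbing the others.
std::uint8_t grant(std::uint8_t flags, Permission p) {
    return static_cast<std::uint8_t>(flags | p);
}

// AND with the complement clears a bit.
std::uint8_t revoke(std::uint8_t flags, Permission p) {
    return static_cast<std::uint8_t>(flags & ~p);
}

// Logical AND short-circuits: the right side only runs when the pointer is valid.
bool canWrite(const std::uint8_t* flags) {
    return flags != nullptr && hasPermission(*flags, kWrite);
}
```

The bitwise operators do the bit twiddling, and the logical operator still guards the null check, so each one is doing the job it was named for.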

So, the students are seeing the bitwise operations and are left with a sense that they should never use them in an if statement and therefore are pointless additions to the Java language. Never to be used and only to confuse you when you use & instead of &&.

Coding methodology

You know what I find fun? Coding.

Not planning to code, not testing by hand, but pure coding from scratch and then untangling the mess I made by debugging it. That’s why I don’t care for certain methodologies. The Waterfall method specifically.

Don’t get me wrong; when done correctly, coding models work beautifully. In fact, it was the Waterfall model, which I recently used on a small program, that worked so well it showed me how much I despise it.

First, we start with the requirements. Make a list of the major requirements in the project, now break those down into manageable pieces, break those down into finer details, keep breaking down each item in the list until you have one specific requirement for each bullet point. Now your list is 30 pages long and 15 levels deep.

Now that we have a list of requirements, “design” your program. Don’t code it; “design” it in pseudocode.

Next, create your test data. Make sure you have a test for each and every requirement. Also make sure that each branch in your design is tested somehow. Be thorough; test edge conditions, test invalid input. Verify that every possible input is going to be tested. Now test your design by hand. I know this sounds like something a computer can do better than a human, and in fact, it is. However, you would have to code your design in order to test it and that part comes later.

After killing yourself by poring over each line of code with every item of test data, you are now ready to code your program. Lucky for you, you’ve managed to create a perfect design after all that testing by hand, so this is actually the easy part. No thinking involved other than dealing with the minutiae of the language itself (but you probably hinted at that in the design anyway).

Now do your actual testing in order to catch your coding mistakes. Run all the test data through and compare it with the output from earlier. It all works great, right? Great!

You’ve just managed to do so much work in advance that the actual coding and debugging has become the shortest part of the process. The rest of the time you spent doing boring tasks that put you to sleep.

Now, not all methodologies are like this. Variants of the Agile method are more oriented towards the needs of the programmer instead of the needs of a corporation. These other methods make the timeline of a project seem uncertain and risky, but in my experience, the final project has a higher quality and you’ve made your programmers happier at the same time by working with them instead of demanding deliverables.

So, long story short, the Waterfall method is a great way to make a working program, as long as you think coding and debugging is the most expensive or least important part of the process. I, on the other hand, love to code. Requirements and designs are necessary, sure, but not in such detail. Let’s get to the coding and test from there. Don’t make me do so much work that I will only do poorly because I hate it so; just tell me what you want, and I’ll code it for you.

Covert Channels over BitTorrent

BitTorrent is already used for many nefarious activities, mostly software, music and movie piracy. Here’s another layer of suspicious actions that we can add to the list. At the end of last year me and four other guys got together and created a way to send covert messages over the BitTorrent protocol. With this, even perfectly legit BitTorrent communications could be considered iffy due to the messages hidden within.

Covert Channels

What’s a covert channel anyway? In the simplest of terms it could be considered “hiding in plain sight”. A more complicated definition is given by Wikipedia:

In information theory, a covert channel is a parasitic communications channel that draws bandwidth from another channel in order to transmit information without the authorization or knowledge of the latter channel’s designer, owner, or operator.

OK, maybe that’s a little too complex. Here’s one of the most famous examples of covert channels that I know of. Images are made of two dimensional arrays of three color channels - red, green and blue. Each color channel uses eight bits generally for 256 different shades of red, green and blue. Let’s pretend we have such an image at 512 by 512 pixels. Now, as it turns out, it is difficult for the human eye to see a change of one in a color channel. That is, 127 red is almost identical to 128 red to the human eye. Same goes for each color channel (more so for the blue channel, but let’s just stick with the one bit for now).

Because we know that people will have a hard time seeing this change, we could easily change the last bit of each color channel in each pixel to 1 or 0 without anyone being the wiser. This gives us 512×512×3 = 786,432 bits to play with. An eight bit gray scale image at a resolution of 300×300 needs 300×300×8 = 720,000 bits, so I could easily put one into the original image without anyone knowing. Plenty of room to hide a secret image within another image. The original image could be used on a company web site and whoever I’m sending the image to can download it and extract the hidden image to find some wrongdoing going on at the company.
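The idea can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the image is treated as a flat array of channel bytes (R, G, B, R, G, B, ...), and the function names embedBits and extractBits are made up for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Capacity check from the text: one hidden bit per color channel.
constexpr std::size_t kCoverBits  = 512u * 512u * 3u;  // 786,432
constexpr std::size_t kSecretBits = 300u * 300u * 8u;  // 720,000
static_assert(kSecretBits <= kCoverBits, "secret image must fit in the cover image");

// Overwrite the least significant bit of each channel byte with one message bit.
// `channels` is a flat R,G,B,R,G,B,... byte array (an assumed layout).
std::vector<std::uint8_t> embedBits(std::vector<std::uint8_t> channels,
                                    const std::vector<bool>& bits) {
    for (std::size_t k = 0; k < bits.size() && k < channels.size(); ++k) {
        channels[k] = static_cast<std::uint8_t>((channels[k] & 0xFE) | (bits[k] ? 1 : 0));
    }
    return channels;
}

// Read the hidden bits back out of the first `count` channel bytes.
std::vector<bool> extractBits(const std::vector<std::uint8_t>& channels, std::size_t count) {
    std::vector<bool> bits;
    for (std::size_t k = 0; k < count && k < channels.size(); ++k) {
        bits.push_back((channels[k] & 1) != 0);
    }
    return bits;
}
```

No channel value ever moves by more than one, which is exactly the change the eye has trouble seeing.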

The main idea is that covert channels take a perfectly legit and reasonable method of storing or transmitting data, find some weakness in the protocol (or in the above case, human limitations), then exploit the weakness to transmit some form of data along with the original data that can be deciphered.

BitTorrent Protocol

BitTorrent is an interesting file sharing protocol. Normally when someone wants to share a file over the internet they upload it to a web page or FTP server. Let’s say this file is big: an operating system ISO like Linux, running anywhere from hundreds of megabytes to gigabytes in size. Downloading the file from a web site might take a long, long time. Now, if this file is extremely popular you’ll get thousands of people trying to download a big file all at once, causing the server to crash in the process.

Here, the problem is that the file is so popular that the one server hosting the file can not handle the traffic from so many different people. BitTorrent solves this problem by distributing the load. First, the big file is broken into smaller pieces, often hundreds of kilobytes in size. Then, whenever someone wants to download the file, they connect to everyone who already has some or all of the file and start downloading random pieces of the file.

By randomizing the pieces, the file is distributed to many machines instead of being hosted by a single machine. The more popular the file, the more people there are hosting parts or all of it, making it even easier to download. It’s actually very clever, taking a problem like server crashes and turning it into an increase of availability of the file.

Each piece of the file is indexed from 0 to n-1 for a file divided into n pieces. When a client wants a specific piece it will send a piece request to another client with the piece index in order to download it. The client then sends the piece to the requester and the operation continues until the downloader has all pieces of the file. Because of the random nature of piece requests, it is entirely possible for two clients to connect and download pieces from each other, giving them both more of the file than they had before the connection.

Covert Channels via Piece Requests

Given the fact that piece requests are random, how can we exploit this to send a covert message while still keeping the requests random looking? Our solution manifests itself in a translation table. As it turned out, the solution was also quite flexible in its ability to send all different kinds of messages.

First, we must choose an alphabet. The alphabet consists of all characters that we plan to send in our messages. This can be as large or as small as you want; it doesn’t matter. For instance, plaintext messages can use an alphabet of 29 characters: a-z, space and some punctuation. For encrypted messages, you can use a set of two characters for binary, or a set of 256 characters for each 8 bit chunk. We found that encrypted messages using an alphabet of 16 characters, one for each 4 bits, was a good fit.

Now that we have an alphabet, we can create a translation table from the available pieces based on the connected clients. Available pieces are pieces that the downloading client does not have and the uploading client does have. That is to say, any piece that makes sense for a piece request.

The translation table is a two-dimensional lookup table where each column represents a letter in the alphabet. The rows are filled with each available piece increasing in order. For an alphabet of s characters, the first row consists of the first s available pieces, the second row of the next s available pieces and so on until no pieces remain.

We send the messages on a character-by-character basis. The set of available pieces is created, the translation table is created and the next character to send is defined. We then index into the translation table using a random row and the corresponding column to find a piece to request. The piece is requested and then removed from the set of available pieces. The translation table is rebuilt and the next character defined and the process starts over again. This continues until we run out of characters or available pieces.
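The lookup itself is small enough to sketch. This is not the code from our Python client, just a minimal C++ illustration under a few assumptions: characters are already mapped to column numbers, both sides hold the same sorted list of available pieces, and the names encodeSymbol and decodeSymbol are invented here. The table is never stored explicitly, because the entry at (row, column) is simply the available piece at position row × s + column.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Sender: pick a random row of the translation table and request the piece
// found in the column of the symbol being sent. `available` is sorted ascending.
int encodeSymbol(std::vector<int>& available, std::size_t alphabetSize,
                 std::size_t column, std::mt19937& rng) {
    assert(column < alphabetSize && column < available.size());
    // Number of rows that actually have an entry in this column
    // (the last row may be only partly filled).
    std::size_t rows = (available.size() - column - 1) / alphabetSize + 1;
    std::uniform_int_distribution<std::size_t> pickRow(0, rows - 1);
    std::size_t pos = pickRow(rng) * alphabetSize + column;
    int piece = available[pos];
    // Removing the piece is what "rebuilds" the table for the next symbol.
    available.erase(available.begin() + static_cast<std::ptrdiff_t>(pos));
    return piece;
}

// Receiver: find the requested piece in the same table; its column is the symbol.
std::size_t decodeSymbol(std::vector<int>& available, std::size_t alphabetSize, int piece) {
    auto it = std::lower_bound(available.begin(), available.end(), piece);
    assert(it != available.end() && *it == piece);
    std::size_t column = static_cast<std::size_t>(it - available.begin()) % alphabetSize;
    available.erase(it);
    return column;
}
```

Because the row is random and the table shifts after every request, sending the same symbol twice in a row will generally produce two unrelated looking piece indices.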

What’s nice about this approach is that it preserves the random nature of the BitTorrent protocol by only adding a small amount of structure. One can send a message consisting of the same character repeated many times without the structure showing up in the piece requests. Add encryption into the mix and even if the message is discovered, another layer of protection is added - maybe even deniability. Another plus is that the covert message does not interfere with the BitTorrent protocol in any way. After the message has been sent, the rest of the file can be downloaded normally and the file will be complete. The message is only hidden within the piece ordering, not the piece itself.

The entire project was implemented using the open source BitTorrent client in Python. We managed to even add a GUI front end to both type and read messages being sent. We achieved two-way communication by using two different downloads. The biggest problem is the time it takes to communicate a message across. As with most covert channels, we need to take a hit somewhere in order to keep the communication secret. In the example I gave earlier, the hit was in image resolution and color. Here it’s all in the time it takes to send the message.

Imagine a 1000 character message with piece sizes of 128 kilobytes each and a download rate of 100 kilobytes per second. Each character costs one full piece download, so the message needs 1000 × 128 = 128,000 kilobytes of transfer, or 1,280 seconds. That is a very small message with small piece sizes and a fast connection, and it will still take over 20 minutes to send the entire message.

Depth Map, Normal Map

A lot of interesting graphics techniques require the use of depth maps and normal maps. Depth maps store information of how far a pixel is from the eye in the final rendered scene. Similarly, normal maps store the surface’s normal vector for the corresponding pixel. One way to do this would be to draw the scene with vertex normals instead of vertex colors and copy the color and depth buffers into a texture. A more straightforward technique uses vertex and pixel shaders and gives us more control over the final maps. Also, if we use GL_EXT_framebuffer_object, we can avoid any nasty context switches that are associated with pBuffers.

The first thing we do is understand the layout of the maps. Really, we’ll only be rendering to a single texture. This has the advantage of saving space, but may cause a problem if we only have 8 bits per channel. If the normal {x, y, z} is stored in {r, g, b} and the depth stored in {a}, then we’ve gone from 32 bit floating point {x} to 8 bit {r}, or 24 bit depth to 8 bit {a}; the same goes for the other components. This loss in precision will show up in banding artifacts and the errors will be carried through for every computation that uses the map. But sometimes this is unavoidable based on the platform that we’re using.

How do we fit a normal and depth into a single pixel? Well, we know which color component will hold which value, but we then find out that the GL will clamp our texture values to [0,1]. No negative values for us. Luckily, the depth is already clamped to [0,1] by virtue of the graphics pipeline; the normal is an entirely different problem. It does have an easy solution though. Normals can be normalized (duh), which means each component will map to [-1,1]. If we normalize, then multiply by 0.5 and add 0.5, each component lands in [0,1] and our mapping will be fine. We just need to undo the transformation (multiply by two and subtract one) when we reference it later on.

This gives us our GLSL vertex shader:

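varying vec3 normal;
varying float depth;
void main()
{
    gl_Position = ftransform();
    normal = gl_NormalMatrix * gl_Normal;

    // The perspective divide gives normalized device coordinates in [-1,1];
    // remap z to [0,1] (the default depth range) so it survives clamping.
    vec4 eyeTmp = gl_Position;
    eyeTmp.xyz = eyeTmp.xyz / eyeTmp.w;
    depth = 0.5 * eyeTmp.z + 0.5;
}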

And our fragment shader:

varying vec3 normal;
varying float depth;

void main()
{
    vec3 N = normalize(normal);
    N = 0.5 * N + 0.5;
   
    gl_FragColor = vec4(N, depth);
}
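To see what the 0.5 * N + 0.5 packing in the fragment shader above costs in precision, here is a small CPU-side sketch in C++ of the same mapping pushed through an 8 bit channel and back. The function names are hypothetical, and the input is assumed to already lie in [-1,1] (no clamping is done).

```cpp
#include <cmath>
#include <cstdint>

// Pack one normal component from [-1,1] into an 8 bit channel (0..255).
std::uint8_t packComponent(float n) {
    float unit = 0.5f * n + 0.5f;  // [-1,1] -> [0,1], as in the shader
    return static_cast<std::uint8_t>(std::lround(unit * 255.0f));
}

// Undo the packing: 8 bit channel back to [-1,1].
float unpackComponent(std::uint8_t c) {
    float unit = static_cast<float>(c) / 255.0f;
    return 2.0f * unit - 1.0f;
}
```

With only 256 steps across [-1,1], a round trip can be off by up to about 1/255 per component, which is where the banding comes from.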

Now, that may not be the absolute best way, but it works on nearly every platform that can use GLSL. One issue that I should bring up is that the pixel’s depth value can be directly referenced in the fragment shader through gl_FragCoord.z. On some machines, reading this value used to be very, very slow, but I’ve since seen great improvement. This means we can remove all references to the varying variable depth from both shaders.

Depending on your system, you may have access to floating point textures through GL_ARB_texture_float. There are two very nice things about that extension. First, floating point textures are not clamped by the GL. This means that messing around with the normals can be avoided. Second, we are no longer limited to 8 bits per channel. We now have access to 16 or 32 bits per channel, which will reduce or remove banding issues completely.

Now what do our shaders look like? The vertex shader:

varying vec3 normal;

void main(void) {
    gl_Position = ftransform();
    normal = gl_NormalMatrix * gl_Normal;
}

Fragment shader:

varying vec3 normal;

void main (void) {
    gl_FragColor = vec4(normalize(normal), gl_FragCoord.z);
}

OpenGL Transforms and the Inverse Model View

Below, I mentioned that I stored the inverse modelview matrix in gl_TextureMatrix[0]. Why did I do this? Well, I needed to transform the vertices and vectors from eye space to world space. Unfortunately, OpenGL sets up the Model View matrix to transform from object space to camera space; the inverse of that would skip right over world space and go straight back to object space. Solution? Once the camera transform is set up, invert it and store it in the texture matrix. This way, we can transform something to camera space (which we usually do anyway), then transform it to world space by removing the camera transform. But first, a review of all the transforms OpenGL does.

Object Space to Image Space (Quick and Dirty)

First, how do we move from object space to image space in OpenGL? Well, we start off with a vertex V in object space, simple enough. Then, some transforms are applied to the object. Let’s say it gets rotated, translated, scaled, rotated again and translated once more. Each of these transforms can be described by their own matrix, but for simplicity’s sake, we’ll say that they were all multiplied into a single matrix - M. So, we have V, a vertex in object space, and now V * M, a vertex in world space.

Next, there’s the transformation from world space to camera space. In OpenGL, camera space consists of right = +X, up = +Y, and forward = -Z. This can be done using the command gluLookAt with the camera’s location, point of focus and up vector. We’ll call this matrix C. Moving a vertex from object space to camera space is then V * M * C. Seeing the pattern?

Once we are in camera space, we use the projection matrix (specified with functions like glFrustum and gluOrtho2D) to again change the coordinate space. We’ll call this matrix P, and the sequence V * M * C * P will tell us which vertices will be clipped. Once in this space (and after the perspective divide), if the x, y or z coordinate of the vertex is outside [-1,+1], then the vertex is clipped.

So, what do we have? Taking a vertex V in object space, multiplying it by M moves it to world space, multiplying that by C moves it to camera space, and multiplying it by P moves it to projected space. V * M * C * P.

Model View Matrix: Object Space to Camera Space

The matrix created to move a vertex to world space is defined by the various transforms applied to that vertex in OpenGL through functions like glTranslate, glRotate and glScale. Loading on the transforms creates a single matrix to change to world space. It’s also worth noting that the final matrix has an inverse and it is very easy to create if you know the transforms that were used to create it. Let’s say you translate something by 10, rotate it by 90 and scale it by 0.5. This results in a 4×4 matrix where the original transforms have been muddled together to do the entire series of transforms at once. The inverse of this matrix would have to undo each of those transforms in the proper order. So the inverse can be constructed by scaling by 2.0, rotating by -90 and translating by -10. Easy as pie.
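Here is that example worked through as a minimal C++ sketch. To keep it short it uses 2D points instead of 4×4 matrices, and it assumes the translation is along X and the rotation is in degrees, counter-clockwise; the function names are made up for the illustration.

```cpp
#include <cmath>

struct Point { double x, y; };

Point translate2D(Point p, double dx, double dy) { return {p.x + dx, p.y + dy}; }

Point scale2D(Point p, double k) { return {p.x * k, p.y * k}; }

// Counter-clockwise rotation about the origin, angle in degrees.
Point rotate2D(Point p, double degrees) {
    const double kPi = 3.14159265358979323846;
    double r = degrees * kPi / 180.0;
    return {p.x * std::cos(r) - p.y * std::sin(r),
            p.x * std::sin(r) + p.y * std::cos(r)};
}

// Forward: translate by 10, rotate by 90, scale by 0.5 (the example from the text).
Point applyTransforms(Point p) {
    return scale2D(rotate2D(translate2D(p, 10.0, 0.0), 90.0), 0.5);
}

// Inverse: undo each step in the opposite order.
Point undoTransforms(Point p) {
    return translate2D(rotate2D(scale2D(p, 2.0), -90.0), -10.0, 0.0);
}
```

Feeding the output of applyTransforms into undoTransforms should hand back the starting point, up to floating point rounding.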

The camera space transform can be easily specified by the function gluLookAt. The reference page for that function describes how the matrix is built for OpenGL. Here’s an easier-to-read implementation using GLSL-like C++ types.

mat4 makeGluLookAt(vec3 eye, vec3 center, vec3 up)
{
    // forward pointing vector
    vec3 f(center - eye);
    f.normalize();
    up.normalize();

    // right pointing vector
    vec3 s(f.cross(up));

    // orthonormal up vector
    vec3 u(s.cross(f));

    s.normalize();
    u.normalize();

    // construct orthonormal orientation transform
    mat4 Orient(
        s.x, s.y, s.z, 0.0f,
        u.x, u.y, u.z, 0.0f,
        -f.x, -f.y, -f.z, 0.0f,
        0.0f, 0.0f, 0.0f, 1.0f
    );

    // translate the new coordinate system to the origin
    mat4 Translate(
        1.0f, 0.0f, 0.0f, -eye.x,
        0.0f, 1.0f, 0.0f, -eye.y,
        0.0f, 0.0f, 1.0f, -eye.z,
        0.0f, 0.0f, 0.0f, 1.0f
    );

    return Orient * Translate;
}

If you pay close attention to the Orient matrix, you’ll see that the rows represent the right, up and -forward that will correspond to +X, +Y and +Z (remember that the OpenGL camera looks down -Z). This is a standard change of orientation matrix; this will effectively rotate the world around so that the +X, +Y, and +Z axes line up with the right, up and -forward vectors of the camera. We then translate by the position of the camera to move the old origin to the camera’s origin.

One sticky part to note about all of this is that we have been working with row-wise matrices that are intended to be post multiplied. That is, we start with the vertex, then multiply by the transforms, then multiply by the camera matrix. But OpenGL uses pre multiplied matrices to achieve the same effect (that’s why you clear the model view matrix, then specify the camera, then specify the transforms and finally the vertex last). In order to end up with the same matrices in the end, OpenGL needs to use transposed matrices, or column-wise matrices. This means that our function needs to be reworked. One important point to remember: the transpose of multiplied matrices is the same as multiplying the transpose of the matrices in reverse order, i.e. (A * B)^T = B^T * A^T. Here is the correct OpenGL friendly code.

mat4 makeGluLookAt(vec3 eye, vec3 center, vec3 up)
{
    vec3 f(center - eye);
    f.normalize();
    up.normalize();

    vec3 s(f.cross(up));
    vec3 u(s.cross(f));

    s.normalize();
    u.normalize();

    mat4 Orient(
        s.x, u.x, -f.x, 0.0f,
        s.y, u.y, -f.y, 0.0f,
        s.z, u.z, -f.z, 0.0f,
        0.0f, 0.0f, 0.0f, 1.0f
    );

    mat4 Translate(
        1.0f, 0.0f, 0.0f, 0.0f,
        0.0f, 1.0f, 0.0f, 0.0f,
        0.0f, 0.0f, 1.0f, 0.0f,
        -eye.x, -eye.y, -eye.z, 1.0f
    );

    return Translate * Orient;
}

It is important to realise that OpenGL’s Model View matrix is actually M * C, so, if we want to get the position of the vertex in world space, we will have to use the modelview matrix to go to camera space, then multiply by C^-1 (the inverse of the camera matrix). This gives us V * M * C * C^-1 = V * gl_ModelViewMatrix * C^-1 = V * M. And world space is exactly what we need to do shadow mapping, refraction, reflection and a plethora of other things. OpenGL took care of this for us with the GL_ARB_shadow extension. By using glTexGen, OpenGL computes the inverse camera matrix and applies it without us having to fuss with inverses. But now we’ll need to do an inverse ourselves.

Inverse gluLookAt

Remember that the camera matrix is given to us by gluLookAt, and above, we have the exact implementation of how the matrix is created. If you know your matrices, you may notice some properties that the camera matrix adheres to when being constructed this way. Normally, for a generic matrix, computing the inverse is a slow and painful process. You really don’t want to invert a lot of matrices in order to run your program in real time. Luckily, the camera matrix isn’t just a generic matrix; it has a specific construction with easily invertible properties.

If you paid attention to the construction of the Orient matrix, you would have seen that the columns represent an orthonormal coordinate system. That is, the vectors s, u and -f are all at 90 degree angles to each other and their lengths are equal to 1. The property that this gives us is that the dot product of any two of those vectors equals 0 when the vectors are different. When the vectors are the same, the dot product is the length of the vector squared, or 1.

Now, imagine a matrix multiply as the dot product of two vectors, the row of the first matrix and the column of the second matrix. What we want to do is construct the inverse of Orient such that Orient * Orient^-1 equals the identity matrix (0’s everywhere with 1’s on the diagonal). See where I’m going? As it turns out, the inverse of a matrix that represents an orthonormal coordinate system is exactly its transpose.

mat4 Orient(
    s.x, u.x, -f.x, 0.0f,
    s.y, u.y, -f.y, 0.0f,
    s.z, u.z, -f.z, 0.0f,
    0.0f, 0.0f, 0.0f, 1.0f
);

mat4 OrientInverse(
    s.x, s.y, s.z, 0.0f,
    u.x, u.y, u.z, 0.0f,
    -f.x, -f.y, -f.z, 0.0f,
    0.0f, 0.0f, 0.0f, 1.0f
);

But the camera matrix also contains a translation that gets multiplied to Orient. We know the inverse of a translation is just a negative translation, so that’s easy. So, if C = Translate * Orient, then C^-1 = (Translate * Orient)^-1 = Orient^-1 * Translate^-1. Hey, we know all those pieces! Let’s put it into OpenGL friendly code.

mat4 makeGluLookAtInverse(vec3 eye, vec3 center, vec3 up)
{
    vec3 f(center - eye);
    f.normalize();
    up.normalize();

    vec3 s(f.cross(up));
    vec3 u(s.cross(f));

    s.normalize();
    u.normalize();

    mat4 OrientInverse(
        s.x, s.y, s.z, 0.0f,
        u.x, u.y, u.z, 0.0f,
        -f.x, -f.y, -f.z, 0.0f,
        0.0f, 0.0f, 0.0f, 1.0f
    );

    mat4 TranslateInverse(
        1.0f, 0.0f, 0.0f, 0.0f,
        0.0f, 1.0f, 0.0f, 0.0f,
        0.0f, 0.0f, 1.0f, 0.0f,
        eye.x, eye.y, eye.z, 1.0f
    );

    return OrientInverse * TranslateInverse;
}

That is probably the most difficult part of doing shadow mapping and other real time techniques (once you get past all of the theory, that is). And look, it’s all wrapped up in a tiny function! Store that into a texture matrix and your shaders have quick and easy access to world space.
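If you want to convince yourself that the transpose trick really produces the inverse, a quick numeric check helps. The following is a self-contained C++ sketch, not the engine code: it assumes a bare-bones Vec3 and a 4×4 std::array matrix (both invented for this example), builds C = Translate * Orient and C^-1 = Orient^-1 * Translate^-1 the same way as the functions above, and tests whether their product is the identity.

```cpp
#include <array>
#include <cmath>

struct Vec3 { double x, y, z; };
using Mat4 = std::array<std::array<double, 4>, 4>;

Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

Vec3 normalize(Vec3 v) {
    double len = std::sqrt(v.x * v.x + v.y * v.y + v.z * v.z);
    return {v.x / len, v.y / len, v.z / len};
}

Mat4 identity() {
    Mat4 m{};
    for (int i = 0; i < 4; ++i) m[i][i] = 1.0;
    return m;
}

Mat4 multiply(const Mat4& a, const Mat4& b) {
    Mat4 r{};
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            for (int k = 0; k < 4; ++k)
                r[i][j] += a[i][k] * b[k][j];
    return r;
}

// Right (s), up (u) and forward (f) vectors, built as in makeGluLookAt.
struct Basis { Vec3 s, u, f; };

Basis cameraBasis(Vec3 eye, Vec3 center, Vec3 up) {
    Vec3 f = normalize({center.x - eye.x, center.y - eye.y, center.z - eye.z});
    Vec3 s = normalize(cross(f, normalize(up)));
    Vec3 u = normalize(cross(s, f));
    return {s, u, f};
}

// C = Translate * Orient
Mat4 lookAt(Vec3 eye, Vec3 center, Vec3 up) {
    Basis b = cameraBasis(eye, center, up);
    Mat4 orient = identity();
    orient[0] = {b.s.x, b.u.x, -b.f.x, 0.0};
    orient[1] = {b.s.y, b.u.y, -b.f.y, 0.0};
    orient[2] = {b.s.z, b.u.z, -b.f.z, 0.0};
    Mat4 translate = identity();
    translate[3] = {-eye.x, -eye.y, -eye.z, 1.0};
    return multiply(translate, orient);
}

// C^-1 = Orient^-1 * Translate^-1, where Orient^-1 is just the transpose.
Mat4 lookAtInverse(Vec3 eye, Vec3 center, Vec3 up) {
    Basis b = cameraBasis(eye, center, up);
    Mat4 orientInverse = identity();
    orientInverse[0] = {b.s.x, b.s.y, b.s.z, 0.0};
    orientInverse[1] = {b.u.x, b.u.y, b.u.z, 0.0};
    orientInverse[2] = {-b.f.x, -b.f.y, -b.f.z, 0.0};
    Mat4 translateInverse = identity();
    translateInverse[3] = {eye.x, eye.y, eye.z, 1.0};
    return multiply(orientInverse, translateInverse);
}

// True when every entry of m is within eps of the identity matrix.
bool isIdentity(const Mat4& m, double eps) {
    for (int i = 0; i < 4; ++i)
        for (int j = 0; j < 4; ++j)
            if (std::fabs(m[i][j] - (i == j ? 1.0 : 0.0)) > eps) return false;
    return true;
}
```

Calling isIdentity(multiply(lookAt(eye, center, up), lookAtInverse(eye, center, up)), 1e-9) for any camera whose view direction is not parallel to up should return true, with no general purpose matrix inversion anywhere in sight.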

Refraction: Part 1

One Bounce Refraction

So, that engine I was working on; I made quite a bit of progress and decided to try an actual project with it. The project is real time refraction using GLSL. The first step is very quick and easy.

One bounce refraction of an infinite environment.

First, create a skybox in order to create an infinite environment and render it to the screen. Next, enable the environment map (let’s assume it is stored as a texture cube map) and activate the GLSL program. Render the refractive object and disable everything you just enabled. The shaders I used are modified from the Orange Book.

First, the vertex shader. It has two varying vec3s: i stores the vertex position in eye space and n stores the vertex normal in eye space.

varying vec3 i;
varying vec3 n;

void main()
{
  vec4 ecPosition  = gl_ModelViewMatrix * gl_Vertex;

  i = ecPosition.xyz / ecPosition.w;
  n = gl_NormalMatrix * gl_Normal;

  gl_Position = ftransform();
}

Now, the fragment shader. Using i and n we find the vector that refracts off the surface of the object. This vector is multiplied by gl_TextureMatrix[0] which holds the inverted modelview matrix. This converts the refracted vector from eye space into world space. Using that vector to index into the cube map gives us our final color.

uniform samplerCube texture;
uniform float indexOfRefraction;

varying vec3 i;
varying vec3 n;

void main()
{
  // varyings are read-only in a fragment shader, so normalize into locals
  vec3 I = normalize(i);
  vec3 N = normalize(n);

  vec3 Refracted = refract(I, N, indexOfRefraction);
  // w = 0.0 treats Refracted as a direction, so the matrix's translation is ignored
  Refracted = vec3(gl_TextureMatrix[0] * vec4(Refracted, 0.0));

  vec3 refractColor = vec3(textureCube(texture, Refracted));

  gl_FragColor   = vec4(refractColor, 1.0);
}
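The built-in refract does all the heavy lifting in that shader. For reference, here is a CPU-side C++ sketch of the formula the GLSL specification gives for refract(I, N, eta); the Vec3 type and the name refractVec are just stand-ins for this example. Both input vectors are assumed to be normalized, and eta is the ratio of the indices of refraction.

```cpp
#include <cmath>

struct Vec3 { double x, y, z; };

double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// CPU-side mirror of GLSL refract(I, N, eta); I and N must be normalized.
Vec3 refractVec(Vec3 I, Vec3 N, double eta) {
    double d = dot(N, I);
    double k = 1.0 - eta * eta * (1.0 - d * d);
    if (k < 0.0) {
        return {0.0, 0.0, 0.0};  // total internal reflection: no refracted ray
    }
    double a = eta * d + std::sqrt(k);
    return {eta * I.x - a * N.x, eta * I.y - a * N.y, eta * I.z - a * N.z};
}
```

A ray hitting the surface head-on passes straight through whatever eta is, and an eta of 1.0 leaves any ray unchanged, which makes for two easy sanity checks.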

If you look at the Orange Book example, you see a few extra features to make the refracting object look more realistic. The first is the Fresnel effect. This is when you view the refracting object at such an angle that you will actually see a reflection instead. Next is dispersion, or chromatic aberration. We boil it down to supplying a slightly different index of refraction for each color channel.

The vertex shader stays the same, but the fragment shader changes slightly. We find a different refraction vector for each color channel, and also a reflection vector. Look up the colors and mix them together based on the Fresnel factor.

uniform samplerCube texture;
uniform vec4 indexOfRefraction; // {R, G, B, Fresnel}

varying vec3 i;
varying vec3 n;

const float FresnelPower = 5.0;

void main()
{
  // varyings are read-only in a fragment shader, so normalize into locals
  vec3 I = normalize(i);
  vec3 N = normalize(n);

  float Ratio   = indexOfRefraction.a + (1.0 - indexOfRefraction.a) * pow((1.0 - dot(-I, N)), FresnelPower);

  // w = 0.0 below treats each vector as a direction, not a position
  vec3 RefractR = refract(I, N, indexOfRefraction.r);
  RefractR = vec3(gl_TextureMatrix[0] * vec4(RefractR, 0.0));

  vec3 RefractG = refract(I, N, indexOfRefraction.g);
  RefractG = vec3(gl_TextureMatrix[0] * vec4(RefractG, 0.0));

  vec3 RefractB = refract(I, N, indexOfRefraction.b);
  RefractB = vec3(gl_TextureMatrix[0] * vec4(RefractB, 0.0));

  vec3 Reflect  = reflect(I, N);
  Reflect  = vec3(gl_TextureMatrix[0] * vec4(Reflect, 0.0));

  vec3 refractColor, reflectColor;

  refractColor.r = vec3(textureCube(texture, RefractR)).r;
  refractColor.g = vec3(textureCube(texture, RefractG)).g;
  refractColor.b = vec3(textureCube(texture, RefractB)).b;

  reflectColor   = vec3(textureCube(texture, Reflect));

  vec3 color     = mix(refractColor, reflectColor, Ratio);

  gl_FragColor   = vec4(color, 1.0);
}
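The Ratio line in that shader is worth a closer look, since it decides how much reflection you see. Below is a small C++ sketch of the same expression (it has the form of Schlick’s approximation), along with a scalar version of GLSL’s mix. The function names are hypothetical and the input values you feed it are up to you.

```cpp
#include <cmath>

// Fresnel mix factor as computed in the shader above:
// f + (1 - f) * (1 - cosTheta)^power, where cosTheta = dot(-I, N).
double fresnelRatio(double f, double cosTheta, double power) {
    return f + (1.0 - f) * std::pow(1.0 - cosTheta, power);
}

// Linear blend of two values, same as GLSL mix(a, b, t) on one channel.
double mixValue(double a, double b, double t) {
    return a * (1.0 - t) + b * t;
}
```

Looking straight at the surface (cosTheta = 1) the ratio is just f, so the refracted color dominates; at a grazing angle (cosTheta = 0) the ratio reaches 1 and only the reflection is left.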

NVidia DualTV + nForce 2

Oi, what a pain. NVidia’s DualTV Tuner does not like to play well with the latest nForce 2 drivers. Random restarts occur when trying to use it. Solution: Use the latest DualTV drivers and roll back to version 4.27 of the nForce 2 drivers.

GL_ARB_texture_non_power_of_two

If the GL_ARB_texture_non_power_of_two extension is supported on your system, then you will be able to use textures of any size (within the limits of the system). No functions or constants are added with this extension.

Not exactly the most complicated of extensions.

GL_EXT_shadow_funcs

The GL_EXT_shadow_funcs extension requires GL_ARB_shadow and GL_ARB_depth_texture in order to have any effect. In the GL_ARB_shadow example on this site we use GL_LEQUAL when setting the GL_TEXTURE_COMPARE_FUNC_ARB texture parameter. GL_ARB_shadow allows this comparison function to be GL_GEQUAL as well, but not any of the other comparison functions.

This extension allows us to use GL_GREATER or GL_EQUAL or any of the eight comparison functions given to us by OpenGL. However, these other functions will not show you much difference. GL_LEQUAL and GL_LESS will look very similar, GL_EQUAL will most likely not be worth using, nor would any of the other comparison functions that this extension gives you. The reasons why you won’t see any difference are given in the extension specifications:

Are there issues with GL_EQUAL and GL_NOTEQUAL?

The GL_EQUAL mode (and GL_NOTEQUAL) may be difficult to obtain well-defined behavior from. This is because there is no guarantee that the divide done by the shadow mapping r/q division is going to exactly match the z/w perspective divide and depth range scale & bias used to generate depth values. Perhaps it can work in a well-defined manner in orthographic views or if you can guarantee that the texture hardware’s r/q is computed with the same hardware used to compute z/w (NVIDIA’s NV_texture_shader extension can provide such a guarantee).

Similarly, GL_LESS and GL_GREATER are only different from GL_LEQUAL and GL_GEQUAL respectively by a single unit of depth precision which may make the difference between these modes very subtle.

So, unless you are using very specific hardware, you will most likely never need these extra functions.
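To see how little separates GL_LESS from GL_LEQUAL, here is a rough sketch in plain C of the comparison the texture unit performs between r and the stored depth. The enum and function names are stand-ins invented for this example, not the real GL constants or any real API. The result follows the shadow extensions’ convention of 1.0 when the comparison passes and 0.0 when it fails.

```c
/* Stand-ins for the eight OpenGL comparison functions. These names are
 * hypothetical and exist only for this sketch. */
typedef enum {
    CMP_NEVER, CMP_LESS, CMP_EQUAL, CMP_LEQUAL,
    CMP_GREATER, CMP_NOTEQUAL, CMP_GEQUAL, CMP_ALWAYS
} compare_func;

/* Compares r (depth of the point as seen from the light) against `depth`
 * (the value stored in the shadow map). Returns 1.0f on pass, 0.0f on fail. */
float shadow_compare(compare_func func, float r, float depth)
{
    int pass;

    switch (func) {
    case CMP_NEVER:    pass = 0;            break;
    case CMP_LESS:     pass = (r <  depth); break;
    case CMP_EQUAL:    pass = (r == depth); break;
    case CMP_LEQUAL:   pass = (r <= depth); break;
    case CMP_GREATER:  pass = (r >  depth); break;
    case CMP_NOTEQUAL: pass = (r != depth); break;
    case CMP_GEQUAL:   pass = (r >= depth); break;
    case CMP_ALWAYS:   pass = 1;            break;
    default:           pass = 0;            break;
    }
    return pass ? 1.0f : 0.0f;
}
```

The only input on which CMP_LESS and CMP_LEQUAL disagree is r exactly equal to the stored depth, which is the "single unit of depth precision" the specification is talking about.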

GL_ARB_shadow

The GL_ARB_shadow extension allows us to create simple shadow maps. It requires the GL_ARB_depth_texture extension. For more information on the theory and implementation of shadow maps see NVidia’s hardware shadow maps document.

The theory behind shadow maps is very simple and can be summed up in one sentence: any point that the light cannot see is in shadow. To implement this, we take a picture of the scene from the light’s point of view, then compare it to the picture of the scene from the camera’s point of view. Because we never use the color information from the light’s point of view, we can store just the depth information in a texture and compare against it later. This is where we need the depth texture. Note that a single depth texture cannot cover the entire scene; that would take multiple textures. Typically, depth textures are used with spotlights; alternatively, as in games like Guild Wars and Battlefield 2, a depth texture is used for each model that needs a shadow.

In order to speed things up, we break the world into two parts: shadow casters and shadow receivers. When we render the depth texture, we make sure to render only the shadow casters. Then we render the entire scene from the camera’s viewpoint. When coloring each pixel, the 3D location of the point in camera space is converted into a 3D point in light space using a well-defined method. This, in essence, gives us the distance to the light. If this distance is larger than the distance in the shadow map, then the point is in shadow and is colored appropriately; otherwise it is colored normally.
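The hardware does this test for us, but it can help to see it written out on the CPU. The following is only a sketch in plain C under some assumptions of mine: the shadow map is a row-major array of floats in [0, 1], the lookup is nearest-texel, and anything that falls outside the map is treated as lit. The function name point_in_shadow is made up for this example.

```c
#include <stddef.h>

/* Hypothetical CPU version of the shadow test.
 *   depth_map      row-major width*height depths as seen by the light
 *   s, t           position in the shadow map, each in [0, 1]
 *   r              depth of the point being shaded, as seen from the light
 * Returns 1 if the point is in shadow, 0 if it is lit. */
int point_in_shadow(const float *depth_map, size_t width, size_t height,
                    float s, float t, float r)
{
    size_t x, y;

    if (depth_map == NULL || width == 0 || height == 0)
        return 0;

    /* Assumption: points that land outside the map are treated as lit. */
    if (s < 0.0f || s > 1.0f || t < 0.0f || t > 1.0f)
        return 0;

    /* Nearest-texel lookup, clamped so s == 1 or t == 1 stays in range. */
    x = (size_t)(s * (float)width);
    y = (size_t)(t * (float)height);
    if (x >= width)
        x = width - 1;
    if (y >= height)
        y = height - 1;

    /* Farther from the light than what the light saw: in shadow. */
    return r > depth_map[y * width + x];
}
```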

In order to convert a point in camera space to a point in light space, we simply multiply it by the various matrices that we use to render it to the screen. First, an overview. When we send a vertex into OpenGL using glVertex, it is multiplied by the camera’s modelview matrix MC. To get the vertex back, we must multiply by the inverse of the camera’s modelview matrix, MC^-1. If everything is set up correctly, the result is the same vertex that was sent into OpenGL when creating the depth texture. The result is then multiplied by the light’s modelview matrix ML and the light’s projection matrix PL. This gives us the exact same result as the depth texture, with one caveat: the projected points go from -1 to 1, whereas textures go from 0 to 1. This is easily overcome by scaling the result by half and then translating it by 0.5, which is what the bias matrix B does.

Now the point that we are considering is in the texture space of the shadow map. If the input was {x, y, z, w} and the output is {s, t, r, q}, then the value of the shadow map at {s, t} is the depth that the light sees, while r is the distance from the point in question to the light. So if we compare the depth in the texture to r, we know whether or not the point is in shadow.

This is the reasoning behind the texture parameter added by the extension. We enable the comparison with glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_COMPARE_MODE_ARB, GL_COMPARE_R_TO_TEXTURE_ARB). The way we compare is to say that a point is not in shadow when r is less than or equal to the depth in the texture, and in shadow otherwise. Hence glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_COMPARE_FUNC_ARB, GL_LEQUAL).
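The scale-and-translate step is easy to check by hand. Here is a small sketch in plain C of the bias matrix B and what it does to a point that has already been multiplied by PL and ML. The matrix is laid out in OpenGL’s column-major order. The helper names are mine, made up for this example, and the sketch covers only the final B step, not the whole chain of matrices.

```c
#include <assert.h>

/* Bias matrix B in OpenGL's column-major layout: scale x, y and z by 0.5,
 * then translate them by 0.5 (times w). */
static const float BIAS[16] = {
    0.5f, 0.0f, 0.0f, 0.0f,   /* column 0 */
    0.0f, 0.5f, 0.0f, 0.0f,   /* column 1 */
    0.0f, 0.0f, 0.5f, 0.0f,   /* column 2 */
    0.5f, 0.5f, 0.5f, 1.0f    /* column 3: the translation */
};

/* out = m * v, where m is a column-major 4x4 matrix.
 * `out` must not be the same array as `v`. */
void mat4_mul_vec4(const float m[16], const float v[4], float out[4])
{
    int row, col;

    assert(out != v);
    for (row = 0; row < 4; ++row) {
        out[row] = 0.0f;
        for (col = 0; col < 4; ++col)
            out[row] += m[col * 4 + row] * v[col];
    }
}

/* Hypothetical helper: maps a point {x, y, z, w} in the light's clip space
 * to shadow-map coordinates {s, t, r, q}. */
void clip_to_shadow_coords(const float clip[4], float strq[4])
{
    mat4_mul_vec4(BIAS, clip, strq);
}
```

Feeding in the corner {-1, -1, -1, 1} gives {0, 0, 0, 1}, and {1, 1, 1, 1} stays {1, 1, 1, 1}. When w is not 1, the results land in the 0 to 1 range only after dividing s, t and r by q.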

The matrices above are easily found. ML is the matrix produced by the gluLookAt command, and PL is whatever you used to set up the projection matrix (in our example, we keep the projection matrix the same between the light and camera views, so we just grab the current projection matrix). The bias matrix B is well defined and set up in the code that accompanies this post. The only tricky part is finding the inverse of the camera’s modelview matrix. It turns out that if we enable texture coordinate generation with GL_EYE_LINEAR and specify the planes with GL_EYE_PLANE, OpenGL supplies the inverse for us: the planes are multiplied by the inverse of the modelview matrix that is current when glTexGen is called.
