PrtPerVertex Direct3D Sample

This sample demonstrates how to use PrtEngine, a precomputed radiance transfer (PRT) simulator that uses low-order spherical harmonics (SH). The sample also demonstrates how to use these results to accomplish dynamic light transport using a dynamic lighting environment and a vs_1_1 vertex shader.

Path

Source:	(SDK root)\Samples\Managed\Direct3D\PrtPerVertex
Executable:	(SDK root)\Samples\Managed\Direct3D\Bin\x86\csPrtPerVertex.exe

Why is this sample interesting?

PRT using low-order SH basis functions has a number of advantages over typical diffuse (N · L) lighting. Area light sources and global effects such as interreflections, soft shadows, self-shadowing, and subsurface scattering can be rendered in real time after a precomputed light transport simulation. Clustered principal component analysis (CPCA) compresses the results of the simulator so that the shader needs fewer constants and less per-vertex data.

Overview

The basic idea is first to run a PRT simulator offline as part of the art content creation process and to save the compressed results for later real-time use. The light transport simulator models global effects that would typically be very difficult to do in real time. The real-time engine evaluates the lights in terms of SH basis functions and sums them up into a single set of SH basis coefficients describing the entire lighting environment. It then uses a vertex shader to arrive at the vertex's diffuse color by combining the compressed simulator results and the lighting environment. Because the offline simulator did the work of computing the interreflections and soft shadows, this technique is visually impressive, efficient, and can be used for real-time lighting.

How the Sample Works

The sample performs both the offline and real-time portions of PRT. The startup dialog box asks the user which step to perform: run the offline simulator, or view a mesh using previously saved results from the PRT simulator. The offline step would typically be done in a separate tool, but this sample does both in the same executable file.

Step 1: Offline Processing

The first step is to run the offline per-vertex PRT simulator in the PRTSimulator.cs source file. This code accepts a number of parameters that control the operation of the simulator, an array of meshes, and an array of SphericalHarmonicMaterial structures. One material is allowed per mesh, so each mesh is assumed to be homogeneous. The simulator's input parameters and the members of the SH material structure are explained extensively by the sample dialog's tooltips. If you pass in more than one mesh, the meshes must first be transformed into the same coordinate space.

Most of the simulator input parameters do not affect how the results are used. The sample's Order parameter is an exception: it specifies the order of the SH basis functions that are used to approximate transferred radiance, and so it determines how many coefficients the results contain.

In addition to the Order parameter, the spectral parameter (spectralCB) also affects the results. If spectral simulation is chosen, there will be three color channels: red, green, and blue. However, sometimes it is useful to work with just one channel (when modeling shadows, for example). If you choose non-spectral simulation, you simply use the red channel when calling the SH functions, because the other channels are optional.

The simulator will run for a period of time, typically minutes, that depends on the complexity of the meshes, the number of rays, and simulation settings. The output is a GraphicsStream object that contains an internal header and an array of float values for each vertex of the mesh.

The float values for each vertex, called radiance transfer vectors, can be used by a vertex shader to transform source radiance into exit radiance. However, there are Order² transfer coefficients per channel, so a spectral simulation with Order = 6 produces 3 x 36 = 108 scalars per vertex. Fortunately, you can compress this large number of scalars with the CPCA algorithm. The number of coefficients per vertex is reduced to the number of principal component analysis (PCA) vectors, and this number does not need to be large: four or eight usually yields good results. For example, with eight PCA vectors and Order = 6, you need only eight coefficients per vertex instead of 108. The number of PCA vectors must be less than Order².
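
The storage arithmetic in this paragraph can be sketched as a small helper. This is an illustration only: the class and method names (PrtStorage, UncompressedScalarsPerVertex, CompressedScalarsPerVertex) are hypothetical and are not part of the sample. The compressed count includes the one extra scalar per vertex that indexes the cluster data.

```csharp
using System;

// Hypothetical helper, not part of the sample: per-vertex storage arithmetic.
public static class PrtStorage
{
    // Order^2 transfer coefficients per channel, for every channel.
    public static int UncompressedScalarsPerVertex(int order, int numberChannels)
    {
        return order * order * numberChannels;
    }

    // With CPCA, each vertex keeps one weight per PCA vector
    // plus one scalar that indexes its cluster's data.
    public static int CompressedScalarsPerVertex(int numberPcaVectors)
    {
        return numberPcaVectors + 1;
    }
}
```

With Order = 6 and three channels, UncompressedScalarsPerVertex(6, 3) is 108, while CompressedScalarsPerVertex(8) is 9: eight PCA weights plus the cluster index.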

Step 2: Rendering in Real Time

The equation to render compressed PRT data is:

$$R_p = (M_k \cdot L') + \sum_{j=1}^{N} w_{pj}\,(B_{kj} \cdot L')$$

where:

Parameter	Description
R_p	A single channel of exit radiance at vertex p. It is evaluated at every vertex on the mesh.
M_k	The mean for cluster k. This is an Order² vector of coefficients.
k	The cluster identifier (ID) for vertex p.
L'	The source radiance projected into the SH basis functions. This is an Order² vector of coefficients.
j	The summation index over the PCA vectors, from 1 to N.
N	The number of PCA vectors.
w_pj	The jth PCA weight for point p. This is a single coefficient.
B_kj	The jth PCA basis vector for cluster k. This is an Order² vector of coefficients.
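
Read from the parameter table, the exit radiance for one channel is the dot product of the cluster mean with L', plus the weighted sum of the dot products of each PCA basis vector with L'. The following CPU reference is a minimal sketch of that reading; the class PrtReference and its method names are hypothetical, and the sample itself evaluates this in a vertex shader rather than on the CPU.

```csharp
using System;

// Hypothetical CPU reference for one channel of exit radiance at one vertex.
public static class PrtReference
{
    // Dot product of two SH coefficient vectors (each Order^2 entries long).
    public static float Dot(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same length.");
        float sum = 0.0f;
        for (int i = 0; i < a.Length; i++)
            sum += a[i] * b[i];
        return sum;
    }

    // clusterMean = M_k, pcaBasis[j] = B_kj, pcaWeights[j] = w_pj, light = L'.
    public static float ExitRadiance(float[] clusterMean, float[][] pcaBasis,
                                     float[] pcaWeights, float[] light)
    {
        float radiance = Dot(clusterMean, light);
        for (int j = 0; j < pcaWeights.Length; j++)
            radiance += pcaWeights[j] * Dot(pcaBasis[j], light);
        return radiance;
    }
}
```

Because the two dot products do not depend on the vertex, only the weights w_pj and the cluster choice k vary per vertex.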

The sample's PRTMesh.cs source file collects all of the data needed for this equation and passes the appropriate data to a vertex shader that implements the equation, as follows:

The sample's prtMesh.LoadMesh method is used to load the mesh, but since CPCA needs (PrtCompressedBuffer.NumberPcaVectors + 1) scalars per vertex, the mesh is cloned with a VertexElement declaration that provides enough memory to store this data. The extra scalar gives the vertex shader an index into an array of cluster data. The sample uses the BlendWeight usage to store the CPCA data; this semantic is arbitrary and was chosen because skinning and PRT do not work together. When the sample initializes the input vertex data with the VertexElement constructor, it declares usageIndex = 0 with a DeclarationType of Float1 and uses it to store an index into a constant array. For usage indices 1 through 6, the declaration type is Float4. The sample can therefore store up to 24 PCA weights in the vertex buffer.
The sample loads the simulator's SH PRT results from a file and places this data into a PrtCompressedBuffer object. The sample's prtMesh.CompressBuffer method then calls the PrtCompressedBuffer constructor to apply CPCA using some number of PCA vectors and some number of clusters. The output is a PrtCompressedBuffer object called prtCompBuffer that contains the data needed for the above equation. By means of the ReloadState method you can select the number of PCA vectors and clusters without running the simulator again.
The sample extracts CPCA data from prtCompBuffer by first calling PrtCompressedBuffer.ExtractClusterIDs to get the cluster IDs for the vertices. This method writes to an array of Int32 values, where the cluster ID for vertex N is at clusterIds[N]. An array offset is computed for each vertex and stored in the vertex buffer. The offset is used by the vertex shader to allow it to index directly to the data for the current vertex's cluster. This offset is simply the cluster ID times the stride of the constant array filled with CPCA data.
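The offset computation described above can be sketched as follows. This is an assumption-laden illustration, not the sample's code: the class ClusterOffsets is hypothetical, and the stride is assumed to be measured in float4 registers as 1 + numberChannels * numberPcaVectors / 4, with numberPcaVectors a multiple of four.

```csharp
using System;

// Hypothetical sketch: per-vertex offsets into the CPCA constant array.
public static class ClusterOffsets
{
    // Assumed stride of one cluster's block, in float4 registers:
    // one register for the mean terms plus the packed PCA terms.
    public static int ClusterStride(int numberChannels, int numberPcaVectors)
    {
        return 1 + numberChannels * numberPcaVectors / 4;
    }

    // clusterIds[n] is the cluster ID of vertex n; the result holds the
    // value written to the Float1 vertex element for each vertex.
    public static float[] Compute(int[] clusterIds, int numberChannels,
                                  int numberPcaVectors)
    {
        int stride = ClusterStride(numberChannels, numberPcaVectors);
        float[] offsets = new float[clusterIds.Length];
        for (int v = 0; v < clusterIds.Length; v++)
            offsets[v] = clusterIds[v] * stride;
        return offsets;
    }
}
```

Storing the product rather than the raw cluster ID saves the vertex shader a multiply per vertex.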
The sample calls PrtCompressedBuffer.ExtractToMesh with usage = BlendWeight and usageIndexStart = 1 to instruct the method to store the per-vertex PCA weights in the mesh starting at the semantic BlendWeight[1], and continuing with BlendWeight[2] and so on until all the PCA weights have been written. Because the application has defined BlendWeight indices 1 through 6 as type float4, 20 PCA vectors cause this method to write to BlendWeight indices 1 through 5. Note that these vertex elements do not have to be of type float4; they only need to be signed.
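The number of float4 elements that the weights occupy is a rounded-up division by four. The helper below is a hypothetical sketch (PcaWeightLayout is not part of the sample); its limit of six elements comes from the BlendWeight indices 1 through 6 described in the text.

```csharp
using System;

// Hypothetical sketch: how many float4 vertex elements the PCA weights fill.
public static class PcaWeightLayout
{
    // BlendWeight usage indices 1 through 6 are declared as float4 in the text.
    public const int MaxFloat4Elements = 6;

    public static int Float4ElementsNeeded(int numberPcaVectors)
    {
        if (numberPcaVectors < 1 || numberPcaVectors > 4 * MaxFloat4Elements)
            throw new ArgumentOutOfRangeException("numberPcaVectors");
        // Round up so a partially filled float4 still counts as one element.
        return (numberPcaVectors + 3) / 4;
    }
}
```

For 20 PCA vectors the result is 5, matching BlendWeight indices 1 through 5, and the maximum of 24 vectors fills all six elements.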
As the rendering equation shows, to calculate the exit radiance in the shader you need per-vertex compressed transfer vectors as well as the lighting environment (also called source radiance) approximated using SH basis functions. Several methods of the SphericalHarmonics class are available to help with this step:
You can use one of these functions to get an array of Order² float output values per channel for each light. Then add these arrays together using SphericalHarmonics.Add to arrive at a single set of Order² SH coefficients per channel that describe the scene's source radiance, which is L' in the rendering equation. Note that these methods take the light direction in object space, so you will typically have to transform the light's direction by the inverse of the world matrix.
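Summing the per-light coefficient arrays is plain element-wise addition. The sketch below stands in for that step with ordinary arrays so that it runs without DirectX; the class LightAccumulator is hypothetical and handles one channel at a time.

```csharp
using System;

// Hypothetical sketch: combine per-light SH coefficients into one L' vector.
public static class LightAccumulator
{
    // perLightCoefficients[i] holds Order^2 coefficients for light i (one channel).
    public static float[] Sum(float[][] perLightCoefficients, int order)
    {
        int count = order * order;
        float[] total = new float[count];
        foreach (float[] light in perLightCoefficients)
        {
            if (light.Length != count)
                throw new ArgumentException("Each light needs Order^2 coefficients.");
            for (int i = 0; i < count; i++)
                total[i] += light[i];
        }
        return total;
    }
}
```

For a spectral simulation, the same sum would be repeated for the red, green, and blue coefficient arrays.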
The last pieces of data the sample needs from prtCompBuffer are the cluster means (M) and the PCA basis vectors (B). The sample stores this data in a large array of float values so that when the lights change it can reevaluate the lights and recompute M · L' and B · L'. To do this it calls PrtCompressedBuffer.ExtractBasis, which extracts the basis one cluster at a time. Each cluster's basis consists of a mean and the PCA basis vectors. Therefore each cluster's basis occupies the following number of float values, and the full array is numberClusters times that size:
```
int clusterBasisSize = (numberPcaVectors + 1)
                           * numberCoefficients * numberChannels;
```
Note that one is added to the number of PCA vectors to store the cluster mean. Also note that since both (M_k · L') and (B_kj · L') are the same for every vertex in a cluster, the sample calculates these values on the CPU and passes them as constants to the vertex shader. Because w_pj changes for each vertex, the sample stores this per-vertex data in the vertex buffer.
The sample's PrtMesh.ComputeShaderConstants method performs the M * L and B * L calculations in the rendering equation and stores the computed CPCA constant values in the prtConstants array, defined as:
```
prtConstants = new float[ numberClusters
                   * (4 + numberChannels * numberPcaVectors) ];
```
This array is passed directly to the vertex shader with the Effect.SetValue method. Note that the vertex shader uses float4 because each register can hold four float values, so on the vertex shader side the array is of size:
```
int numberVConsts = numberClusters 
                        * (1 + numberChannels * numberPcaVectors / 4) + 4;
```
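The two sizes above describe the same data in different units, which the hypothetical helper below makes explicit. It is a sketch under one assumption: numberChannels * numberPcaVectors is a multiple of four, so the integer division is exact. The additional four registers are taken from the formula as given; the text does not state their purpose.

```csharp
using System;

// Hypothetical sketch: relate the float array size to float4 register counts.
public static class ShaderConstantBudget
{
    // Size of the CPU-side float array, as in the prtConstants allocation.
    public static int FloatCount(int numberClusters, int numberChannels,
                                 int numberPcaVectors)
    {
        return numberClusters * (4 + numberChannels * numberPcaVectors);
    }

    // float4 registers that hold the per-cluster data.
    public static int ClusterRegisters(int numberClusters, int numberChannels,
                                       int numberPcaVectors)
    {
        return numberClusters * (1 + numberChannels * numberPcaVectors / 4);
    }

    // Total from the formula in the text, including its four extra registers.
    public static int TotalRegisters(int numberClusters, int numberChannels,
                                     int numberPcaVectors)
    {
        return ClusterRegisters(numberClusters, numberChannels, numberPcaVectors) + 4;
    }
}
```

With two clusters, three channels, and eight PCA vectors, the float array has 56 entries, which is exactly 4 times the 14 cluster registers; the total from the formula is 18.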
Evaluating the lights, and calculating and setting the constant table, are rapid procedures that can be done once or more per frame, but for optimization purposes the sample only evaluates lighting effects when the lights are moved.
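Calculating the constant table amounts to a set of dot products per cluster. The sketch below is not the sample's ComputeShaderConstants method; the class PrtConstantsSketch and both array layouts are assumptions chosen to match the sizes given above. Each cluster's basis is assumed to hold the mean block followed by the PCA blocks, each block storing numberCoefficients floats per channel. Each cluster's output is assumed to hold four floats for the mean terms (unused entries left at zero), then numberPcaVectors terms for each channel.

```csharp
using System;

// Hypothetical sketch of the per-cluster M_k . L' and B_kj . L' calculations.
public static class PrtConstantsSketch
{
    // Dot product of count floats from a[aStart..] with b[0..].
    static float Dot(float[] a, int aStart, float[] b, int count)
    {
        float sum = 0.0f;
        for (int i = 0; i < count; i++)
            sum += a[aStart + i] * b[i];
        return sum;
    }

    // light[c] is L' for channel c; numberChannels is assumed to be 1 or 3.
    public static float[] Compute(float[] clusterBases, float[][] light,
        int numberClusters, int numberPcaVectors,
        int numberCoefficients, int numberChannels)
    {
        int blockSize = numberCoefficients * numberChannels;
        int basisStride = (numberPcaVectors + 1) * blockSize;
        int constStride = 4 + numberChannels * numberPcaVectors;
        float[] prtConstants = new float[numberClusters * constStride];

        for (int k = 0; k < numberClusters; k++)
        {
            int basisStart = k * basisStride;
            int constStart = k * constStride;
            for (int c = 0; c < numberChannels; c++)
            {
                // Mean term M_k . L' for channel c.
                prtConstants[constStart + c] = Dot(clusterBases,
                    basisStart + c * numberCoefficients, light[c], numberCoefficients);
                // PCA terms B_kj . L' for channel c, j = 0 .. N-1.
                for (int j = 0; j < numberPcaVectors; j++)
                {
                    int block = basisStart + (j + 1) * blockSize;
                    prtConstants[constStart + 4 + c * numberPcaVectors + j] =
                        Dot(clusterBases, block + c * numberCoefficients,
                            light[c], numberCoefficients);
                }
            }
        }
        return prtConstants;
    }
}
```

Only this function needs to run again when a light moves; the cluster bases and the per-vertex weights are unchanged.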
Now that the sample has extracted all the data it needs, it can render the scene using SH PRT with CPCA. The render loop uses the effects file PRTPerVertex.fx to render the scene. The sample's vertex shader technique PrtDiffuseVS implements the rendering equation to yield exit radiance.

Limitations

Because this technique uses low order spherical harmonics, the lighting environment is assumed to be of low frequency.
The transfer vectors are precomputed, so the relative spatial relationship of the precomputed scene cannot change. A mesh can therefore be rotated, translated, or scaled since those rigid operations do not change the transfer vectors. However, if the mesh is deformed or skinned, then the rendering will be inaccurate. The same logic applies to scenes composed of several meshes. For example, if you pass a scene of three meshes to the simulator, the real-time engine could rotate, translate, and scale them all as one, but it could not rotate a single mesh independently of the others without producing inaccurate rendering.
For accurate rendering with this vertex-based technique, the mesh must be highly tessellated. However, the vertex-based technique can run on vs_1_1 hardware, while the texture-based technique requires ps_2_0 hardware.
If you mix meshes that have subsurface scattering with ones that do not, then you may need to scale the transfer coefficients for the subsurface scattered mesh because the scattered light is typically about three times darker. With a single mesh you can simply scale the projected light coefficients. You can scale the transfer coefficients by using PrtEngine.ScaleMeshChunk before compressing the data.

Image Resources

Images of the Uffizi, St. Peter's, Grace, Galileo's Tomb, and RNL Light Probe are Copyright © 1999 Paul Debevec, and are used with permission. These images are located at:
(SDK root)\Samples\Light Probes
The images are made available in Greg Ward's Radiance high dynamic range image format (see Radiance Synthetic Imaging System).
The images were converted to the vertical cross format using HDR Shop.
For additional images and resources, see the Light Probe Image Gallery.

Further Information

For more detail about the math behind PRT, CPCA, and SH, see the following references:

Sloan, Peter-Pike, Jan Kautz, and John Snyder. "Precomputed Radiance Transfer for Real-Time Rendering in Dynamic, Low-Frequency Lighting Environments." ACM Transactions on Graphics (TOG), Proceedings of the 29th Annual Conference on Computer Graphics and Interactive Techniques (SIGGRAPH), pp. 527-536. New York, NY: ACM Press, 2002.
Sloan, Peter-Pike, Jesse Hall, John Hart, and John Snyder. "Clustered Principal Components for Precomputed Radiance Transfer." ACM Transactions on Graphics (TOG), Vol. 22, Issue 3 (SIGGRAPH), pp. 382-391. New York, NY: ACM Press, July 2003.
Green, Robin. "Spherical Harmonic Lighting: The Gritty Details." Game Developers' Conference, San Jose, CA, March 2003.
