With the High-Level Shader Language (HLSL), you can program shaders at an algorithm level. The time spent considering hardware details such as register allocation, co-issuing instructions, and register read-port limits is greatly reduced. HLSL also has the advantages of other high-level languages, such as code reuse, improved readability, and a compiler that will optimize the code.
To write HLSL shaders, you must first learn the language itself. You will need to know how to declare variables and functions, use intrinsic functions, define custom data types, and use semantics to connect shader arguments to other shaders and to the pipeline.
Once you learn how to author shaders in HLSL, you will need to learn about API calls so that you can: compile a shader for particular hardware, initialize shader constants, and initialize other pipeline state if necessary. These topics are covered in Writing HLSL Shaders. Beyond this, you can reuse your shader code by learning how to:
HLSL uses two special types, a vector type and a matrix type, to make programming 2D and 3D graphics easier. Each of these types contains more than one component; a vector contains up to four components, and a matrix contains up to 16 components. When vectors and matrices are used in standard HLSL equations, the math is performed per-component. For instance, HLSL implements this multiply:
float4 v = a*b;
as a four-component multiply. The result is four scalars:
float4 v = a*b;
v.x = a.x*b.x;
v.y = a.y*b.y;
v.z = a.z*b.z;
v.w = a.w*b.w;
This is four multiplications, where each result is stored in a separate component of v. HLSL uses component math, which makes writing shaders in HLSL very efficient.
This is very different from a vector multiply implemented as a dot product, which generates a single scalar:
v = a.x*b.x + a.y*b.y + a.z*b.z + a.w*b.w;
Matrices also use per-component operations in HLSL:
float3x3 mat1, mat2;
...
float3x3 mat3 = mat1*mat2;
The result is a per-component multiply of the two matrices (as opposed to a standard 3x3 matrix multiply). A per-component matrix multiply yields this first term:
mat3._m00 = mat1._m00 * mat2._m00;
This is different from a 3x3 matrix multiply which would yield this first term:
// First component of a 3x3 matrix multiply
mat3._m00 = mat1._m00 * mat2._m00 +
            mat1._m01 * mat2._m10 +
            mat1._m02 * mat2._m20;
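To make the contrast concrete, here is a minimal sketch that places the two kinds of product side by side. The function names ComponentProduct and MatrixProduct are hypothetical and exist only for this illustration; the mul intrinsic is the one described in the next paragraph.

```hlsl
// Hypothetical helpers, named only for this illustration.

// Per-component product: result._m00 = a._m00 * b._m00, and so on
// for each of the nine elements.
float3x3 ComponentProduct(float3x3 a, float3x3 b)
{
    return a * b;
}

// Linear-algebra product: each result element is a row of a
// combined with a column of b (three multiplies and two adds).
float3x3 MatrixProduct(float3x3 a, float3x3 b)
{
    return mul(a, b);
}
```

The two functions generally return different matrices for the same inputs, so choosing between the * operator and mul is a question of which operation the shader needs, not of style.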
Overloaded versions of the mul intrinsic function handle every combination of vector and matrix operands: vector * vector, vector * matrix, matrix * vector, and matrix * matrix. For instance:
float4x3 World;

float4 main(float4 pos : POSITION) : POSITION
{
    float4 val;
    val.xyz = mul(pos,World);
    val.w = 0;
    return val;
}
produces the same result as:
float4x3 World;

float4 main(float4 pos : POSITION) : POSITION
{
    float4 val;
    val.xyz = (float3) mul((float1x4)pos,World);
    val.w = 0;
    return val;
}
This example casts the "pos" vector to a row vector (one row, four columns) using the "(float1x4)" cast. Changing a vector by casting, or swapping the order of the arguments supplied to mul, is equivalent to transposing the matrix.
Automatic cast conversion causes the mul and dot intrinsic functions to return the same results when used as shown here:
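The transposition equivalence can be sketched as follows. This is an illustration under assumptions, not code from the original sample: the World4 uniform and both function names are hypothetical, and a square float4x4 is used so that both argument orders are well defined.

```hlsl
// Hypothetical uniform, square so that both argument orders are valid.
float4x4 World4;

// pos is treated as a 1x4 row vector: result[j] = sum over i of pos[i] * World4[i][j].
float4 RowVectorForm(float4 pos)
{
    return mul(pos, World4);
}

// pos is treated as a 4x1 column vector. Swapping the argument order
// while transposing the matrix computes the same four sums.
float4 ColumnVectorForm(float4 pos)
{
    return mul(transpose(World4), pos);
}
```

Both functions should return the same value for any input, which is why swapping the arguments to mul without transposing the matrix changes the result.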
{
    float4 val;
    return mul(val,val);
}
The result of this mul is a 1x4 * 4x1 = 1x1 vector. This is equivalent to a dot product:
{
    float4 val;
    return dot(val,val);
}
which returns a single scalar value.
A vector is a data structure that contains between one and four components.
bool bVector;    // scalar containing 1 Boolean
bool1 bVector;   // vector containing 1 Boolean
int1 iVector;    // vector containing 1 int
half2 hVector;   // vector containing 2 halfs
float3 fVector;  // vector containing 3 floats
double4 dVector; // vector containing 4 doubles
The integer immediately following the data type is the number of components in the vector.
Initializers can also be included in the declarations.
bool bVector = false;
int1 iVector = 1;
half2 hVector = { 0.2, 0.3 };
float3 fVector = { 0.2f, 0.3f, 0.4f };
double4 dVector = { 0.2, 0.3, 0.4, 0.5 };
Alternatively, the vector type can be used to make the same declarations:
vector <bool, 1> bVector = false;
vector <int, 1> iVector = 1;
vector <half, 2> hVector = { 0.2, 0.3 };
vector <float, 3> fVector = { 0.2f, 0.3f, 0.4f };
vector <double, 4> dVector = { 0.2, 0.3, 0.4, 0.5 };
The vector type uses angle brackets to specify the type and number of components.
Vectors contain up to four components, each of which can be accessed using one of two naming sets: the position set (x, y, z, w) or the color set (r, g, b, a).
These statements both return the value in the third component.
// Given
float4 pos = float4(0,0,2,1);

pos.z // value is 2
pos.b // value is 2
Naming sets can use one or more components, but they cannot be mixed.
// Given
float4 pos = float4(0,0,2,1);
float2 temp;

temp = pos.xy; // valid
temp = pos.rg; // valid
temp = pos.xg; // NOT VALID because the position and color sets were mixed
Specifying one or more vector components when reading components is called swizzling. For example:
float4 pos = float4(0,0,2,1);
float2 f_2D;

f_2D = pos.xy; // read two components
f_2D = pos.xz; // read components in any order
f_2D = pos.zx;
f_2D = pos.xx; // components can be read more than once
f_2D = pos.yy;
Masking controls how many components are written.
float4 pos = float4(0,0,2,1);
float4 f_4D;

f_4D = pos;        // write four components
f_4D.xz = pos.xz;  // write two components
f_4D.zx = pos.xz;  // change the write order
f_4D.xzyw = pos.w; // write one component to more than one component
f_4D.wzyx = pos;
An assignment cannot write to the same component more than once, so the left side of this statement is invalid:
f_4D.xx = pos.xy; // cannot write to the same destination components
Also, the component name spaces cannot be mixed. This is an invalid component write:
f_4D.xg = pos.rgrg; // invalid write: cannot mix component name spaces
A matrix is a data structure that contains rows and columns of data. The data can be any of the scalar data types; however, every element of a matrix is the same data type. The number of rows and columns is specified with the "row by column" string that is appended to the data type.
int1x1 iMatrix; // integer matrix with 1 row, 1 column
int2x1 iMatrix; // integer matrix with 2 rows, 1 column
...
int4x1 iMatrix; // integer matrix with 4 rows, 1 column
...
int1x4 iMatrix; // integer matrix with 1 row, 4 columns

double1x1 dMatrix; // double matrix with 1 row, 1 column
double2x2 dMatrix; // double matrix with 2 rows, 2 columns
double3x3 dMatrix; // double matrix with 3 rows, 3 columns
double4x4 dMatrix; // double matrix with 4 rows, 4 columns
The maximum number of rows or columns is 4; the minimum number is 1.
A matrix can be initialized when it is declared:
float2x2 fMatrix = { 0.0f, 0.1,  // row 1
                     2.1f, 2.2f  // row 2
                   };
Or, the matrix type can be used to make the same declarations:
matrix <float, 2, 2> fMatrix = { 0.0f, 0.1,  // row 1
                                 2.1f, 2.2f  // row 2
                               };
The matrix type uses the angle brackets to specify the type, the number of rows, and the number of columns. This example creates a floating-point matrix, with two rows and two columns. Any of the scalar data types can be used.
This declaration defines a matrix of half values (16-bit floating-point numbers) with two rows and three columns:
matrix <half, 2, 3> fHalfMatrix;
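A matrix contains values organized in rows and columns, which can be accessed using the structure operator "." followed by one of two naming sets: the zero-based row-column position (_m00 through _m33) or the one-based row-column position (_11 through _44).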
Each naming set starts with an underscore followed by the row number and the column number. The zero-based convention also includes the letter "m" before the row and column number. Here's an example that uses the two naming sets to access a matrix:
// Given
float2x2 fMatrix = { 1.0f, 1.1f, // row 1
                     2.0f, 2.1f  // row 2
                   };
float f_1D;

f_1D = fMatrix._m00; // read the value in row 1, column 1: 1.0
f_1D = fMatrix._m11; // read the value in row 2, column 2: 2.1
f_1D = fMatrix._11;  // read the value in row 1, column 1: 1.0
f_1D = fMatrix._22;  // read the value in row 2, column 2: 2.1
Just like vectors, a matrix access can name one or more components from either naming set.
// Given
float2x2 fMatrix = { 1.0f, 1.1f, // row 1
                     2.0f, 2.1f  // row 2
                   };
float2 temp;

temp = fMatrix._m00_m11; // valid
temp = fMatrix._m11_m00; // valid
temp = fMatrix._11_22;   // valid
temp = fMatrix._22_11;   // valid
A matrix can also be accessed using array access notation, which is a zero-based set of indices. Each index is inside of square brackets. A 4x4 matrix is accessed with indices [0][0] through [3][3], where the first index selects the row and the second selects the column.
Here is an example of accessing a matrix:
float2x2 fMatrix = { 1.0f, 1.1f, // row 1
                     2.0f, 2.1f  // row 2
                   };
float temp;

temp = fMatrix[0][0]; // single component read
temp = fMatrix[0][1]; // single component read
Notice that the structure operator "." is not used to access an array. Array access notation cannot use swizzling to read more than one component.
float2 temp;
temp = fMatrix[0][0]_[0][1] // invalid, cannot read two components
However, array accessing can read a multi-component vector.
float2 temp;
float2x2 fMatrix;
temp = fMatrix[0]; // read the first row
As with vectors, reading more than one matrix component is called swizzling. More than one component can be assigned, provided that only one naming set is used. These are all valid assignments:
// Given these variables
float4x4 worldMatrix = float4x4( 0,0,0,0,
                                 1,1,1,1,
                                 2,2,2,2,
                                 3,3,3,3 );
float4x4 tempMatrix;

tempMatrix._m00_m11 = worldMatrix._m00_m11; // multiple components
tempMatrix._m00_m11 = worldMatrix._m13_m23;

tempMatrix._11_22_33 = worldMatrix._11_22_33; // any order on swizzles
tempMatrix._11_22_33 = worldMatrix._24_23_22;
Masking controls how many components are written.
// Given
float4x4 worldMatrix = float4x4( 0,0,0,0,
                                 1,1,1,1,
                                 2,2,2,2,
                                 3,3,3,3 );
float4x4 tempMatrix;

tempMatrix._m00_m11 = worldMatrix._m00_m11; // write two components
tempMatrix._m23_m00 = worldMatrix._m00_m11;
An assignment cannot write to the same component more than once, so the left side of this statement is invalid:
// cannot write to the same component more than once
tempMatrix._m00_m00 = worldMatrix._m00_m11;
Also, the component name spaces cannot be mixed. This is an invalid component write:
// Invalid: the one-based and zero-based naming sets are mixed on the left side
tempMatrix._11_m23 = worldMatrix._11_22;
Matrix packing order for uniform parameters is set to column-major by default. This means each column of the matrix is stored in a single constant register. On the other hand, a row-major matrix packs each row of the matrix in a single constant register. Matrix packing can be changed with the "#pragma pack_matrix" directive, or with the "row_major" or the "column_major" keywords.
In general, column-major matrices are more efficient than row-major matrices. Here is an example that compares the number of instructions used for both column-major and row-major matrices:
// column-major matrix packing
float4x3 World;

float4 main(float4 pos : POSITION) : POSITION
{
    float4 val;
    val.xyz = mul(pos,World);
    val.w = 0;
    return val;
}
If you look at the assembly code generated from the HLSL compiler, you will see these instructions:
vs_2_0
def c3, 0, 0, 0, 0
dcl_position v0
m4x3 oPos.xyz, v0, c0
mov oPos.w, c3.x
// approximately four instruction slots used
Using a column-major matrix in this example produced code that uses approximately four instruction slots.
Here is the same example using a row-major matrix:
// row-major matrix packing
#pragma pack_matrix(row_major)

float4x3 World;

float4 main(float4 pos : POSITION) : POSITION
{
    float4 val;
    val.xyz = mul(pos,World);
    val.w = 0;
    return val;
}
The assembly code generated from compiling this HLSL code is:
vs_2_0
def c4, 0, 0, 0, 0
dcl_position v0
mul r0.xyz, v0.x, c0
mad r2.xyz, v0.y, c1, r0
mad r4.xyz, v0.z, c2, r2
mad oPos.xyz, v0.w, c3, r4
mov oPos.w, c4.x
// approximately five instruction slots used
This code uses approximately five instruction slots. In this example, writing the same code with a column-major packing order saved one instruction slot out of five. In addition to saving instruction slots, column-major packing usually saves constant register space.
The data in a matrix is loaded into shader constant registers before a shader runs. There are two choices for how the matrix data is read: in row-major order or in column-major order. Column-major order means that each matrix column will be stored in a single constant register, and row-major order means that each row of the matrix will be stored in a single constant register. This is an important consideration for how many constant registers are used for a matrix.
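As a rough sketch of the register-count consideration, the declarations below apply a packing order to individual variables instead of using the pragma. The variable names are hypothetical, and the sketch assumes the standard "row_major" and "column_major" type modifiers. The register counts in the comments follow from the packing rule just described and match the constant registers referenced in the two assembly listings above (c0 through c2 for column-major, c0 through c3 for row-major).

```hlsl
// Hypothetical uniforms, named only for this illustration.

// 3 columns of 4 components each: one register per column, 3 registers.
column_major float4x3 WorldByColumn;

// 4 rows of 3 components each: one register per row, 4 registers.
row_major float4x3 WorldByRow;

float4 main(float4 pos : POSITION) : POSITION
{
    float4 val;
    // mul computes the same mathematical product for either packing order;
    // only the register layout and generated instructions differ.
    val.xyz = mul(pos, WorldByColumn);
    val.w = 0;
    return val;
}
```

A per-variable modifier is useful when only some matrices in a shader need a packing order that differs from the default.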
A row-major matrix is laid out like this:
11 | 12 | 13 | 14 |
21 | 22 | 23 | 24 |
31 | 32 | 33 | 34 |
41 | 42 | 43 | 44 |
A column-major matrix is laid out like this:
11 | 21 | 31 | 41 |
12 | 22 | 32 | 42 |
13 | 23 | 33 | 43 |
14 | 24 | 34 | 44 |
Row-major and column-major matrix ordering determine the order the matrix components are read from the constant table or from shader inputs. Once the data is written into constant registers, matrix order has no effect on how the data is used or accessed from within shader code. Also, matrices declared in a shader body do not get packed into constant registers. Row-major and column-major packing order has no influence on the packing order of constructors (which always follows row-major ordering).
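The point about constructors can be illustrated with a small sketch. The function name FirstRow is hypothetical; the pragma is included only to show that it does not change how constructor arguments are interpreted.

```hlsl
#pragma pack_matrix(column_major)

// Hypothetical helper, named only for this illustration.
float2 FirstRow()
{
    // Constructor arguments are consumed row by row,
    // regardless of the packing order selected above.
    float2x2 m = float2x2(1.0f, 2.0f,   // row 1: _11, _12
                          3.0f, 4.0f);  // row 2: _21, _22

    return m[0]; // the first row: (1.0, 2.0)
}
```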
The order of the data in a matrix can be declared at compile time (see Type Modifiers), or the compiler will order the data at runtime for the most efficient use.