True hardware and shader based instancing
By Rim van Wersch, January 5 2006
Instancing is a technique for rendering multiple instances of a scene object on the GPU, while keeping the CPU load to a minimum. This tutorial provides two sample projects in which both hardware instancing (SM3) and shader instancing (SM2) are fully implemented. The hardware instancing sample also includes a little hack for enabling instancing on ATI SM2 cards.
How does it work?
Instancing works by sending the data for many object instances to the GPU, so it can render these without having to wait on the CPU. This generally speeds up rendering and reduces the load on the CPU. The two main techniques for this are 'true' hardware instancing and shader instancing.
True hardware instancing works by feeding two vertex streams to the video card, which lets you send the vertex data for the object (just your regular vertex buffer) and the instance data for the instances (position, color and the like) to the GPU simultaneously. The beauty of hardware instancing is that you can set a flag using device.SetStreamSourceFrequency to indicate which stream contains the vertex data and which the instance data. This way you only have to put the original mesh on the vertex data stream and DirectX will see to it that this data is repeated for each instance you provide data for. This saves memory and is generally easier to work with, since you can use the original mesh and you'll only need one DrawIndexedPrimitive (DIP) call for each subset.
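To make the two-stream idea concrete, here is a small CPU-side sketch in Python. It is not the sample's code. The two flag values are the ones defined in the native Direct3D 9 headers (D3DSTREAMSOURCE_INDEXEDDATA and D3DSTREAMSOURCE_INSTANCEDATA). The function names, the triangle and the instance records are made up for illustration.

```python
# Flag bits as defined in the native Direct3D 9 headers (d3d9types.h).
D3DSTREAMSOURCE_INDEXEDDATA = 1 << 30   # stream holds the mesh geometry
D3DSTREAMSOURCE_INSTANCEDATA = 2 << 30  # stream holds per-instance data


def stream_frequencies(instance_count):
    """Return the frequency words for the geometry and instance streams.

    Hypothetical helper: it only builds the two integers you would pass
    to the device, one per stream.
    """
    geometry = D3DSTREAMSOURCE_INDEXEDDATA | instance_count
    per_instance = D3DSTREAMSOURCE_INSTANCEDATA | 1  # step once per instance
    return geometry, per_instance


def expand_streams(mesh_vertices, instance_data):
    """CPU emulation of what the vertex shader receives.

    The mesh is repeated once for every instance record, and each vertex
    is paired with the record of the instance it belongs to.
    """
    return [(vertex, inst) for inst in instance_data for vertex in mesh_vertices]


if __name__ == "__main__":
    triangle = [(0, 0, 0), (1, 0, 0), (0, 1, 0)]  # hypothetical mesh
    instances = [(0, 0, 0, 0), (5, 0, 0, 1)]      # hypothetical float4 records
    print([hex(word) for word in stream_frequencies(len(instances))])
    print(len(expand_streams(triangle, instances)))
```

The point of the emulation is that the mesh is stored only once, while the instance stream grows by one small record per instance.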
Shader instancing works by creating a vertex buffer that will hold a number of copies of the original mesh. This vertex buffer is then used to draw batches of the original object, while setting the instance data directly on the shader using effect.SetValue. The number of objects drawn in one batch depends on the instance data provided for each instance, as you'll be limited in the number of constants you can set on the hardware (typically 256 for SM2 hardware). For our sample, we're using a single float4 for holding the instance data, so we can safely use a batch size of 200 objects.
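The batching arithmetic can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names are hypothetical, and the number of registers reserved for other shader constants (matrices and the like) is a placeholder, not a figure from the samples.

```python
def max_batch_size(total_registers=256, registers_per_instance=1, reserved=16):
    """Upper bound on instances per batch for a given constant budget.

    `reserved` is an assumed number of registers kept free for other
    shader constants; adjust it to match your own shader.
    """
    if registers_per_instance <= 0:
        raise ValueError("registers_per_instance must be positive")
    return (total_registers - reserved) // registers_per_instance


def batches(instance_count, batch_size):
    """Yield (first_instance, count) pairs, one per draw call."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, instance_count, batch_size):
        yield start, min(batch_size, instance_count - start)


if __name__ == "__main__":
    # 64,000 instances at 200 per batch, as in the samples' numbers.
    print(len(list(batches(64000, 200))))
```

Each pair produced by batches corresponds to one upload of instance constants followed by one draw call, so a larger batch size means fewer round trips to the CPU.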
The most important concept to notice in the sample projects is that you're responsible for applying the instance data to your instances yourself. It's up to you to write the shader that uses the instance data to render your instances appropriately. This may not be intuitive at first; at least, it came as a bit of a surprise to me when I first looked into instancing.
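As a rough illustration of what "applying the instance data yourself" means, the following Python sketch mimics the work a vertex shader would do. It assumes the float4 instance record stores a translation in x, y, z and a texture index in w. That layout is a guess for illustration, not taken from the samples, and the function names are hypothetical.

```python
def apply_instance(vertex_position, instance):
    """Translate one mesh vertex by an instance record (x, y, z, w).

    Assumed layout: xyz is the instance translation, w selects a texture.
    """
    vx, vy, vz = vertex_position
    ix, iy, iz, texture_index = instance
    return (vx + ix, vy + iy, vz + iz), texture_index


def shade_batch(batched_vertices, instance_constants):
    """Emulate shader instancing for one batch.

    Each vertex in the replicated buffer is (position, instance_index);
    the index selects the matching record from the constants.
    """
    return [
        apply_instance(position, instance_constants[index])
        for position, index in batched_vertices
    ]


if __name__ == "__main__":
    constants = [(0.0, 0.0, 0.0, 0), (10.0, 0.0, 0.0, 1)]  # hypothetical
    vertices = [((1.0, 2.0, 3.0), 0), ((1.0, 2.0, 3.0), 1)]
    print(shade_batch(vertices, constants))
```

Nothing in the pipeline moves the instances for you. If the shader ignores the instance record, every copy of the mesh ends up drawn in the same place.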
About the sample projects
The samples render 64,000 meshes (tested with simple ones, like boxes, pyramids and cylinders) in a 40x40x40 grid. You might want to lower this number, as 64,000 meshes can be quite hard on your hardware. Constants instancing (a very simple instancing technique) was implemented using the fixed function pipeline to serve as a test rendering approach.
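For reference, a grid like this can be generated with a few lines of Python. This is a sketch, not the samples' code. The function name and the spacing between instances are assumptions. Lowering size is the easiest way to reduce the load.

```python
from itertools import product


def grid_positions(size=40, spacing=2.0):
    """Return one (x, y, z) translation per instance in a size^3 grid.

    `spacing` is an assumed distance between neighbouring instances.
    """
    return [
        (x * spacing, y * spacing, z * spacing)
        for x, y, z in product(range(size), repeat=3)
    ]


if __name__ == "__main__":
    print(len(grid_positions()))    # 40 * 40 * 40 instances
    print(len(grid_positions(20)))  # a lighter grid for slower hardware
```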
The sample code provided allows you to specify the translation and the texture for each instance, but it can be easily modified to support more elaborate transformations and additional properties. The sample also provides a simple trick to allow for the use of different textures on various instances without batch sorting. It includes the PointSize hack for enabling instancing on ATI SM2 hardware (set during device reset).
Files for this tutorial
Filename | Size |
ManagedInstancing - Hardware.zip | 288.1 KB |
ManagedInstancing - Shader.zip | 289.7 KB |
Further reading