About video memory and locking
By Simon O'Connor, edited by Rim van Wersch, May 8 2006
Video memory can be quite a difficult subject to get comfortable with, especially given all the different terminology in use for it today. Fortunately for us mortals, Simon O'Connor has written up a detailed overview of the subject, including some common pitfalls and performance implications, which you'll find below.
What exactly is video memory?
Video memory can mean one of two things depending on the context in which the term is used:
- Video memory in general is any memory which is used by the graphics chip.
- Video memory (or more precisely, "local video memory") is memory that physically exists on the graphics card itself, i.e. RAM chips that live on the graphics card. They are 'local' to the graphics chip (GPU).
What is AGP memory?
AGP memory is main memory on your system's motherboard that has been specially assigned for graphics use. The "AGP Aperture" setting in your system BIOS controls this assignment: the more you assign for AGP use, the less you have for general system use. In practice, though, the system typically doesn't enforce this reservation strictly; memory is allocated as needed, with specific optimizations to make its use more efficient. AGP memory is sometimes also known as "non-local video memory".
What are the performance characteristics for using local video memory and AGP memory?
- 'Local' video memory is very fast for the graphics chip to read from and write to because it is 'local' to the graphics chip.
- 'Local' video memory is extremely slow for the system CPU to read from, and reasonably slow for it to write to. There are several reasons for this: the memory is physically on a different board (the graphics card) from the CPU, i.e. it's not 'local' to the CPU; that memory isn't cached at all for CPU reads, and is only burst cached for CPU writes; and data transfers have to follow the way bus standards such as AGP or PCI work.
- AGP memory is reasonably fast for the graphics chip to read from or write to, but not as fast as local video memory.
- AGP memory is fairly slow for the system CPU to read from, because it is marked as "Write Combined", so reads don't benefit from the L1 and L2 caches (i.e. each read is effectively a cache miss). However, AGP memory is still faster than local video memory for the CPU to read from, since it is local to the CPU.
- AGP memory is reasonably fast for the system CPU to write to. Although it isn't fully cached, "Write Combined" memory uses a small buffer (32 or 64 bytes, IIRC) that collects sequential writes and sends them out in one go. This is why the CPU should access vertex data sequentially for good performance.
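To make the effect of write combining more concrete, here is a toy model in Python. It is a deliberate simplification that assumes a single 64-byte write-combining buffer, flushed whenever a write lands outside the line it currently holds; a completely filled line goes out as one burst, anything else as a less efficient partial flush. Real CPUs have several such buffers and more involved flush rules, and the name `simulate_write_combining` is purely illustrative.

```python
# Toy model of one CPU write-combining buffer. Assumptions (illustrative only):
# a single 64-byte buffer, flushed whenever a write falls outside its current line.
LINE_SIZE = 64


def simulate_write_combining(write_offsets, line_size=LINE_SIZE):
    """Return (full_bursts, partial_flushes) for a sequence of one-byte writes."""
    full_bursts = 0
    partial_flushes = 0
    current_line = None
    bytes_in_line = set()

    # None is a sentinel meaning "writing finished", which forces a final flush.
    for offset in list(write_offsets) + [None]:
        line = None if offset is None else offset // line_size
        if current_line is not None and line != current_line:
            # The buffered line leaves as one burst only if every byte was written.
            if len(bytes_in_line) == line_size:
                full_bursts += 1
            else:
                partial_flushes += 1
            bytes_in_line = set()
        if offset is None:
            break
        current_line = line
        bytes_in_line.add(offset % line_size)
    return full_bursts, partial_flushes


sequential = range(256)                           # bytes 0, 1, 2, ... 255 in order
scattered = [(i * 65) % 256 for i in range(256)]  # same bytes, jumping between lines
print(simulate_write_combining(sequential))  # (4, 0)
print(simulate_write_combining(scattered))   # (0, 256)
```

Writing the same 256 bytes in order fills each line completely and produces four full bursts, while a stride that changes line on every write turns each byte into its own partial flush.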
How does Usage.Dynamic fit into all this?
Usage.Dynamic is only a hint to the display driver about how you intend to use the resource; usually it will give you AGP memory. This behaviour isn't guaranteed, however, so don't rely on it! Generally, vertex buffers that you need to Lock() and update with the CPU regularly at runtime should be created with Usage.Dynamic, and all others should be static.
Graphics drivers use techniques such as "buffer renaming", where multiple copies of the buffer are created and cycled through to reduce the chance of stalls when dynamic resources are locked. This is why it's essential to use the LockFlags.Discard and LockFlags.NoOverwrite locking flags correctly if you want good performance. It's also one of the many reasons you shouldn't rely on the data obtained from a Lock() call after the resource has been unlocked.
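A common way to combine the two flags is to treat a dynamic vertex buffer as a ring: append each batch with LockFlags.NoOverwrite, promising the driver you won't touch data the GPU may still be reading, and when the buffer is full, wrap around to the start with LockFlags.Discard so the driver can rename the buffer instead of stalling. The Python sketch below covers only that bookkeeping. `DynamicBufferCursor` and `next_lock` are hypothetical names, and the strings merely stand in for the real LockFlags values you would pass to Lock().

```python
from dataclasses import dataclass

# Plain strings standing in for LockFlags.Discard and LockFlags.NoOverwrite.
DISCARD = "Discard"
NO_OVERWRITE = "NoOverwrite"


@dataclass
class DynamicBufferCursor:
    """Hypothetical bookkeeping for a dynamic vertex buffer used as a ring."""

    capacity: int      # total buffer size, in vertices
    position: int = 0  # first vertex not yet written since the last discard

    def next_lock(self, count):
        """Return (start_vertex, flag) for writing `count` vertices."""
        if not 0 < count <= self.capacity:
            raise ValueError("count must be between 1 and the buffer capacity")
        if self.position + count > self.capacity:
            # Not enough room left: start over at the beginning of the buffer.
            self.position = 0
        # Discard when starting from the top, so the driver may hand back a fresh
        # (renamed) copy while the GPU keeps reading the old one; otherwise use
        # NoOverwrite, since only previously unused space is written.
        flag = DISCARD if self.position == 0 else NO_OVERWRITE
        start = self.position
        self.position += count
        return start, flag


cursor = DynamicBufferCursor(capacity=1000)
for _ in range(5):
    print(cursor.next_lock(300))
```

With a 1000-vertex buffer and 300-vertex batches, the first three locks land at 0 (Discard), 300 and 600 (NoOverwrite); the fourth no longer fits, so it wraps to 0 with Discard.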
General advice for good performance
- Treat all graphics resources as write-only for the CPU, particularly those in local video memory. CPU reads from graphics resources are a recipe for slowness.
- CPU writes to locked graphics resources should be done sequentially.
- It's better to write all of a vertex out to memory with the CPU than it is to skip elements of it. Skipping can harm the effectiveness of write combining, and even cause hidden reads in some situations (and reads are bad - see above).
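Following this advice often means keeping a CPU-side copy of the vertex data: you edit that copy, then write complete vertices into the locked buffer from start to finish, never reading the locked memory back. Here is a minimal Python sketch of that pattern. The 32-byte vertex layout, the `write_vertices` function and the `bytearray` standing in for the memory returned by Lock() are all assumptions for illustration, and the sketch shows only the access pattern, not actual write-combining behaviour.

```python
import struct

# Hypothetical vertex layout: position (3 floats), normal (3 floats),
# texture coordinates (2 floats), little-endian, 32 bytes in total.
VERTEX_FORMAT = struct.Struct("<3f3f2f")


def write_vertices(dest, vertices):
    """Write every field of every vertex into `dest`, in order.

    `dest` stands in for locked buffer memory and is only written, never read.
    Returns the number of bytes written.
    """
    offset = 0
    for position, normal, uv in vertices:
        # One pack per vertex covers all of its fields, so no bytes are skipped.
        VERTEX_FORMAT.pack_into(dest, offset, *position, *normal, *uv)
        offset += VERTEX_FORMAT.size
    return offset


# CPU-side copy of the data; this is what gets read and modified.
shadow = [
    ((0.0, 0.0, 0.0), (0.0, 0.0, 1.0), (0.0, 0.0)),
    ((1.0, 0.0, 0.0), (0.0, 0.0, 1.0), (1.0, 0.0)),
]

# To move vertex 1, edit the copy rather than patching only its position
# bytes in the locked buffer, then rewrite the whole range sequentially.
shadow[1] = ((2.0, 0.0, 0.0),) + shadow[1][1:]
locked = bytearray(len(shadow) * VERTEX_FORMAT.size)  # stand-in for Lock()
written = write_vertices(locked, shadow)
```

Rewriting the whole range is the simplest option; for large buffers you could rewrite only the span of changed vertices, as long as each vertex in that span is written completely.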