Ray tracing has become one of the major technological innovations in graphics, especially since NVIDIA added hardware to speed up real-time ray tracing in its RTX 2000 family, a trend that AMD recently joined with its RX 6000 range and that NVIDIA has continued with its RTX 3000.
But things are not even: the NVIDIA RTX 3000 and the AMD Radeon RX 6000 are not on par when it comes to ray tracing. In part this can be explained by the higher number of FP32 ALUs in NVIDIA’s GPU cores, but that’s only part of the story.
What’s wrong with ray tracing on AMD RX 6000?
One of the keys to speeding up ray tracing is the use of acceleration data structures, which store a map of the positions of the objects in the scene.
How useful are they? Simply put, in ray tracing they prevent rays from being launched and tested against parts of the scene where there is nothing, so they save a lot of time, which is why they are called acceleration structures. There is not a single type, but several different ones.
In the case of NVIDIA, they decided to add to their RT cores a unit capable of traversing one type of data structure, BVH trees. That means that if we use that data structure for ray tracing in our games, we won’t have to invoke a compute shader program to perform the traversal.
AMD, on the other hand, decided not to give preference to any type of acceleration structure, which means that the traversal must be controlled by a compute shader program when using classic tree data structures such as octrees, BVHs, k-d trees, etc.
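As a rough idea of what "controlling the traversal in software" involves, the following sketch walks a small bounding volume hierarchy with an explicit stack, which is the general shape of loop a shader program would have to run. It is not AMD's or any driver's actual code: the `Node` class, the 2D boxes and the simplified horizontal-ray test are all assumptions made to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    box: tuple                                    # (xmin, ymin, xmax, ymax)
    children: list = field(default_factory=list)  # empty on leaves
    primitive: str = ""                           # set on leaves only


def ray_hits_box(ray_y, box):
    """Simplified test: horizontal ray starting at x = 0, travelling in +x at height ray_y."""
    xmin, ymin, xmax, ymax = box
    return ymin <= ray_y <= ymax and xmax >= 0.0


def traverse(root, ray_y):
    """Stack-based walk; returns the primitives reached and the number of nodes visited."""
    hits, visited = [], 0
    stack = [root]
    while stack:
        node = stack.pop()
        visited += 1
        if not ray_hits_box(ray_y, node.box):
            continue  # prune: the whole subtree below this node is skipped
        if node.children:
            stack.extend(node.children)
        else:
            hits.append(node.primitive)
    return hits, visited


# Hypothetical scene: two groups of two objects each (7 nodes in total).
left = Node((1, 0, 4, 2), [Node((1, 0, 2, 1), primitive="A"),
                           Node((3, 1, 4, 2), primitive="B")])
right = Node((5, 5, 8, 8), [Node((5, 5, 6, 6), primitive="C"),
                            Node((7, 7, 8, 8), primitive="D")])
root = Node((1, 0, 8, 8), [left, right])

hits, visited = traverse(root, ray_y=0.5)
print(hits, visited)
```

Every iteration of the `while` loop is work that occupies the general-purpose shader units, whereas a dedicated traversal unit would run the equivalent loop on its own.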
A simple explanation of what a tree is
In computer science, trees are not a flat, sequential data structure but a hierarchical one, which means that to traverse them the processor has to go through several iterations, one for each level it descends.
- The node where the tree begins is called the root.
- Any node that has one or more nodes below it is called a parent.
- Each node that has a node above it in the hierarchy is called a child.
- Each node that is at the end of the hierarchy is called a leaf.
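The terms above can be illustrated with a tiny tree. This is only a sketch: `TreeNode`, `is_parent`, `is_leaf`, `leaves` and `depth` are hypothetical names, not part of any graphics API.

```python
from dataclasses import dataclass, field


@dataclass
class TreeNode:
    name: str
    children: list = field(default_factory=list)


def is_parent(node):
    """A parent has one or more nodes below it."""
    return len(node.children) > 0


def is_leaf(node):
    """A leaf sits at the end of the hierarchy: it has no children."""
    return len(node.children) == 0


def leaves(node):
    """Collect every leaf under this node, left to right."""
    if is_leaf(node):
        return [node]
    found = []
    for child in node.children:
        found.extend(leaves(child))
    return found


def depth(node):
    """Number of levels from this node down to its deepest leaf."""
    if is_leaf(node):
        return 1
    return 1 + max(depth(child) for child in node.children)


# The root has two children; "a" is both a child of the root and a parent.
root = TreeNode("root", [
    TreeNode("a", [TreeNode("a1"), TreeNode("a2")]),
    TreeNode("b"),
])

print([n.name for n in leaves(root)], depth(root))
```

Reaching a leaf such as `a1` takes one step per level, which is why traversal needs several iterations.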
Trees should not be confused with conditional branches in code, which jump to one line or another when a condition is met. With trees, when a node has several children, it is better for them to be handled by several different execution threads in each iteration.
Contemporary GPUs usually have shader units made up of 4 SIMD ALUs, each of which runs an execution thread, so they can handle trees with up to 4 nodes without a problem. Of course, once the traversal descends from a node there are more and more subnodes, so the number of threads to be executed becomes very high.
This is why NVIDIA has added hardware specialized in BVH tree traversal to its RT cores, to avoid having to use the shader units for this. The unit only works with this type of data structure, but in return it can traverse it very quickly.
But there is another way to present the data of a tree, and that is to lay out the different paths linearly. This allows the data to be sent as a one-dimensional array, which is what a 1D texture is, and that is the best way to send data to AMD GPUs.
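One common way to linearize a tree, shown here purely as an illustration, is to store the nodes breadth-first in a flat array and replace pointers with indices. The record layout `(label, first_child_index, child_count)` and the function names are assumptions for this sketch, not the format any particular GPU or driver uses.

```python
from collections import deque


def flatten(tree):
    """Breadth-first layout: each record is (label, first_child_index, child_count)."""
    flat = []
    queue = deque([tree])
    next_free = 1  # index where the next batch of children will be stored
    while queue:
        label, children = queue.popleft()
        first = next_free if children else -1  # -1 marks a leaf
        flat.append((label, first, len(children)))
        next_free += len(children)
        queue.extend(children)
    return flat


def leaf_labels(flat):
    """Walk the flat array using indices only, with no pointers involved."""
    out, stack = [], [0]
    while stack:
        label, first, count = flat[stack.pop()]
        if count == 0:
            out.append(label)
        else:
            # Push in reverse so that children are visited left to right.
            stack.extend(range(first + count - 1, first - 1, -1))
    return out


# Hypothetical tree written as nested (label, children) tuples.
tree = ("root", [("a", [("a1", []), ("a2", [])]), ("b", [])])

flat = flatten(tree)
print(flat)
print(leaf_labels(flat))
```

Because the children of a node sit next to each other in the array, the whole structure can be uploaded as one contiguous buffer and read back by index, much like fetching texels from a 1D texture.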
AMD’s answer is for developers to present the acceleration structure as a texture. That follows, of course, from its decision not to add specialized hardware to traverse a specific type of tree, giving preference to greater versatility over greater speed.
This means that developers currently have to adopt specific measures for each brand of graphics card when implementing ray tracing.
Where does the gap come from?
Some of you might be wondering why AMD decided not to include hardware to traverse the data structure. The answer is very simple: it’s not part of the minimum DirectX Raytracing (DXR) specification.
Also, in DXR we can do ray tracing by replacing the intersection units with shader units that run an intersection shader, but the specialized intersection units that AMD and NVIDIA have included are much more efficient, because they get the job done several times faster while taking up only a fraction of the space in comparison.
What we mean is that Microsoft, when building its API, did not dictate how the hardware should work, and this gave AMD the option of doing without specialized hardware to traverse tree data structures, which has affected the performance of its graphics cards.
Although AMD’s ray tracing patent described a unit capable of traversing trees with 4 nodes, BVH-4, it also noted that the unit was optional. Judging by the information available in the recently released RDNA 2 ISA, there are no references to a unit responsible for traversing trees, only to intersection instructions.