Why does AMD have worse performance in Ray Tracing?

Mathematically, checking whether a ray intersects a triangle is not a simple operation but a complex equation involving vectors, which takes considerable computing power. So much so that not having a dedicated unit performing this task in parallel can cut performance to a single-digit percentage.
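
To give an idea of the vector math involved, here is a minimal sketch of a ray-triangle test using the well-known Möller-Trumbore algorithm. This is an assumption chosen for illustration: the article does not say which formula the hardware units implement, and the function names below are hypothetical rather than part of any GPU API.

```python
# Sketch of a ray-triangle intersection test (Möller-Trumbore algorithm).
# Illustrative only: names are hypothetical, and real RT hardware may
# implement the test differently.

EPSILON = 1e-9


def sub(a, b):
    """Component-wise difference of two 3D vectors."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    """Dot product of two 3D vectors."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    """Cross product of two 3D vectors."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def ray_hits_triangle(origin, direction, v0, v1, v2):
    """Return the distance t along the ray to the hit point, or None on a miss."""
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < EPSILON:
        return None  # ray is parallel to the triangle's plane
    inv_det = 1.0 / det
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(edge2, qvec) * inv_det
    return t if t > EPSILON else None  # ignore hits behind the ray origin


triangle = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
print(ray_hits_triangle((0.25, 0.25, 1.0), (0.0, 0.0, -1.0), *triangle))  # 1.0
print(ray_hits_triangle((2.0, 2.0, 1.0), (0.0, 0.0, -1.0), *triangle))    # None
```

Even this compact version needs two cross products, four dot products and a division per ray and triangle, which is why running it on general-purpose units instead of a dedicated one is so costly.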

Hardware intersection units

That’s why NVIDIA has RT Cores and AMD has Ray Accelerator Units: both are the same type of unit and are used for the same task. However, in the previous generation, the RX 6000, AMD’s unit had a limitation that the RTG (Radeon Technologies Group) has fortunately solved in RDNA 3 and, therefore, in the RX 7000 range.

What’s the problem then?

The good news is that what was missing in RDNA 2 has now been included in RDNA 3.
The bad news, and the cause of the poor ray tracing performance on AMD, is the number of ray-triangle intersections each unit can calculate. A jump of just 50% is very poor when your rival has doubled its performance from one generation to the next.

Let’s not forget that the first 3D cards on the market were built to accelerate triangle rasterization, the most common operation in that field, faster and faster with each generation. Ray-triangle intersection plays the same role in ray tracing, so the fact that AMD has taken such a small leap here is disappointing.

How does this affect overall performance?

Although ray intersection is only one part of the process, it is essential and common to every scene. Let’s not forget that rendering goes through stages, and when one of them runs slower than normal it ends up affecting the performance of the ones that follow.

Therefore, if we manage to speed up one step, the same frame takes less time to generate: it takes fewer milliseconds, so more frames fit into each second. What should be clear is that the intersection process is recursive and continuous in ray tracing, so this part needs to perform well.
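
The relationship between stage time and frame rate can be sketched with a few lines of arithmetic. The stage names and millisecond figures below are hypothetical, chosen only to illustrate the idea, and the model assumes the stages run strictly one after another, which real GPUs only approximate.

```python
# Sketch: how speeding up one stage changes the frame rate.
# All stage names and timings are made-up illustrative values.

def frames_per_second(stage_times_ms):
    """Frame rate when each frame costs the sum of its stage times."""
    frame_time_ms = sum(stage_times_ms.values())
    return 1000.0 / frame_time_ms


# Hypothetical frame: 20 ms in total.
baseline = {"traversal": 4.0, "intersection": 6.0, "shading": 10.0}

# Same frame with the intersection stage running twice as fast.
faster = dict(baseline, intersection=3.0)

print(f"{frames_per_second(baseline):.1f} FPS")  # 50.0 FPS
print(f"{frames_per_second(faster):.1f} FPS")    # 58.8 FPS
```

In this made-up case, halving the intersection time alone takes the frame from 20 ms to 17 ms.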

The other problem: floating point performance

GPUs typically operate on blocks of data in unison, applying the same instruction to all of them. That’s why their quintessential unit type is the SIMD unit, which, as the name suggests, applies a single instruction to many different pieces of data at the same time. With the RTX 30, NVIDIA made a rather curious improvement that lets each core calculate twice as many 32-bit floating point operations per clock cycle.

The trick was to add a second 16-element SIMD unit to each of the four sub-cores, for a total of 64 additional operations per clock in each core of the GPU. However, they did not increase the number of registers or the access paths to them, because the new unit is switched with the integer unit: in any given cycle it performs either floating point or integer work, not both. What does this translate to? The RTX 30 and RTX 40 both achieve double the floating point performance under some conditions, not always.

AMD, on the other hand, went for a different solution that it calls Dual Issue. According to its tech specs, the number of floating point units did not increase; instead, under certain conditions, two instructions can be issued at the same time. The number of units per core or compute unit is therefore still at most 64, instead of 128 as in NVIDIA’s case.

What does AMD mean by “Dual Issue” in RDNA 3?

The floating point figures that AMD publishes are usually a theoretical maximum, which assumes the GPU spends 100% of its time executing the FMA operation, that is, a floating point multiplication followed by an addition. This is unrealistic, because it ignores memory accesses and the fact that programs do not always use that instruction, although it is true that FMA is the most used instruction when generating graphics. What matters here is that each FMA counts as 2 operations.

What AMD has done is allow certain instructions to be packaged two by two in the compute units, achieving twice the floating point power of RDNA 2 under certain conditions. This is the same situation as with NVIDIA GPUs: the floating point power is not doubled in general, only under certain conditions, so it is a limitation both share. In any case, the TFLOPS figure remains a marketing trick to this day.
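
As a rough sketch of how such a theoretical figure is put together, the calculation below multiplies units, operations per FMA and clock speed. It uses the 64 units per compute unit mentioned above together with the 96 compute units and 2500 MHz listed for the RX 7900 XTX in this article; the function name and the simple on/off treatment of Dual Issue are assumptions for illustration, not AMD’s official method.

```python
# Sketch: theoretical peak FP32 throughput, assuming every lane executes
# an FMA (2 operations) on every clock cycle. Function and parameter
# names are hypothetical.

def theoretical_tflops(compute_units, lanes_per_unit, clock_mhz, dual_issue=False):
    """Peak TFLOPS under the unrealistic 'FMA 100% of the time' assumption."""
    ops_per_fma = 2                       # one multiply plus one add
    issue_factor = 2 if dual_issue else 1  # Dual Issue: two instructions at once
    flops = (compute_units * lanes_per_unit * ops_per_fma
             * issue_factor * clock_mhz * 1e6)
    return flops / 1e12


# RX 7900 XTX figures as given in this article: 96 compute units, 2500 MHz.
print(f"{theoretical_tflops(96, 64, 2500):.2f}")                   # 30.72
print(f"{theoretical_tflops(96, 64, 2500, dual_issue=True):.2f}")  # 61.44
```

The second figure is only reached when every instruction can be paired, which is exactly the condition that real workloads rarely meet.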

So why does this matter for AMD’s ray tracing performance? Because it helps us gauge the computing power of the units used in the ray tracing steps other than ray intersection. Either way, AMD itself claims the generation-to-generation improvement is 18% at the same clock speed.

AMD GPU Performance in Ray Tracing: The Numbers

If we compare the performance of different intersection units on different generations of graphics cards from NVIDIA and AMD, we will see what the problem is.

GPU	Intersections/s (millions)	Cores	Clock (MHz)	Intersections per core per clock
RTX 2080 Ti	105600	68	1545	1
RTX 3090 Ti	312480	84	1860	2
RTX 4090	1290240	144	2520	3.6
RX 6950 XT	184800	80	2310	1
RX 7900 XTX	360000	96	2500	1.5

At first glance, going by the second column, the raw power of the RX 7900 XTX in this respect is higher than that of an RTX 3090 Ti. However, it is the last column that matters, as it tells us how many intersections are calculated per core and clock cycle on the GPU. The disappointment is that, while nobody expected AMD to match the 3.6 of the RTX 40, it should at least have reached the 2 of the RTX 30. This is the main reason for the poor ray tracing performance of AMD graphics cards, and the reason we think they could have done so much better.
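
The last column can be reproduced from the other three. The short sketch below divides the intersections per second (in millions) by the number of cores and the clock in MHz; the data is copied from the table above, while the function name is hypothetical.

```python
# Sketch: derive "intersections per core per clock" from the table's data.
# Millions of intersections per second divided by (cores x MHz) cancels
# the "millions" and leaves intersections per core per clock cycle.

GPUS = [
    # (name, intersections/s in millions, cores, clock in MHz)
    ("RTX 2080 Ti", 105600, 68, 1545),
    ("RTX 3090 Ti", 312480, 84, 1860),
    ("RTX 4090", 1290240, 144, 2520),
    ("RX 6950 XT", 184800, 80, 2310),
    ("RX 7900 XTX", 360000, 96, 2500),
]


def intersections_per_core_per_clock(millions_per_second, cores, mhz):
    """Normalize raw throughput by core count and clock speed."""
    return millions_per_second / (cores * mhz)


for name, rate, cores, mhz in GPUS:
    per_clock = intersections_per_core_per_clock(rate, cores, mhz)
    print(f"{name}: {per_clock:.1f}")
```

Normalizing this way removes the effect of core count and clock speed, so what remains is the per-unit capability that the paragraph above compares.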

What’s more, and to wrap up this part, the Ray Accelerator Unit is a black box in itself that can be replaced without touching the rest of the architecture. AMD could create an RX 7×50 lineup for the coming year that retains all the benefits of the current RDNA 3 but with an improved RAU, and see frame rates in games increase by double-digit percentages.

How do games perform with ray tracing on RDNA 3?

Finally, we come to the icing on the cake: performance in games. Since AMD has publicly claimed a 50% improvement, we should expect an equally large jump. However, we later found out that the figure referred to performance per watt, at a particular power level and in a single game that was not specified. The important thing, then, is to know how much ray tracing has actually improved over the previous generation, especially since AMD is starting from the rather poor ray tracing performance of the RX 6000.