AMD and NVIDIA GPUs: is there a future for ray tracing without RT Cores?

RT Cores, Ray Accelerator Units or Intersection Units are specialized GPU units in charge of a single task, and they first appeared with the first NVIDIA RTX cards.

In this article we will not explain in depth what they are used for. For that, we recommend the HardZone article titled "What are and how do RT Cores for Ray Tracing work?", in which we explain in a simple but detailed way how this type of unit works.

What are RT Cores or Intersection Units?

The RT Cores in NVIDIA GPUs, or the Ray Accelerator Units in AMD GPUs, are the units responsible for calculating the intersection between the rays and the different elements of the scene. To understand why the new graphics cards need this type of unit in their hardware, we first need to understand how the simplest version of the ray tracing algorithm works:

For each pixel, and for each object in the scene: if the ray cast through that pixel intersects the object, change the color value of that pixel on the screen.
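The loop described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any GPU or API implements it: the `Sphere` class, the `ray_hits_sphere` and `render` functions, and the choice of spheres with one parallel ray per pixel are all assumptions made for this example.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Sphere:
    center: tuple  # (x, y, z)
    radius: float
    color: str     # a single character stands in for a color value


def ray_hits_sphere(origin, direction, sphere):
    """Return True if the ray origin + t * direction (t >= 0) touches the sphere."""
    oc = [origin[i] - sphere.center[i] for i in range(3)]
    a = sum(d * d for d in direction)
    b = 2.0 * sum(oc[i] * direction[i] for i in range(3))
    c = sum(v * v for v in oc) - sphere.radius ** 2
    disc = b * b - 4.0 * a * c
    if disc < 0.0:
        return False  # the ray's line never reaches the sphere
    t_far = (-b + sqrt(disc)) / (2.0 * a)
    return t_far >= 0.0  # at least part of the sphere lies in front of the origin


def render(width, height, scene, background="."):
    """Naive ray tracing: every pixel is tested against every object."""
    image = [[background] * width for _ in range(height)]
    tests = 0
    for y in range(height):
        for x in range(width):
            # Assumption: one ray per pixel, all rays parallel to the z axis.
            origin = (x + 0.5, y + 0.5, 0.0)
            direction = (0.0, 0.0, 1.0)
            for obj in scene:
                tests += 1
                if ray_hits_sphere(origin, direction, obj):
                    # No depth sorting: the last object hit wins.
                    image[y][x] = obj.color
    return image, tests
```

Calling `render(8, 8, [Sphere((4.0, 4.0, 5.0), 2.0, "#")])` runs 64 intersection tests for a single object. The number of tests grows as pixels times objects, and that inner test is the work the intersection units take over.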

This is repeated continuously in every frame that the GPU renders using the ray tracing algorithm or one of its variations, whether for the whole frame or only for part of it, to solve the indirect lighting problems that rasterization cannot resolve by itself.

The Möller-Trumbore algorithm for the intersection between rays and triangles

Ray intersection units are fixed function units that perform the Möller-Trumbore algorithm. Keep in mind that a fixed function unit always applies the same program to its input data. That program is hardwired: the transistors that make up the unit are arranged in such a way that they can run only that one program and no other.
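For reference, this is roughly what the Möller-Trumbore ray-triangle test looks like when written as an ordinary program. It is a plain Python sketch of the published algorithm, not the circuitry of any particular GPU; the helper names (`sub`, `dot`, `cross`), the `EPSILON` tolerance and the choice to return the hit distance or `None` are assumptions made for this example.

```python
EPSILON = 1e-9  # assumed tolerance for the parallel and behind-the-origin checks


def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def moller_trumbore(origin, direction, v0, v1, v2):
    """Return the distance t along the ray to the triangle, or None on a miss."""
    edge1 = sub(v1, v0)
    edge2 = sub(v2, v0)
    pvec = cross(direction, edge2)
    det = dot(edge1, pvec)
    if abs(det) < EPSILON:
        return None  # ray is parallel to the triangle plane
    inv_det = 1.0 / det  # the single division in the algorithm
    tvec = sub(origin, v0)
    u = dot(tvec, pvec) * inv_det  # first barycentric coordinate
    if u < 0.0 or u > 1.0:
        return None
    qvec = cross(tvec, edge1)
    v = dot(direction, qvec) * inv_det  # second barycentric coordinate
    if v < 0.0 or u + v > 1.0:
        return None
    t = dot(edge2, qvec) * inv_det
    if t < EPSILON:
        return None  # triangle is behind the ray origin
    return t
```

For example, a ray starting at `(0.25, 0.25, 0.0)` and pointing along `(0.0, 0.0, 1.0)` hits the triangle `(0, 0, 5)`, `(1, 0, 5)`, `(0, 1, 5)` at a distance of 5. Note that the same fixed sequence of cross products, dot products and one division runs for every test, which is what makes the algorithm a good candidate for hardwiring.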

The advantage of fixed function units is that they require fewer transistors than programmable units, which are much more complex. However, in hardware where programmable units dominate, a fixed function unit only makes sense if it performs its task at a speed that the programmable part cannot match.

Obviously, like any algorithm, it can also run on the shader units, but for that to be a real alternative those units would have to be fast enough to make the fixed function units unnecessary.

The cost of the Möller-Trumbore algorithm

There are other intersection algorithms, but this one is the best known and most widely used, which is why we chose it as an example. Its cost is not exactly cheap: a total of 27 floating point operations per pixel. On top of that, in some architectures division is more complex to implement in shaders, so it is performed not by the conventional SIMD units but by the SFUs, which can carry out much more complex arithmetic operations, but at a lower rate than additions and multiplications.

In other words, we need 27 floating point operations not per pixel, but per pixel and per intersection test. Now think about the number of intersections and pixels in a scene and you will have a rough idea of why the intersection units or RT Cores are needed.
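To put that "per pixel and per intersection" cost into numbers, here is a small back-of-the-envelope calculation. The 27 operations per test is the figure quoted in this article; the resolution, the number of tests per ray and the frame rate are hypothetical values chosen only for illustration, and the function names are made up for this sketch.

```python
OPS_PER_TEST = 27  # floating point operations per intersection test, as quoted in the article


def intersection_ops_per_frame(width, height, tests_per_ray, rays_per_pixel=1):
    """Floating point operations spent on intersection tests in one frame."""
    return width * height * rays_per_pixel * tests_per_ray * OPS_PER_TEST


def required_tflops(ops_per_frame, fps):
    """Sustained throughput, in TFLOPS, needed to keep up with a given frame rate."""
    return ops_per_frame * fps / 1e12


# Hypothetical workload: 1920x1080, one ray per pixel, 100 tests per ray, 60 fps.
ops = intersection_ops_per_frame(1920, 1080, tests_per_ray=100)
print(ops)                       # operations per frame
print(required_tflops(ops, 60))  # TFLOPS spent on intersection tests alone
```

Because the cost scales linearly with every factor, raising the number of rays per pixel or the number of tests per ray multiplies the result directly, and all of that arithmetic would otherwise be taken from the shader units.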

The type of shader program that replaces RT Cores

The API specifications for real-time ray tracing, both DXR in DX12 Ultimate and the Ray Tracing Extensions for Vulkan, define a type of shader called the Intersection Shader, which can take over the work of the intersection units in hardware where they are not present.

Keep in mind that a shader is nothing more than a program, and having programmers write their own intersection routine game by game can be a hassle, which is why both APIs include examples of intersection shaders. The trade-off? Many developers may find the intersection algorithm included with the APIs, just like the one in the fixed function units, unsuitable for their needs.
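The idea of an intersection routine being "just a program" that can be swapped out can be sketched as follows. This is a Python analogy, not DXR or Vulkan shader code: `intersect_aabb` (a standard ray-box slab test) and `closest_hit` are hypothetical names for this example, and the box primitive was chosen only to show that the test does not have to be a triangle test.

```python
def intersect_aabb(origin, direction, box_min, box_max):
    """Slab test: return the entry distance of the ray into the box, or None."""
    t_near, t_far = 0.0, float("inf")
    for axis in range(3):
        o, d = origin[axis], direction[axis]
        if d == 0.0:
            # Ray is parallel to this pair of planes: it must start between them.
            if o < box_min[axis] or o > box_max[axis]:
                return None
            continue
        t0 = (box_min[axis] - o) / d
        t1 = (box_max[axis] - o) / d
        if t0 > t1:
            t0, t1 = t1, t0
        t_near = max(t_near, t0)
        t_far = min(t_far, t1)
        if t_near > t_far:
            return None  # the slabs do not overlap along the ray
    return t_near


def closest_hit(origin, direction, primitives, intersect):
    """Run a caller-supplied intersection routine over all primitives.

    `intersect` plays the role of the replaceable intersection program:
    it receives the ray plus one primitive and returns a distance or None.
    """
    best = None
    for index, prim in enumerate(primitives):
        t = intersect(origin, direction, *prim)
        if t is not None and (best is None or t < best[0]):
            best = (t, index)
    return best
```

Passing a different function as `intersect` changes the intersection algorithm without touching the traversal loop, which is the flexibility a programmable stage offers and a hardwired unit does not.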

In hardware design it is not usual to eliminate fixed function units that work as accelerators. The usual path is to expand their capabilities and even to make them programmable, so the next step in the evolution of intersection units, if it has not already been taken, is for them to include a dedicated area holding microprogrammed code that can be updated.

It is therefore possible that we will see new intersection algorithms with better performance, which would eventually be written to the internal memory of each unit through a firmware update.

Fixed Function Units have never been removed from a GPU

A GPU has a series of fixed function units for rendering 3D graphics. Like the intersection units, they are responsible for tasks that repeat over and over in every frame. We are referring to units such as the texture units, the units responsible for rasterizing geometry, and so on.

These units have never been eliminated, even though their tasks could be performed by the shader units. Moreover, if we took a GPU without these fixed function units and had it render a 3D scene, it would be orders of magnitude less efficient than a GPU with fewer shader units but with those units included.

The trend is always the same: when a task appears that repeats in every frame and would take up a good part of the time and resources of the units that run the shaders, a specialized unit ends up being created that not only offloads that task from those units but also performs it faster and at a fraction of the cost.