The symbiosis between DirectX and hardware designers is clear: Microsoft’s API lets software communicate with components such as the GPU through an abstract model of the hardware, and it lets developers bring the latest improvements into their games, whether for better performance or for better graphics and sound.
What do we mean by Shader Model?
Shaders are programs that run on the cores of a GPU and change the values of a graphics primitive at a given stage of the 3D pipeline, or of the data pipeline when the GPU is used for general-purpose computing. These programs are written in a high-level language which, in the case of Microsoft’s DirectX, is called HLSL.
As GPUs improve, new functions are added to HLSL to expose new GPU capabilities. Some of them correspond to upcoming designs from NVIDIA, Intel and AMD that have not yet been implemented in a commercial GPU but are on their way to market.
What has been added in Shader Model 6.6?
GPU cores, generically called shader units but branded SM by NVIDIA and Compute Unit by AMD, are in charge of running programs. These programs arrive in the form of kernels, where each kernel pairs an instruction to execute with its data. The data can be contained in the kernel itself, be referenced through a memory pointer, or depend on the result of a previous kernel.
The threads a GPU runs are grouped into waves, and the size of a wave determines how fully the registers of the shader unit are occupied. Why does that matter? Because occupancy determines how much of the shader unit is actually in use, and therefore its performance: low occupancy means part of the GPU’s computing power goes unused. Often a wave is too small to fill all the registers, and part of the performance is lost.
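The cost of an under-filled wave can be put in numbers. The following is a minimal sketch in Python, assuming a simplified model in which a SIMD unit has a fixed number of lanes and runs one wave at a time; the function name `lane_utilization` and the 32-lane width are illustrative assumptions, not part of DirectX or any vendor's hardware documentation.

```python
def lane_utilization(active_lanes: int, simd_width: int) -> float:
    """Return the fraction of SIMD lanes doing useful work for one wave.

    Toy model (assumption): one wave occupies the whole SIMD unit,
    so every lane it does not fill sits idle.
    """
    if simd_width <= 0:
        raise ValueError("simd_width must be positive")
    if not 0 <= active_lanes <= simd_width:
        raise ValueError("active_lanes must be between 0 and simd_width")
    return active_lanes / simd_width


if __name__ == "__main__":
    # A wave that fills only 24 of 32 hypothetical lanes leaves a quarter idle.
    print(lane_utilization(24, 32))  # 0.75
    # A full wave uses every lane.
    print(lane_utilization(32, 32))  # 1.0
```

Real occupancy also depends on register and shared-memory pressure, which this model deliberately ignores.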
So what changes with Shader Model 6.6? It will now be possible to create waves of variable size, which makes it possible to fill the gaps that would otherwise go unused and keep all the ALUs of each SIMD unit busy, allowing better use of the GPU.
Is it for current GPUs?
DirectX does not usually add features that no hardware on the market can use, so we can assume that at least one architecture from NVIDIA, Intel or AMD can take advantage of this novelty. That said, game code has to be optimized for it, so don’t expect it in any game coming out this year, since Microsoft has only just added the feature. It should, however, help in creating optimized profiles for existing games on GPUs that can run multiple waves per SIMD unit.
NVIDIA and AMD may have already made the necessary changes in advance in their RTX 30 and RDNA 2 architectures, much as DirectStorage can be used on the RTX 20. We may therefore be in for a surprise, and we cannot rule out Intel either, whose Xe-HPG should appear this year.
For a program to use waves of variable size, the scheduler or control unit of the shader unit has to change. At the moment, GPUs are believed to run only one wave per SIMD unit at a time, so if a wave does not occupy enough registers, some of the ALUs sit idle. With this change, if a GPU supports waves of 32 threads, for example, it could run one wave of 24 and another of 8 at the same time. This change doesn’t make a GPU any faster than it is, but it does let it run closer to 100% utilization, provided the code is written for it.
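The 24-plus-8 example can be sketched as a small packing problem. This is a toy model in Python, not a description of how any real GPU scheduler works: the functions `pack_waves` and `utilization` are hypothetical, and the first-fit-decreasing strategy is simply one easy way to illustrate the idea of letting several smaller waves share one SIMD unit.

```python
def pack_waves(wave_sizes, simd_width=32):
    """Group waves into SIMD slots using first-fit decreasing.

    Each slot stands for one SIMD unit issue (an assumption of this
    toy model); waves in the same slot run side by side.
    """
    slots = []
    for size in sorted(wave_sizes, reverse=True):
        if size <= 0 or size > simd_width:
            raise ValueError("wave size must be between 1 and simd_width")
        for slot in slots:
            # Reuse the first slot that still has enough free lanes.
            if sum(slot) + size <= simd_width:
                slot.append(size)
                break
        else:
            # No existing slot had room, so open a new one.
            slots.append([size])
    return slots


def utilization(slots, simd_width=32):
    """Fraction of lanes in use across all slots."""
    if not slots:
        return 0.0
    used = sum(sum(slot) for slot in slots)
    return used / (len(slots) * simd_width)


if __name__ == "__main__":
    # One wave per SIMD unit: 24 and 8 each get their own slot.
    print(utilization([[24], [8]]))           # 0.5
    # Shared SIMD unit: both waves fit in a single 32-lane slot.
    packed = pack_waves([24, 8])
    print(packed, utilization(packed))        # [[24, 8]] 1.0
```

Under these assumptions the same work needs one slot instead of two, which matches the point above: the hardware is no faster, it simply wastes fewer lanes.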