How does DLSS work?
In its first two versions, DLSS can add up to 3 milliseconds to the time it takes to create each frame. For DLSS 3 we don't know the figure, but we assume it is lower thanks to the greater power of the RTX 40. In any case, DLSS needs the information from the newly rendered frame, so if we want the game to run at a specific frame rate, we must take the frame time (for example, 16.67 ms for 60 FPS) and subtract from it the time the graphics card needs to apply DLSS.
Suppose we have a scene that we want to render in 4K. We have an unspecified GeForce RTX that reaches 25 frames per second at that resolution, so it renders each frame in 40 ms. We know that the same GPU reaches 50 frames per second, or 20 ms per frame, at 1080p. Our hypothetical GeForce RTX takes about 2.5 ms to go from 1080p to 4K, so if we enable DLSS to get a 4K frame from a 1080p frame, each frame will take 22.5 ms. The scene therefore renders at about 44 frames per second, well above the 25 frames we would get by rendering at native resolution.
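The arithmetic above can be sketched in a few lines of Python. The function names are hypothetical and the 2.5 ms upscaling cost is the illustrative figure from the example, not a measured value for any particular card.

```python
def frame_time_ms(fps: float) -> float:
    """Time available per frame, in milliseconds, at a given frame rate."""
    return 1000.0 / fps


def fps_with_upscaling(render_ms: float, upscale_ms: float) -> float:
    """Frame rate when every frame pays the render cost plus the upscaling cost."""
    return 1000.0 / (render_ms + upscale_ms)


def render_budget_ms(target_fps: float, upscale_ms: float) -> float:
    """Time left for rendering once the upscaling cost is subtracted from the frame time."""
    return frame_time_ms(target_fps) - upscale_ms


native_4k_ms = frame_time_ms(25)       # 40 ms per frame at native 4K
native_1080p_ms = frame_time_ms(50)    # 20 ms per frame at 1080p
upscale_ms = 2.5                       # assumed cost of going from 1080p to 4K

dlss_fps = fps_with_upscaling(native_1080p_ms, upscale_ms)
budget_60 = render_budget_ms(60, upscale_ms)

print(f"Native 4K: {native_4k_ms:.1f} ms per frame")
print(f"1080p + upscaling: {dlss_fps:.1f} FPS")
print(f"Render budget for 60 FPS: {budget_60:.2f} ms")
```

The last value shows the other side of the calculation: to hold 60 FPS with a 2.5 ms upscaling cost, the GPU must finish the low-resolution frame in roughly 14.17 ms.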
What happens if the graphics card does not have enough power?
Each NVIDIA RTX model applies each DLSS variant at its own speed, which depends on the source and destination resolutions. The chart below is taken from NVIDIA's own documentation and covers the case where the output has four times as many pixels as the input, which corresponds to the so-called Performance mode.
As the table shows, performance varies not only with the resolution but also with the GPU we are using, which shouldn't surprise anyone after what we explained earlier. The fact that an RTX 3090 in Performance mode manages to go from 1080p to 4K in under 1 ms is impressive to say the least, and it means that DLSS performs better the more powerful the graphics card is.
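As a quick check of the 4x figure, the following sketch compares the pixel counts of the two resolutions mentioned. The helper name is hypothetical.

```python
def pixel_ratio(src: tuple[int, int], dst: tuple[int, int]) -> float:
    """Ratio between the pixel count of the output and that of the input."""
    src_w, src_h = src
    dst_w, dst_h = dst
    return (dst_w * dst_h) / (src_w * src_h)


# 1080p source upscaled to a 4K output, as in Performance mode
ratio = pixel_ratio((1920, 1080), (3840, 2160))
print(ratio)
```

Each axis is doubled, so the output has four times as many pixels as the frame that was actually rendered.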
DLSS is not a reconstruction, but a prediction
It's important to keep this in mind. DLSS is an algorithm that requires training, and therefore trial and error, because although it generates frames very quickly, it does not do so quickly enough to avoid every mistake. And although the algorithm can be trained to avoid producing bad pixels in its prediction, a lack of information is fatal: the lower the resolution of the source image, the lower the quality of the final image.
The other point concerns the geometry of the scene. All GPUs are designed so that the smallest object they handle measures 2×2 pixels. The consequence for DLSS? Any object smaller than this is discarded, and we have to remember that objects get smaller and smaller with distance. This means, for example, that a 4K image rendered natively rather than with DLSS will contain additional detail.
DLSS 1.0 vs DLSS 2.0, how are they different?
Each version of DLSS builds on the previous one, so DLSS 3 is an evolution of DLSS 2, and DLSS 2 of the original version. The first DLSS was not temporal, in the sense that it did not use motion vectors to reconstruct the image. It used information from a single frame, hence its much worse image quality. In fact, DLSS only became a differentiator against the competition with its second version.
What is optical flow and why is it important for DLSS 2 and 3?
To understand how DLSS 3 works, we first need to know what NVIDIA means by Optical Flow, something that has been in use since the RTX 20. It is not a piece of hardware but a set of software libraries, which NVIDIA defines as follows:
The NVIDIA Optical Flow SDK exposes the latest hardware capabilities of the Turing, Ampere, and Ada architectures dedicated to calculating pixel motion between frames. The hardware uses sophisticated algorithms to create high-quality vectors, which are variations from frame to frame and allow you to track the movement of objects.
This is the basis of frame interpolation, and NVIDIA uses it in a variety of applications. It consists of assigning an identifier to each object in the image, or to each pixel, depending on the level of precision, creating what is called an ID buffer. This makes it possible to know where each object is in each frame and to predict its movement.
It outputs the data as a series of matrices, or tensors, a data format optimized for processing on the Tensor Cores of NVIDIA GPUs. Each value corresponds to a pixel of the image, and each matrix to a frame or to one of its color components. This type of data structure is used in AI and, by its nature, requires specialized units to process.
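To make the idea of a motion vector concrete, here is a minimal sketch of block matching, a classic software technique for estimating how a small block of pixels moves between two frames. It is only an illustration: it is not the algorithm used by NVIDIA's hardware or by the Optical Flow SDK, and all names, frame sizes, and pixel values are hypothetical.

```python
def make_frame(width, height, x0, y0, size=2, value=9):
    """Build a black frame with one bright square whose top-left corner is (x0, y0)."""
    frame = [[0] * width for _ in range(height)]
    for y in range(y0, y0 + size):
        for x in range(x0, x0 + size):
            frame[y][x] = value
    return frame


def block_cost(prev, curr, bx, by, dx, dy, size):
    """Sum of absolute differences between a block in prev and the shifted block in curr."""
    total = 0
    for y in range(size):
        for x in range(size):
            total += abs(prev[by + y][bx + x] - curr[by + y + dy][bx + x + dx])
    return total


def estimate_motion(prev, curr, bx, by, size=2, radius=2):
    """Return the (dx, dy) shift within the search radius that best matches the block."""
    height, width = len(curr), len(curr[0])
    best, best_cost = (0, 0), None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            # Skip shifts that would push the block outside the frame
            if bx + dx < 0 or by + dy < 0:
                continue
            if bx + dx + size > width or by + dy + size > height:
                continue
            cost = block_cost(prev, curr, bx, by, dx, dy, size)
            if best_cost is None or cost < best_cost:
                best, best_cost = (dx, dy), cost
    return best


prev_frame = make_frame(6, 6, x0=1, y0=1)
curr_frame = make_frame(6, 6, x0=3, y0=2)  # the square moved 2 right and 1 down
motion = estimate_motion(prev_frame, curr_frame, bx=1, by=1)
print(motion)
```

Repeating this search for every block of the image yields one vector per block, which is the kind of per-frame data the text describes as being stored in matrices.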
What are its applications?
The first and clearest application is the creation of interpolated frames: intermediate frames placed between two existing frames.
The best known, commercially speaking, is the creation of motion vectors for algorithms such as temporal anti-aliasing, DLSS 2 and 3, and FSR 2.0, which use information from previous frames to reconstruct the image more accurately.
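The way temporal algorithms reuse previous frames can be sketched as follows. This is a generic, simplified accumulation step on a single row of pixels, assuming integer motion vectors and a fixed blend weight; it is not the actual DLSS or FSR implementation, and every name in it is hypothetical.

```python
def temporal_accumulate(current, history, motion, alpha=0.5):
    """Blend each current pixel with the history pixel it came from.

    motion[x] is how far the content now at pixel x has moved since the
    previous frame, so its history sample sits at x - motion[x].
    """
    output = []
    for x, value in enumerate(current):
        source = x - motion[x]
        if 0 <= source < len(history):
            output.append(alpha * value + (1 - alpha) * history[source])
        else:
            # No usable history (content came from off screen): keep the current value
            output.append(value)
    return output


history = [0.0, 1.0, 0.0, 0.0]   # accumulated result of earlier frames
current = [0.0, 0.0, 0.8, 0.0]   # new frame: the bright pixel moved one step right
motion = [1, 1, 1, 1]            # every pixel moved one step to the right

result = temporal_accumulate(current, history, motion)
print(result)
```

Without the motion vectors, the bright pixel in the history would be blended into the wrong position and leave a trail, which is why accurate vectors matter so much for these techniques.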
The Optical Flow Accelerator
Frame interpolation and motion vector creation can be done with algorithms running on the GPU itself. However, after years of experience in autonomous driving, and especially in computer vision, NVIDIA knew how to apply that knowledge to games.
By computer vision we mean the ability to identify and delineate objects in an image; in other words, it is not about generating objects. The Optical Flow Accelerator is a piece of hardware inside the RTX 40 that automatically observes the objects on screen, identifies them, and calculates their motion vectors and trajectories from multiple previous frames.
This means that in DLSS 3, which uses it, the time-related part of the code has been removed entirely. It is also a way for NVIDIA to prevent competing algorithms, such as AMD's FSR 2.0, from continuing to benefit from the DLSS 2.0 libraries. The downside is that games that use it can only do so on NVIDIA RTX 40 cards with the Ada architecture.
Visibility buffer
The visibility buffer is one of the key elements of the latest versions of Unity and Unreal Engine, the two most widely used game engines, and it is what connects DLSS 3 to ray tracing, two elements that NVIDIA itself has tied together. The Optical Flow Accelerator generates it automatically, without the involvement of any external element.
Frame interpolation in games
In a movie, all the frames already exist and are recorded, so frame interpolation is easy. In a video game, each frame is unique, and a lot of computing power is needed to complete the whole identification process fast enough to be useful in real time without affecting gameplay.
However, there is usually a lag between what the GPU generates in the graphics card's VRAM and what is sent to the screen. The result is that the GPU has often finished the second frame before the first has been sent to the screen. This happens frequently in games that run at high frame rates, that is, games that must resolve an image in a few milliseconds.
If images follow one another very quickly, our brain does not notice the details, so a series of intermediate ghost images can be generated. NVIDIA has taken advantage of this in DLSS 3 to generate new frames. Note that the speed of the game engine, that is, the rate at which the CPU generates the display list for each frame, does not match the rate at which frames are displayed, since many of them have been generated automatically by interpolation.
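The mismatch between engine rate and displayed rate can be illustrated with a small sketch. It assumes one generated frame is inserted between each pair of rendered frames, and it reduces each frame to the position of a single object; both are simplifications made for this example, and the function names are hypothetical.

```python
def interleave_generated(rendered, make_intermediate):
    """Insert one generated frame between each pair of consecutive rendered frames."""
    if not rendered:
        return []
    displayed = []
    for previous, following in zip(rendered, rendered[1:]):
        displayed.append(previous)
        displayed.append(make_intermediate(previous, following))
    displayed.append(rendered[-1])
    return displayed


def midpoint(previous, following):
    """Place the object halfway between its positions in the two rendered frames."""
    return (previous + following) / 2


# The engine produced three frames; each value is the object's horizontal position
rendered_positions = [0.0, 4.0, 8.0]
displayed_positions = interleave_generated(rendered_positions, midpoint)

print(displayed_positions)
print(len(rendered_positions), "rendered,", len(displayed_positions), "displayed")
```

The CPU and the game engine only ever produced three frames here, while the screen receives five, which is why the displayed frame rate can rise without the engine running any faster.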
Which games support DLSS 3?
There are currently 35 titles, although two of them are not games but graphics engines, such as Unity and Unreal Engine. For the moment, the list of compatible games is as follows:
- A Plague Tale: Requiem
- Atomic Heart
- Black Myth: Wukong
- Bright Memory: Infinite
- Chernobylite
- Conqueror’s Blade
- Cyberpunk 2077
- Dakar Desert Rally
- Deliver Us Mars
- Destroy All Humans! 2 – Reprobed
- Dying Light 2 Stay Human
- F1 22
- F.I.S.T.: Forged In Shadow Torch
- Frostbite Engine
- Hitman 3
- Hogwarts Legacy
- Icarus
- Jurassic World Evolution 2
- Justice
- Loopmancer
- Marauders
- Microsoft Flight Simulator
- Midnight Ghost Hunt
- Mount & Blade II: Bannerlord
- Naraka: Bladepoint
- NVIDIA Omniverse
- NVIDIA Racer RTX
- Perish
- Portal with RTX
- Ripout
- STALKER 2: Heart of Chernobyl
- Scathe
- Sword and Fairy 7
- SYNCED
- The Lord of the Rings: Gollum
- The Witcher 3: Wild Hunt
- Throne and Liberty
- Tower of Fantasy
- Unity
- Unreal Engine 4 and 5
- Warhammer 40,000: Darktide
We will add titles to the list of DLSS 3 compatible games as they are announced by their developers and by NVIDIA itself.