What can we expect from the future NVIDIA RTX 50?

The RTX 50 is still nearly two years away, so talking about performance figures makes no sense; even NVIDIA itself doesn’t know them at this point. However, by analyzing the architecture of the RTX 40, the H100 features that did not make it into that generation, and some upcoming technologies, we can get an idea of what the new cards may look like from a logical and realistic point of view.

Ampere Next Next

Long before the Lovelace name was floated as the code name for the RTX 40, NVIDIA released a somewhat curious roadmap in which the RTX 30 appeared as “Ampere”, the RTX 40 as “Ampere Next” and the RTX 50 as “Ampere Next Next”. Joking aside, it’s not that they had run out of scientists to name new architectures after: the naming hinted at continuity, and the release of the RTX 40 confirmed it.

What defines every GPU is its cores. At NVIDIA these are called SMs, and each one is in turn made up of sub-cores. There is more difference between the RTX 20 and the RTX 30 than between the RTX 30 and the RTX 40, where the only things that were done were to make a handful of changes to the Tensor Cores and add a new RT Core with more capabilities. The rest? It stayed the same, and it should stay that way in the RTX 50 compared with the RTX 40.

This, along with the move from Samsung’s 8 nm node to TSMC’s 4 nm node, is what allows for an AD102 chip with 144 SM cores, of which the RTX 4090 has 128 active. Could the number of cores have been higher? Yes, but we have to bear in mind that the RTX 4090 uses the same amount of memory as the RTX 3090 Ti and therefore has less memory available per core.
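To put the “less per core” remark into numbers, here is a minimal Python sketch. The SM counts, the 24 GB of memory and the 1008 GB/s of bandwidth are the publicly listed specifications of both cards, not figures from this article, and the `Gpu` class and its method names are purely illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gpu:
    """Illustrative container; field values are public specs (assumption)."""
    name: str
    active_sms: int       # SMs enabled on the retail card
    vram_gb: int          # video memory in GB
    bandwidth_gbs: float  # peak memory bandwidth in GB/s

    def vram_mb_per_sm(self) -> float:
        # Video memory available to each SM, in MB.
        return self.vram_gb * 1024 / self.active_sms

    def bandwidth_gbs_per_sm(self) -> float:
        # Peak memory bandwidth available to each SM, in GB/s.
        return self.bandwidth_gbs / self.active_sms


RTX_3090_TI = Gpu("RTX 3090 Ti", active_sms=84, vram_gb=24, bandwidth_gbs=1008.0)
RTX_4090 = Gpu("RTX 4090", active_sms=128, vram_gb=24, bandwidth_gbs=1008.0)

for gpu in (RTX_3090_TI, RTX_4090):
    print(
        f"{gpu.name}: {gpu.vram_mb_per_sm():.1f} MB/SM, "
        f"{gpu.bandwidth_gbs_per_sm():.2f} GB/s per SM"
    )
```

With the same memory subsystem feeding roughly 50% more SMs, each SM of the RTX 4090 ends up with about a third less capacity and bandwidth than on the RTX 3090 Ti.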

What changes do we expect to see in the SM of the RTX 50?

Apart from the items in the following list, everything else will stay as it is.

Tensor cores
RT core
The scheduler or “Warp Scheduler”, which has not been updated since the RTX 30. It will change in order to adopt some of the features of CUDA 9.
- This implies changes to the first-level caches, both data and instructions, as well as to the local memory of each SM.

Rumors speak of an overhaul of the SM after a long time. We don’t believe so; we think they are mistaking it for the implementation of concepts seen in the H100 chip that were not implemented in the AD10x chips of the various RTX 40 cards.

Will the RTX 50 have as much cache as the current ones?

No, we haven’t changed the subject, nor has a piece of another article slipped in here. To understand what the NVIDIA RTX 50 may look like, we need to understand the large increase in cache in the current generation, what caused it, and whether there are signs that it will carry over to the next generation.

The first reason is that the memory bus was kept narrower than what is really needed so as not to end up with a bigger chip: the cost per wafer is higher, so fewer mm² are affordable than in the previous generation, and unfortunately memory interfaces do not shrink along with manufacturing nodes.
The second reason is that cutting the bus leaves the GPU short of bandwidth. The best remedy? Increasing the L2 cache so that the cores need to access video RAM less often, although the cache itself is limited both by chip area and by latency.
The third reason is that adding more cores increases the demands on memory. GDDR6X is available at 23 and 24 Gbps, but if NVIDIA hasn’t used it yet, it’s because they don’t dare to release a graphics card with a 600 W TDP at the moment. There must be a reason for that, perhaps that they don’t fully trust their cooling solutions with such high power consumption.
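The trade-off between bus width and L2 cache can be sketched with a simple first-order model: only the requests that miss in the L2 reach video RAM. The functions below are our own illustration, and the demand figures are hypothetical round numbers, not measurements. The model ignores latency, write-back traffic and compression.

```python
def dram_traffic_gbs(demand_gbs: float, l2_hit_rate: float) -> float:
    """Traffic that reaches video RAM when a fraction of requests hit in L2."""
    if not 0.0 <= l2_hit_rate <= 1.0:
        raise ValueError("l2_hit_rate must be between 0 and 1")
    return demand_gbs * (1.0 - l2_hit_rate)


def min_hit_rate(demand_gbs: float, dram_bandwidth_gbs: float) -> float:
    """Smallest L2 hit rate at which video RAM can absorb the misses."""
    if demand_gbs <= dram_bandwidth_gbs:
        return 0.0  # the bus alone is enough, no cache hits required
    return 1.0 - dram_bandwidth_gbs / demand_gbs


# Hypothetical example: the cores ask for 2016 GB/s but the bus delivers 1008 GB/s.
needed = min_hit_rate(demand_gbs=2016.0, dram_bandwidth_gbs=1008.0)
print(f"L2 must serve at least {needed:.0%} of the requests")
```

Under this model, every extra SM raises the demand and therefore the hit rate the L2 has to sustain, which is why a narrower bus goes hand in hand with a larger cache.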

So, can we know what the RTX 50 will look like?

At this point, only NVIDIA knows, but we can make a number of predictions about the changes we’ll see. Chief among them will be the use of GDDR7 memory, and believe us: even an RTX 4090 relaunched with an interface for that memory would show a huge performance boost in the various areas that are currently bandwidth limited.
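As a rough illustration of what faster memory would mean, peak bandwidth is simply the bus width in bytes multiplied by the per-pin data rate. The sketch below applies that to a 384-bit bus like the one on the RTX 4090. The 21 Gbps figure is the card’s public specification, and the 32 Gbps figure for GDDR7 is an assumption based on early announcements, not something confirmed for any RTX 50.

```python
def peak_bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak memory bandwidth in GB/s: (bus width in bytes) x (Gbps per pin)."""
    return bus_bits / 8 * gbps_per_pin


def uplift(bus_bits: int, old_gbps: float, new_gbps: float) -> float:
    """Bandwidth ratio when only the per-pin data rate changes."""
    return peak_bandwidth_gbs(bus_bits, new_gbps) / peak_bandwidth_gbs(bus_bits, old_gbps)


BUS_BITS = 384  # same bus width as the RTX 4090

memory_options = {
    "GDDR6X 21 Gbps (RTX 4090)": 21.0,
    "GDDR6X 23 Gbps": 23.0,
    "GDDR6X 24 Gbps": 24.0,
    "GDDR7 32 Gbps (assumed)": 32.0,  # hypothetical data rate
}

for label, rate in memory_options.items():
    print(f"{label}: {peak_bandwidth_gbs(BUS_BITS, rate):.0f} GB/s")

print(f"GDDR7 vs current: x{uplift(BUS_BITS, 21.0, 32.0):.2f}")
```

Under that assumed data rate, the same 384-bit bus would deliver roughly one and a half times the current bandwidth without widening the interface.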

Although the 96 MB of L2 cache in the maximum configuration of the AD102 chip may seem impressive, in the end it is quite anemic for the enormous amount of data that a GPU of the caliber of the RTX 4090 handles. That’s why it’s more than likely that a high-end RTX 50 will not have the same amount of L2 cache on the chip but much less, in order to fit more SM cores. What interests us more, however, is internal communication, which we know will change.

CUDA9

One of the biggest differences between the H100 and the AD102 is that the latter is a CUDA 8.9 GPU (compute capability 8.9, versus 9.0 for the H100), as it lacks a series of changes that NVIDIA has added to its GPUs for high-performance computing. And no, we are not talking about the SM cores themselves, which will always differ between the two markets, but about how they communicate with each other.

The Tensor Memory Accelerator or TMA, a unit that moves blocks of data asynchronously between video memory and the local memory of an SM, and allows direct transfers between the local memories of different SMs. This local memory should not be confused with the caches, nor with a global shared memory, which also sits inside the chip and is not a cache either.
Thread Block Clusters, which consist of several SMs sharing their local memories, carved out of the same block as their L1 caches, in order to work as a group.

These changes, although minor, will show up in the SM of the RTX 50 no matter what, since these graphics cards will have to run that version of CUDA.

And what about the latest rumours?

There’s a lot of talk that we’ll see a 512-bit bus on the RTX 50, which we doubt, as it would make the chip even bigger and more expensive than it already is. The justification? Disaggregating the GPU, as AMD did. We think that only makes sense if the design would exceed the size limit of a single chip, a limit that NVIDIA has tried not to exceed for years. That is no longer the problem, though; the problem is the cost of manufacturing such a large chip.

Today, even NVIDIA does not dare to build chips that large as it did before. What is the most logical scenario? Moving to a dual-chip configuration in which both chips have access to the same memory. This would lead to the addition of an L3 cache and a separate memory controller, all in a configuration very similar to that of RDNA 3. It would not be a copy, however, since the company itself had already discussed it in its CUP document, and the packaging technology is something NVIDIA has access to as a TSMC client. In any case, the various elements would have to sit on top of an interposer to reduce the energy cost of data transfers.

Conclusion for now

In short, for the moment things stand as follows:

The RTX 50 will add the elements of the H100 that are useful for gaming and were not added in the current generation.
NVIDIA will seek to make it more efficient in order to reach higher clock speeds.
The biggest boost will come from GDDR7, which will bring higher capacity and bandwidth, two areas where the RTX 40 falls short.

Other than that, we’re not going to see a big change. Moreover, it’s possible that the final number of cores will increase only slightly or not at all, and that the extra performance will come from the things we have described. The reason is that even if AMD eventually decides to release a graphics card with a higher number of compute units, it’s very unlikely to exceed the number of cores the RTX 4090 has, so NVIDIA has no need to raise it.