NVIDIA Lovelace vs AMD RDNA 3: what hasn't been said about their GPUs

The Boss

With the RTX 30 and RX 6000 sold out everywhere, it may seem surprising that we are already talking about the next generation of graphics cards. Officially, neither AMD nor NVIDIA has said anything about their new architectures. However, there is a series of rumors that, put in logical order and read with some knowledge of graphics hardware architectures, paints a very interesting picture of the future.

A battlefield with a common node

The two architectures will compete using TSMC's 5nm node, which in AMD's case will also be used to manufacture its Zen 4 APUs and CPUs. As for NVIDIA, it is a return to the Taiwanese foundry after the RTX 30 were manufactured at Samsung.

TSMC, aware of this, will sell its wafers to the highest bidder, which means higher prices and fab capacity that has to be reserved in advance. The result? AMD does not seem willing to keep being the cheap brand compared to NVIDIA. Being the cheap brand is part of the reason why AMD has not achieved a market share as large as NVIDIA's: not only because of lower performance, but also because the less money you make, the fewer wafers you can pay for, and the fewer wafers, the fewer graphics cards end up being produced.

With the RX 6000 we are already seeing AMD prices that are high compared to the past, and this is a trend that will continue: Lisa Su's company will keep raising prices until it matches NVIDIA's level. After all, the market has accepted the high prices of NVIDIA's latest generations of cards, and AMD is still a multinational that wants to make money.

What does NVIDIA have in store for Lovelace?

Ada Lovelace

Lovelace is still a huge unknown. The only data we have does not come from NVIDIA but from Kopite7Kimi, an insider who had the specs of the current RTX 30 more than a year in advance. According to him, the new NVIDIA graphics architecture would have 18,432 CUDA cores, that is, FP32 ALUs, a huge increase that almost doubles the number of CUDA cores in the most powerful GA102.

With the RTX 30 we saw the number of FP32 ALUs per shader unit, or SM, go from 64 to 128, so that figure translates into 144 SMs in total. That is even more than an NVIDIA A100 and, if true, would be the most impressive leap of any NVIDIA generation. Such a leap makes us somewhat skeptical of this information.
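The 144 SM figure follows directly from the rumored core count. A minimal sketch, assuming the Ampere ratio of 128 FP32 ALUs per SM carries over; the 10,752 cores of a full GA102 (84 SMs × 128) is the reference point, and the helper name is ours, for illustration only:

```python
# Rumored Lovelace figure (Kopite7Kimi) and a known Ampere reference point.
RUMORED_CUDA_CORES = 18_432
FULL_GA102_CUDA_CORES = 10_752   # full GA102 die: 84 SMs x 128 FP32 ALUs
FP32_ALUS_PER_SM = 128           # Ampere ratio, assumed to carry over


def sms_from_cuda_cores(cores: int, alus_per_sm: int = FP32_ALUS_PER_SM) -> int:
    """Number of SMs implied by a CUDA core count (hypothetical helper)."""
    if cores % alus_per_sm:
        raise ValueError("core count is not a whole number of SMs")
    return cores // alus_per_sm


rumored_sms = sms_from_cuda_cores(RUMORED_CUDA_CORES)
growth = RUMORED_CUDA_CORES / FULL_GA102_CUDA_CORES
print(f"{rumored_sms} SMs, {growth:.2f}x the CUDA cores of a full GA102")
```

The roughly 1.7x ratio is what "almost doubles" refers to; it counts ALUs only and says nothing about clocks or per-SM efficiency.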

It is quite possible that the 18,432 CUDA cores figure corresponds to NVIDIA Hopper and not Lovelace. After all, Hopper will be to Lovelace what Volta was to Turing. We believe this because of the rumors circulating about a new organization for Hopper, which will most likely end up being used in Lovelace as well.

A new organization for Hopper and Lovelace

New NVIDIA GPC

Another rumor speaks of a change in how NVIDIA will organize its GPUs in the next generation: the minimum unit will be the SM and the sub-cores will be gone, so the SM will have a single general scheduler instead of one in each sub-core. In this respect it will look much more like AMD's architecture, where the lowest-level cache is shared equally by all its units.

The next level is the TPC, which does not change and still groups 2 SMs; the interesting part is the appearance of the Cluster Processor Core (CPC). Each CPC will have 3 TPCs inside, which gives 6 SMs per CPC and 18 SMs per GPC. What is special about the CPCs? Apparently each one gets its own new L1 cache for data and instructions. How many CPCs per GPC? According to the rumors there are three, but the number of CPCs per GPC could be variable, just as the number of TPCs per GPC is today; we do not know this last detail yet.
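The counts in this paragraph can be checked with a small model. This is a sketch under our own assumptions: 2 SMs per TPC, 3 TPCs per CPC and 3 CPCs per GPC, the combination consistent with the 6 SMs per CPC and 18 SMs per GPC quoted. The class name is hypothetical and the CPC count is left as a parameter because it may vary:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GpcLayout:
    """Rumored GPC hierarchy; every default is an assumption from the leaks."""
    sms_per_tpc: int = 2     # TPC unchanged: two SMs
    tpcs_per_cpc: int = 3    # new Cluster Processor Core level
    cpcs_per_gpc: int = 3    # rumored; may vary between chips

    @property
    def sms_per_cpc(self) -> int:
        return self.sms_per_tpc * self.tpcs_per_cpc

    @property
    def sms_per_gpc(self) -> int:
        return self.sms_per_cpc * self.cpcs_per_gpc

    def total_sms(self, gpcs: int) -> int:
        return self.sms_per_gpc * gpcs


layout = GpcLayout()
print(layout.sms_per_cpc, layout.sms_per_gpc)    # 6 18
print(layout.total_sms(8), layout.total_sms(6))  # 144 108
```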

Lovelace may have been confused with Hopper

Possible Lovelace GL102 diagram

What makes us doubt that SM count under this supposed new organization is that it takes 8 GPCs to reach it, and the number of GPCs in NVIDIA GPUs has almost always been in line with memory bandwidth. Normally 6 GPCs go with a 384-bit bus, and a wider bus is generally not used in consumer GPUs.
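A minimal sketch of this reasoning, assuming (our assumption, not a rule NVIDIA has stated) that bus width scales linearly with the usual 6 GPC to 384-bit pairing, i.e. 64 bits per GPC, and taking the rumored 18 SMs per GPC:

```python
SMS_PER_GPC = 18              # rumored new GPC: 3 CPCs x 6 SMs
BUS_BITS_PER_GPC = 384 // 6   # 64 bits, from the usual 6 GPC <-> 384-bit pairing


def gpcs_needed(total_sms: int, sms_per_gpc: int = SMS_PER_GPC) -> int:
    """GPCs required to hold a given SM count, rounded up."""
    return -(-total_sms // sms_per_gpc)


def implied_bus_width(gpcs: int) -> int:
    """Bus width in bits if it scaled linearly with GPC count (assumption)."""
    return gpcs * BUS_BITS_PER_GPC


gpcs = gpcs_needed(144)
print(gpcs, implied_bus_width(gpcs))  # 8 512
```

Under that assumption, 144 SMs would point to a 512-bit bus, which is exactly the kind of bus consumer GPUs avoid.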

It is possible that Lovelace was mistaken for Hopper as far as its configuration goes, so let us look at a more plausible and realistic one. For the record, this is speculation on our part based on things we have heard. NVIDIA Lovelace, in its most powerful GPU, could have a configuration of 6 next-generation GPCs, which would allow it to reach 108 SMs. That figure, although lower than the rumored 144 SMs, is still a big jump compared to the 82 SMs of the current RTX 3090.
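To put both configurations against the RTX 3090, here is a short sketch that compares SM counts only; it deliberately ignores clock speeds and any per-SM changes, and the function name is ours:

```python
RTX_3090_SMS = 82  # SMs enabled on the current RTX 3090


def sm_increase_pct(new_sms: int, old_sms: int = RTX_3090_SMS) -> float:
    """Percentage increase in SM count over a reference GPU."""
    return (new_sms / old_sms - 1) * 100


for label, sms in (("6 GPC (our guess)", 108), ("8 GPC (rumor)", 144)):
    print(f"{label}: {sms} SMs, +{sm_increase_pct(sms):.1f}%")
```

The 6 GPC option works out to about +32% more SMs, while the rumored 144 SMs would be about +76%.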

The difference between RDNA 3 and Lovelace is that the AMD GPU will be able to reach 160 Shader Units with a dual-chip design; thinking that NVIDIA will reach 144 with a single chip is unrealistic, to say the least, especially on the same node. So we think the 144 Shader Unit configuration might be Hopper and not Lovelace, since Hopper is said to be a multi-chip GPU.

Infinity Cache not only in RDNA 3, but also in Lovelace

Infinity Cache power consumption

One of the points that makes us doubt the large number of SMs we are talking about is the possibility that NVIDIA will copy AMD's Infinity Cache idea, which serves to keep data evicted from the L2 from having to be fetched again from RAM. The reason is that energy consumption rises the farther the memory is from the processor, as we have already mentioned several times.
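The idea of a cache that catches what the L2 throws away can be sketched as a victim cache. This is a minimal model under our own assumptions: fully associative LRU caches measured in lines, and an exclusive L3 that only receives L2 evictions. It is not a description of how the Infinity Cache is actually implemented, and all names are hypothetical:

```python
from collections import OrderedDict
from typing import Iterable, Optional


class LRU:
    """Fully associative LRU cache that tracks line addresses only."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.lines: "OrderedDict[int, None]" = OrderedDict()

    def lookup(self, addr: int) -> bool:
        if addr in self.lines:
            self.lines.move_to_end(addr)   # mark as most recently used
            return True
        return False

    def insert(self, addr: int) -> Optional[int]:
        """Insert a line; return the evicted line address, if any."""
        self.lines[addr] = None
        self.lines.move_to_end(addr)
        if len(self.lines) > self.capacity:
            victim, _ = self.lines.popitem(last=False)
            return victim
        return None

    def remove(self, addr: int) -> None:
        del self.lines[addr]


def dram_accesses(trace: Iterable[int], l2_lines: int, l3_lines: int = 0) -> int:
    """Count accesses that miss every on-chip level and must go to DRAM."""
    l2 = LRU(l2_lines)
    l3 = LRU(l3_lines) if l3_lines else None
    misses = 0
    for addr in trace:
        if l2.lookup(addr):
            continue
        if l3 is not None and l3.lookup(addr):
            l3.remove(addr)        # line moves back up into the L2
        else:
            misses += 1            # not on chip: fetch from DRAM
        evicted = l2.insert(addr)
        if evicted is not None and l3 is not None:
            l3.insert(evicted)     # the L3 keeps what the L2 discards


    return misses


trace = list(range(6)) * 3   # a working set of 6 lines, reused 3 times
print(dram_accesses(trace, l2_lines=4))              # 18
print(dram_accesses(trace, l2_lines=4, l3_lines=4))  # 6
```

With a working set slightly larger than the L2, every access goes to DRAM; the victim L3 turns the repeat passes into on-chip hits, and only the first pass pays the DRAM cost.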

The big news from NVIDIA for Lovelace would therefore be the addition of an extra cache level, which would do the same job as the Infinity Cache and become a common point between RDNA 3 and Lovelace. The hint about adding a large L3 cache comes from a recent NVIDIA article, and it would make perfect sense, since it is a way of not requiring large external bandwidth. We already have the case of the AMD RX 6000, whose most powerful models use a 256-bit bus.
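Peak memory bandwidth is the bus width in bytes multiplied by the per-pin data rate. A short sketch with the published memory configurations of the RX 6900 XT (256-bit, 16 Gbps GDDR6) and the RTX 3090 (384-bit, 19.5 Gbps GDDR6X) shows how much raw bandwidth AMD gives up and relies on the Infinity Cache to make up for:

```python
def peak_bandwidth_gbs(bus_bits: int, gbps_per_pin: float) -> float:
    """Peak DRAM bandwidth in GB/s: bytes per transfer x per-pin data rate."""
    return bus_bits / 8 * gbps_per_pin


rx_6900_xt = peak_bandwidth_gbs(256, 16.0)   # GDDR6
rtx_3090 = peak_bandwidth_gbs(384, 19.5)     # GDDR6X
print(rx_6900_xt, rtx_3090)  # 512.0 936.0
```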

The addition of an L3 cache in Lovelace and its increase in RDNA 3 is what makes us think that the 144 SM configuration for Lovelace might be an exaggeration, caused by a misinterpretation of information leaked from NVIDIA, but, we repeat, we could be wrong. In any case, we must not forget that one of the strengths of the TSMC 5nm node

Why does NVIDIA want an L3 cache in Lovelace?

But why would NVIDIA do the same as AMD? Here lies one of the keys to performance: NVIDIA GPUs since Maxwell use what is called Tiled Caching, which consists of rasterizing the scene directly in the L2 cache. It is therefore very similar to Tile Rendering, but with two very important differences:

  • Tile rendering processes the tiles in dedicated on-chip memory controlled by the hardware, so the data is not evicted unpredictably before the tile is finished.
  • Tile rendering sorts the scene geometry and generates a new display list for each tile before rasterizing the triangles. Tile caching does not.
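The second difference is the binning step. Here is a minimal sketch of what a tile renderer does before rasterizing: each triangle is appended to the display list of every tile its screen-space bounding box overlaps. The 32-pixel tile size and the bounding-box test are our simplifications; real hardware uses its own tile sizes and tighter tests, and the names are hypothetical:

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]
Triangle = Tuple[Point, Point, Point]

TILE_SIZE = 32  # pixels per tile side (hypothetical value)


def bin_triangles(tris: List[Triangle], width: int, height: int,
                  tile: int = TILE_SIZE) -> Dict[Tuple[int, int], List[int]]:
    """Build a per-tile display list of triangle indices (bounding-box binning)."""
    tiles_x = (width + tile - 1) // tile
    tiles_y = (height + tile - 1) // tile
    display_lists: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for idx, tri in enumerate(tris):
        xs = [p[0] for p in tri]
        ys = [p[1] for p in tri]
        # Convert the bounding box to tile coordinates, clamped to the screen.
        x0 = max(0, int(min(xs)) // tile)
        x1 = min(tiles_x - 1, int(max(xs)) // tile)
        y0 = max(0, int(min(ys)) // tile)
        y1 = min(tiles_y - 1, int(max(ys)) // tile)
        for ty in range(y0, y1 + 1):
            for tx in range(x0, x1 + 1):
                display_lists[(tx, ty)].append(idx)
    return dict(display_lists)


tris = [((2, 2), (20, 4), (10, 25)),      # fits inside tile (0, 0)
        ((30, 10), (40, 12), (35, 20))]   # straddles tiles (0, 0) and (1, 0)
print(bin_triangles(tris, 64, 64))  # {(0, 0): [0, 1], (1, 0): [1]}
```

Once every tile has its list, the renderer can rasterize one tile at a time entirely on chip; tiled caching skips this sorting pass and relies on the L2 to hold the working set instead.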

In other words, Tiled Caching is a hybrid that works like a conventional GPU in the first half of the pipeline and like a Tile Renderer in the second half, but it is limited by relying on a cache: data is frequently evicted, which in many systems means going out to DRAM. The solution? Add a very large cache to act as a cushion. Everything continues to work on the L2 cache, but the L3 cache is there to make sure data can be retrieved faster and without power consumption going through the roof.

What will AMD copy from NVIDIA for RDNA 3?

AMD RDNA 3 GPU

As for RDNA 3, we will see it take several ideas from the current NVIDIA RTX 30:

  • The number of FP32 ALUs per shader unit will double, from 64 to 128, matching NVIDIA on this figure.
  • A new ray traversal unit, which can traverse the BVH tree without depending on the shader unit; this will be a huge improvement in ray tracing performance.
  • The matrix units of CDNA will be integrated into RDNA 3, enabling algorithms based on convolutional neural networks similar to NVIDIA's DLSS, like the one Microsoft is developing.
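The ray traversal bullet refers to the loop that walks a bounding volume hierarchy (BVH) to find which triangles a ray might hit. A minimal software sketch of that loop follows, assuming axis-aligned bounding boxes and a stack-based traversal. It only illustrates the work a dedicated traversal unit would take off the shader units, not how AMD would build it in hardware; the node layout and names are hypothetical:

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class Node:
    lo: Vec3                                        # AABB minimum corner
    hi: Vec3                                        # AABB maximum corner
    children: List["Node"] = field(default_factory=list)
    prims: List[int] = field(default_factory=list)  # primitive ids in a leaf


def ray_hits_box(origin: Vec3, direction: Vec3, lo: Vec3, hi: Vec3) -> bool:
    """Slab test: does the ray (t >= 0) intersect the axis-aligned box?"""
    t_near, t_far = 0.0, math.inf
    for o, d, a, b in zip(origin, direction, lo, hi):
        if d == 0.0:
            if o < a or o > b:        # parallel to this slab and outside it
                return False
            continue
        t1, t2 = (a - o) / d, (b - o) / d
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    return t_near <= t_far


def traverse(root: Node, origin: Vec3, direction: Vec3) -> List[int]:
    """Collect candidate primitives from every leaf box the ray touches."""
    candidates: List[int] = []
    stack = [root]
    while stack:
        node = stack.pop()
        if not ray_hits_box(origin, direction, node.lo, node.hi):
            continue                  # prune the whole subtree
        if node.children:
            stack.extend(node.children)
        else:
            candidates.extend(node.prims)
    return candidates


left = Node((0, 0, 0), (1, 1, 1), prims=[0, 1])
right = Node((5, 0, 0), (6, 1, 1), prims=[2])
root = Node((0, 0, 0), (6, 1, 1), children=[left, right])
# Ray along +y at x = 0.5, z = 0.5: it only passes through the left box.
print(traverse(root, (0.5, -1.0, 0.5), (0.0, 1.0, 0.0)))  # [0, 1]
```

The candidates would then go to ray-triangle intersection tests. On RDNA 2 this traversal loop runs on the shader units, which is what the rumored unit would change.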

On the other hand, there will be some internal changes. For example, the Wave64 mode, which provided compatibility with GCN and was key to the backward compatibility of the Xbox Series X and S and the PlayStation 5, will disappear for good in RDNA 3.
