It has been several months (since April) since we last discussed NVIDIA's so-called Exascale COPA-GPUs, which is simply the name Huang gave to the Composable On Package concept: a 2.5D or 3D integrated circuit (depending on the specific design, nothing new there) that would arrive with the Hopper architecture.
With this in mind, what could be the new GH100 chip based on this new architecture dedicated to DL and AI has leaked, specifically the first of the two designs, the one where, as we already know, low-precision matrix math performance prevails, in other words, FP16. The chip is designated GPU-N, and the picture certainly does not disappoint as long as its purpose is kept in mind.
Two designs, two goals, one architecture
Just to recap what we saw at the start of the year, NVIDIA intends to launch two different architectures, each focused on its own objective:
- Games -> Ada Lovelace
- DL and AI -> Hopper (MCM and monolithic)
But within the latter there will be two different COPA GPU designs, differing in capabilities and approach. The rumored GPU-N would be DL-centric and therefore the natural successor to the current NVIDIA A100, which is important for understanding the data we are about to see: it is not an MCM GPU as such but a single one of the modules, which implies there will be both multi-die and single-die graphics cards.
Why is GPU-N behind AMD?
The mysterious GPU-N
Note: all performance tests are run in NVArchSim, simulating GPU-N
High precision for MLPerf workloads
Comparison with other DL ASICs
(7/x) pic.twitter.com/mxJExyILbM
– Redfire (@Redfire75369) December 14, 2021
This graphics card is single-die, so from the outset it will be slower than the MI250X; but then its objective is not AI at that scale either, since for that there will be the MCM designs. Its focus is different, aimed at a more specific segment of the market.
GPU-N will have 134 SM which, if NVIDIA kept the current shader structure of Ampere, would give us no less than 8,576 CUDA cores.
Added to this is a 6,144-bit memory bus, which could mean more than double the memory capacity. The problem is that the comparison with the MI250X is not really fair, even if they share some similarities, since this GPU-N would achieve 24.2 TFLOPs in FP32, only 24% more than the GPU it replaces.
In FP16, on the other hand, it reaches 2.5 times the performance of the A100 with 779 TFLOPs, and that is the important figure for the sector it targets. The AMD Instinct MI250X achieves 95.7 TFLOPs in FP32 and 383 TFLOPs in FP16, roughly half of GPU-N's FP16 figure.
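As a quick sanity check on those claims, here is a minimal Python sketch that reproduces the arithmetic, assuming Ampere's 64 FP32 ALUs per SM and the A100's publicly listed peaks (19.5 TFLOPs FP32 on the CUDA cores, 312 TFLOPs FP16 on the Tensor Cores); those reference values come from the public spec sheets, not from the leak itself.

```python
# Quick arithmetic check of the leaked GPU-N figures.
# Reference peaks (A100, MI250X) are taken from public spec sheets, not from the leak.

FP32_ALUS_PER_SM = 64                      # Ampere GA100 structure, assumed to be kept
gpu_n_sm = 134

gpu_n  = {"fp32": 24.2, "fp16": 779.0}     # leaked GPU-N peak TFLOPs
a100   = {"fp32": 19.5, "fp16": 312.0}     # A100: FP32 CUDA cores / FP16 Tensor Cores
mi250x = {"fp32": 95.7, "fp16": 383.0}     # MI250X matrix peaks

print(f"Shaders: {gpu_n_sm * FP32_ALUS_PER_SM}")                  # 8576
print(f"FP32 vs A100:   {gpu_n['fp32'] / a100['fp32']:.2f}x")     # ~1.24x -> '24% more'
print(f"FP16 vs A100:   {gpu_n['fp16'] / a100['fp16']:.2f}x")     # ~2.50x
print(f"FP16 vs MI250X: {gpu_n['fp16'] / mi250x['fp16']:.2f}x")   # ~2.03x
print(f"FP32 vs MI250X: {gpu_n['fp32'] / mi250x['fp32']:.2f}x")   # ~0.25x -> why it trails in FP32
```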
A high-end chip
GPU-N and COPA-GPU configurations
SM: 134
Frequency: 1.4 GHz
TFLOPs: 24.2 FP32, 779 FP16
L2: 60 MiB (up to 1,920 MiB)
DRAM bandwidth: 2.687 TB/s (up to 6.3 TB/s)
DRAM capacity: 100 GiB (up to 233 GiB)
(8/x) pic.twitter.com/v9ZJzzUBZl
– Redfire (@Redfire75369) December 14, 2021
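The baseline numbers in that list hang together. A minimal sketch below, assuming an Ampere-like SM with 64 FP32 ALUs each doing one FMA (2 FLOPs) per clock and the 6,144-bit bus mentioned earlier, recovers roughly the quoted FP32 throughput and a per-pin data rate consistent with HBM2e-class memory.

```python
# Back-of-the-envelope check of the quoted GPU-N baseline configuration.
# Assumes an Ampere-like SM (64 FP32 ALUs, 1 FMA = 2 FLOPs per clock) and a 6,144-bit bus.

SM_COUNT      = 134
FP32_PER_SM   = 64          # assumption: Ampere GA100 structure is kept
CLOCK_HZ      = 1.4e9       # 1.4 GHz from the leak
BUS_WIDTH_BIT = 6144
BANDWIDTH_BPS = 2.687e12    # 2.687 TB/s from the leak

fp32_tflops   = SM_COUNT * FP32_PER_SM * 2 * CLOCK_HZ / 1e12
pin_rate_gbps = BANDWIDTH_BPS * 8 / BUS_WIDTH_BIT / 1e9

print(f"Estimated FP32 peak:  {fp32_tflops:.1f} TFLOPs")  # ~24.0, close to the quoted 24.2
print(f"Implied per-pin rate: {pin_rate_gbps:.1f} Gbps")  # ~3.5 Gbps, HBM2e-class
```

The small gap to the quoted 24.2 TFLOPs would simply mean a clock slightly above 1.4 GHz.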
These data show that this is not NVIDIA's fastest DL GPU, since the full-die configuration should have 144 SM, not 134. From the leaks we know that the supposed H100 is the one tasked with competing against AMD's MI250X, based on 288 SM on an interposer and 18,432 shaders.
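Those leaked H100 figures are at least internally consistent with the per-SM structure assumed above, as this small sketch shows (the 144-SM full die and the two-die MCM layout come from the leak, not from any confirmed specification):

```python
# Consistency check of the leaked H100 / GPU-N SM counts.
# The 144-SM full die and two-die MCM layout are taken from the leak itself.

FP32_PER_SM = 64            # assumption: Ampere-like shader structure
full_die_sm = 144
mcm_dies    = 2

mcm_sm      = full_die_sm * mcm_dies
mcm_shaders = mcm_sm * FP32_PER_SM

print(f"MCM H100: {mcm_sm} SM, {mcm_shaders} shaders")   # 288 SM, 18432 shaders
print(f"GPU-N:    134 SM (cut-down single die of {full_die_sm})")
```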
That is where we will see the architecture's full potential, but for now the data is promising for this sector and is not far from the 3x improvement we discussed based on earlier leaks. Until it is officially presented, as we well know, any resemblance to reality is pure coincidence, so let's take this data with a grain of salt and be patient, as it could be unveiled at CES 2022.