HBM memory, which until now has been tied to GPUs for the HPC market, is beginning to be adopted by CPUs. What does it bring, why will HBM memory be used in processors, and what are Intel's and AMD's reasons for implementing it?
A review of HBM memory
HBM memory is a type of memory made up of several stacked memory chips that communicate vertically with their controller using through-silicon vias (TSVs). This three-dimensional integrated circuit is packaged and sold as a single HBM chip.
In order to communicate with the processor, the HBM chip does not use a serial interface; instead, it routes data through the substrate or interposer beneath it. This allows it to communicate with the processor through a much larger number of pins while reducing the clock speed of each one. The result? A type of RAM that, compared with other memories, consumes far less power during data transmission.
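As a rough illustration of this wide-and-slow approach, the sketch below compares a 1024-bit HBM2 interface with a 64-bit DDR4-3200 channel using typical published per-pin data rates; the helper function is our own, for illustration only:

```python
# Peak bandwidth = interface width (bits) x per-pin data rate (GT/s) / 8
def peak_bandwidth_gbs(width_bits: int, gigatransfers_per_s: float) -> float:
    """Peak bandwidth in GB/s of a parallel memory interface."""
    return width_bits * gigatransfers_per_s / 8

hbm2 = peak_bandwidth_gbs(1024, 2.0)   # one HBM2 stack: 1024 pins at 2 GT/s
ddr4 = peak_bandwidth_gbs(64, 3.2)     # one DDR4-3200 channel: 64 pins at 3.2 GT/s

print(f"HBM2 stack:   {hbm2:.1f} GB/s")   # 256.0 GB/s
print(f"DDR4 channel: {ddr4:.1f} GB/s")   # 25.6 GB/s

# To match one HBM2 stack over a 64-bit bus, each pin would have to run at:
print(f"Required rate: {hbm2 * 8 / 64:.0f} GT/s per pin")  # 32 GT/s
```

Many slow pins thus deliver the same bandwidth as few very fast ones, at far lower per-pin clock speeds and correspondingly lower I/O power.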
HBM memory has not been used in the home market because it is expensive: being built from several stacked memory chips makes it very hard to manufacture at the volumes of mass-market products. It is, however, ideal for lower-volume products, whether GPUs for high-performance computing or the server CPUs where HBM is now going to make its appearance.
Memory channels are part of the key
One of the differences of HBM memory compared to other types is its support for up to 8 memory channels, which is the configuration typically used in servers. The 8 channels of DDR4 memory are thus replaced by the 8 channels of HBM memory, which offer much higher bandwidth and lower latency.
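Extending the earlier arithmetic to a full 8-channel server configuration (again a back-of-the-envelope sketch using typical data rates, here HBM2E at 3.2 GT/s per pin):

```python
# Aggregate bandwidth of 8 memory channels, DDR4 vs HBM2E
CHANNELS = 8
ddr4_total = CHANNELS * 64 * 3.2 / 8      # 8 x DDR4-3200 channels
hbm2e_total = CHANNELS * 1024 * 3.2 / 8   # 8 x HBM2E stacks

print(f"8 x DDR4-3200: {ddr4_total:.1f} GB/s")    # 204.8 GB/s
print(f"8 x HBM2E:     {hbm2e_total:.1f} GB/s")   # 3276.8 GB/s
print(f"Ratio:         {hbm2e_total / ddr4_total:.0f}x")
```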
Lower latency? Yes: being on the same substrate as the processor means memory accesses during instruction execution take less time, and latency therefore drops. The disadvantage? HBM2 memory has a much smaller storage capacity, so it is necessary to add memory at a lower level of the hierarchy.
Since it has lower latency than DDR4, HBM can be placed above DDR4 in the CPU's memory hierarchy, with data moved from DDR4 into HBM as needed, and even an NVMe SSD below that over a fairly fast PCI Express interface, provided real-time compression and decompression is used when moving data from SSD to RAM.
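To make the idea concrete, here is a minimal, purely conceptual sketch of such a hierarchy: a small, capacity-limited "HBM" tier acting as a cache in front of a "DDR4" backing store, with zlib standing in for the real-time decompression applied to data paged in from the SSD. All names and sizes are invented for illustration:

```python
import zlib
from collections import OrderedDict

class TieredMemory:
    """Toy model: HBM tier caches pages from DDR4; SSD pages arrive compressed."""

    def __init__(self, hbm_capacity_pages: int):
        self.hbm = OrderedDict()        # small, fast tier (LRU order)
        self.ddr4 = {}                  # larger, slower tier
        self.capacity = hbm_capacity_pages
        self.hits = self.misses = 0

    def load_from_ssd(self, page_id: int, compressed: bytes) -> None:
        # Decompress on the way from SSD into DDR4 (real-time decompression).
        self.ddr4[page_id] = zlib.decompress(compressed)

    def read(self, page_id: int) -> bytes:
        if page_id in self.hbm:         # HBM hit: low-latency path
            self.hits += 1
            self.hbm.move_to_end(page_id)
            return self.hbm[page_id]
        self.misses += 1                # HBM miss: fetch from DDR4
        data = self.ddr4[page_id]
        if len(self.hbm) >= self.capacity:
            self.hbm.popitem(last=False)  # evict least recently used page
        self.hbm[page_id] = data
        return data

mem = TieredMemory(hbm_capacity_pages=2)
mem.load_from_ssd(0, zlib.compress(b"weights" * 100))
mem.load_from_ssd(1, zlib.compress(b"activations" * 100))
for page in (0, 1, 0, 1):
    mem.read(page)
print(f"HBM hits: {mem.hits}, misses: {mem.misses}")  # hits: 2, misses: 2
```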
Why does a server processor need HBM memory?
HBM memory stands out above all for its huge bandwidth, which may seem excessive for a CPU. But bear in mind that one of the areas where NVIDIA has a large presence in the server market is artificial intelligence, thanks to the tensor units included in its GPUs since the launch of Volta; the vast majority of systems used for AI training and inference include such units.
What is Intel and AMD's answer? To add this type of unit to their server processors: in Intel's case the AMX units, while AMD has not yet revealed which unit it will implement. The purpose of adding these units is to reduce the need for NVIDIA hardware in servers. Bear in mind that AMD has a graphics division that rivals NVIDIA, and Intel for its part has Intel Xe HPC and Intel Xe HP.
Units for AI require a lot of bandwidth to operate, and in that respect they are perfectly in line with GPUs, which is why NVIDIA markets its GPUs as AI processors and vice versa. This, at the same time, is the reason HBM memory will be added to CPUs: to turn them into chips that do double duty as both CPU and AI accelerator.
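A quick way to see why AI units are bandwidth-hungry is the roofline-style arithmetic below: for a matrix multiply we count operations and the bytes that must move through memory, then derive the bandwidth needed to keep a given compute rate fed. The matrix sizes and the 100 TOPS figure are arbitrary assumptions for illustration:

```python
# Roofline-style estimate: bandwidth needed to sustain a given compute rate
def needed_bandwidth_gbs(M, N, K, compute_tops=100, bytes_per_elem=1):
    """GB/s required so memory traffic doesn't starve the tensor units."""
    ops = 2 * M * N * K                                   # 2 ops per multiply-accumulate
    bytes_moved = (M * K + K * N + M * N) * bytes_per_elem
    intensity = ops / bytes_moved                         # operations per byte
    return compute_tops * 1e12 / intensity / 1e9

# Large square matrix multiply: lots of data reuse, modest bandwidth demand
print(f"GEMM 1024x1024x1024: {needed_bandwidth_gbs(1024, 1024, 1024):.0f} GB/s")
# Batch-1 inference (matrix-vector): almost no reuse, enormous bandwidth demand
print(f"GEMV 1024x1024:      {needed_bandwidth_gbs(1024, 1, 1024):.0f} GB/s")
```

Workloads with little data reuse, common in inference, are precisely where the memory interface, not the compute units, becomes the bottleneck.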
An evolution that happened before
In the 1990s, DSP units were used to accelerate emerging multimedia applications. Where are these units now? They disappeared as soon as SIMD units were implemented in CPUs, and a dedicated DSP unit for accelerating multimedia algorithms was no longer necessary.
In the case of AI, the concept is the same: the idea behind implementing tensor units in server CPUs is to be able to do without GPUs for these tasks. From a business standpoint for Intel and AMD, the message can be summed up as "Don't buy GPUs with units for AI when you can already do it on the CPU."
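As a concrete, simplified picture of what such a tensor unit computes, the sketch below models the int8 tile multiply at the heart of Intel AMX: a 16x64 int8 tile times a 64x16 int8 tile accumulated into a 16x16 int32 tile. This is a NumPy model of the arithmetic only; it uses none of the actual AMX instructions or their packed tile layouts:

```python
import numpy as np

# Model of one AMX-style tile multiply-accumulate:
# C (16x16, int32) += A (16x64, int8) @ B (64x16, int8)
rng = np.random.default_rng(0)
A = rng.integers(-128, 128, size=(16, 64), dtype=np.int8)
B = rng.integers(-128, 128, size=(64, 16), dtype=np.int8)
C = np.zeros((16, 16), dtype=np.int32)

# Widen to int32 before multiplying so products accumulate without overflow,
# mirroring the int32 accumulators the hardware provides.
C += A.astype(np.int32) @ B.astype(np.int32)

# One tile operation performs this many multiply-accumulates:
print(f"MACs per tile op: {16 * 64 * 16}")  # 16384
```

Each such tile operation moves kilobytes of operands per instruction, which is exactly the kind of traffic pattern that makes HBM's bandwidth attractive on a CPU.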
NVIDIA GPUs owe their AI power to their large number of shader units or SMs; for example, the GA102 has 82 SMs enabled in the RTX 3090, far more than the number of cores in a desktop processor. In a server processor we are talking about dozens of cores: AMD's EPYC, for example, reaches up to 64 cores and will grow in the next generations. With this in mind, we can better understand what the adoption of HBM memory in server processors will mean, especially in a market with increasingly AI-centric applications.