If we look at the hardware designed for artificial intelligence that is currently on the market, what stands out most are high-performance compute cards such as the NVIDIA Tesla line, whose current models are based on the A100 GPU, and AMD's Instinct accelerators with their CDNA architecture. To this we must add the integration of Intel's AMX units in its upcoming Xeon Sapphire Rapids CPUs. What do they all have in common? They use HBM memory, which tells us that artificial intelligence, like graphics rendering, is an application that demands very high bandwidth.
Why does artificial intelligence require so much bandwidth?
Artificial intelligence algorithms, like any other algorithm, work on a series of input data, but they add a new type of data: the weights. To understand why this matters, keep in mind that AI processors are typically built as systolic arrays, with the aim of emulating the way a biological neuron works.
The neurons in the brain are made up of dendrites, a soma and an axon. The dendrites capture the nerve impulses emitted by other neurons; these impulses are processed in the soma, and the axon transmits the resulting impulse to neighboring neurons. A systolic array mimics this flow: the input data of each ALU or processing element (depending on the complexity of the design) is the output data of the previous one.
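To make the analogy concrete, here is a minimal Python sketch (purely illustrative, not any vendor's design; the function name systolic_dot is our own) of a one-dimensional chain of processing elements computing a neuron's weighted sum. Each element holds one weight, multiplies it by its input and passes the running partial sum to its neighbor, so only the first inputs and the final result ever touch external memory.

```python
# Illustrative sketch: a 1-D systolic chain computing a neuron's weighted sum.
# Each processing element (PE) holds one weight; the partial sum is handed
# from PE to PE instead of going back to external memory at every step.

def systolic_dot(inputs, weights):
    partial_sum = 0.0                      # value passed along the chain
    for x, w in zip(inputs, weights):      # one iteration per PE
        partial_sum += x * w               # PE multiplies and accumulates
    return partial_sum

# Example: four "nerve impulses" and four learned weights (made-up values)
activations = [0.5, 0.1, 0.9, 0.3]
weights     = [0.2, 0.8, 0.4, 0.6]
print(systolic_dot(activations, weights))  # the "soma" output sent onward
```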
At first glance this suggests that, since the processing elements communicate with each other internally, much less external memory traffic should be needed. But these arrays require a large amount of data just to start working: the ALUs or processors at the edges must be fed from memory outside the chip, and the final results have to be written back to that same memory.
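A rough calculation makes the trade-off visible. Assuming a hypothetical square N x N systolic array doing matrix multiplication with one multiply-accumulate per element per cycle and 2-byte (FP16) operands, the compute grows with the area of the array while external traffic only grows with its edges:

```python
# Rough sketch (assumed N x N systolic array, 1 MAC per PE per cycle, FP16 data):
# work scales with the number of PEs, external traffic only with the edges.

def edge_traffic_vs_compute(n, bytes_per_word=2):
    macs_per_cycle  = n * n                    # operations done inside the array
    words_per_cycle = 2 * n                    # operands streamed into the two edges
    bytes_per_cycle = words_per_cycle * bytes_per_word
    return macs_per_cycle, bytes_per_cycle

for n in (8, 128, 256):
    macs, traffic = edge_traffic_vs_compute(n)
    print(f"{n}x{n} array: {macs} MACs/cycle, {traffic} bytes/cycle from memory")
```

The larger the array, the more each byte brought in from memory is reused internally, but those edge streams still have to be sustained, and that is what external bandwidth pays for.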
How is bandwidth calculated for AI?
Well, the answer is that it depends on the case: on the one hand, we may be dealing with an algorithm that has a lot of internal data reuse and hardly touches external memory at all. It also depends on the type of neural network being implemented, but in general the ideal for any processor is to transfer at least one byte of bandwidth for every operation the processor performs, a target that is very difficult to reach.
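A quick back-of-the-envelope calculation shows why. Assuming a hypothetical accelerator delivering 100 TOPS (a figure chosen only for illustration), the one-byte-per-operation ideal would demand:

```python
# Illustrative arithmetic: what "one byte per operation" would require
# from the memory system of a hypothetical 100 TOPS accelerator.

ops_per_second     = 100e12   # assumed: 100 TOPS
ideal_bytes_per_op = 1.0      # the ideal ratio mentioned above
required_bandwidth = ops_per_second * ideal_bytes_per_op

print(f"Required bandwidth: {required_bandwidth / 1e12:.0f} TB/s")
# -> 100 TB/s, far beyond any current memory technology, which is why
#    real designs rely on internal reuse to lower the bytes-per-op ratio.
```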
The point is that in some applications the required bandwidth is very low, while in others something close to that ideal bandwidth is needed. Remember that AI handles a huge amount of data, and this translates into bandwidths that simply cannot be reached with conventional DDR memory.
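To put rough numbers on that gap, here is an approximate comparison of peak bandwidth per DDR4 channel versus per HBM2E stack, using publicly quoted transfer rates and bus widths (figures are approximate and for illustration only):

```python
# Back-of-the-envelope comparison: one DDR4 channel vs. one HBM2E stack.

def peak_bandwidth_gb_s(transfer_rate_gt_s, bus_width_bits):
    """Peak bandwidth in GB/s = transfers per second * bytes per transfer."""
    return transfer_rate_gt_s * (bus_width_bits / 8)

ddr4_channel = peak_bandwidth_gb_s(3.2, 64)     # DDR4-3200 on a 64-bit channel
hbm2e_stack  = peak_bandwidth_gb_s(3.2, 1024)   # HBM2E on a 1024-bit interface

print(f"DDR4-3200 channel: ~{ddr4_channel:.1f} GB/s")   # ~25.6 GB/s
print(f"HBM2E stack:       ~{hbm2e_stack:.0f} GB/s")    # ~410 GB/s
```

A single HBM stack moves more data than a dozen DDR channels, and accelerators combine several stacks, which is why the hardware listed at the start of this article is built around HBM rather than conventional DIMMs.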