In early personal computers, RAM was much faster than CPUs; a well-known case is that of the MOS 6502, whose memory accesses were shared with the video system. Over time, however, the situation reversed to the point where main system memory has become a performance bottleneck, forcing the adoption of cache memory to alleviate the problem.
If we look at any graph of how RAM and processor performance have evolved, we will see that the distance between them has grown over time and keeps increasing. The questions are: is there an explanation for this phenomenon, and is there a solution?
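The shape of that graph can be sketched with two compound growth rates. The figures below are illustrative assumptions (roughly the classic textbook rates of ~1.5x per year for CPU performance versus ~1.07x per year for DRAM); only the diverging trend matters, not the exact numbers.

```python
# Illustrative sketch of the growing CPU-DRAM gap. The growth rates are
# assumptions, not measurements of any specific hardware generation.
CPU_GROWTH = 1.5    # per-year CPU performance multiplier (assumed)
DRAM_GROWTH = 1.07  # per-year DRAM speed multiplier (assumed)

def gap_after(years: int) -> float:
    """Ratio of CPU speed to DRAM speed after the given number of years,
    with both starting at parity (gap = 1.0)."""
    return (CPU_GROWTH ** years) / (DRAM_GROWTH ** years)

for y in (0, 5, 10):
    print(f"after {y:2d} years the gap is {gap_after(y):6.1f}x")
```

Because both curves are exponential with different bases, the ratio between them is itself exponential: the gap does not just grow, it accelerates.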
The reason for the bottleneck between CPU and RAM
The pyramid of the memory hierarchy is clear: the memory elements closest to the execution units give better performance than the distant ones. In other words, if a piece of data is in one of the first levels of the hierarchy, the access is resolved in fewer clock cycles.
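That "fewer clock cycles" effect can be quantified with the standard average memory access time (AMAT) formula. The cycle counts below are illustrative assumptions, not measurements of any particular CPU.

```python
# Average Memory Access Time for a simple two-level hierarchy:
# AMAT = hit_time + miss_rate * miss_penalty
def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    return hit_time + miss_rate * miss_penalty

L1_HIT = 4         # cycles to hit in a first-level cache (assumed)
RAM_PENALTY = 200  # cycles to go all the way out to RAM (assumed)

# A 95%-effective cache keeps the average access far below RAM latency.
print(amat(L1_HIT, 0.05, RAM_PENALTY))  # 4 + 0.05 * 200 = 14.0 cycles
# A poor hit rate lets the distant level dominate.
print(amat(L1_HIT, 0.50, RAM_PENALTY))  # 4 + 0.50 * 200 = 104.0 cycles
```

This is exactly why the hierarchy works: the nearby level only has to capture most accesses, not all of them, to hide the distant level's latency.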
On the other hand, the further away from the execution units you get, the more performance drops, due to the greater distance the wiring has to cover. It is at this point that we notice that the RAM is not inside the processor, which is the main reason for the bottleneck between the CPU and the RAM.
The distance between the CPU and the RAM may seem small to us, but from the processor's point of view it is very large, so the only way to compensate would be to make the memory run at much higher speeds than the processor. Bear in mind that when the CPU makes a memory access request, a window of opportunity for data transfer opens. The key would be to reduce the communication time between the two elements. What looks easy on paper, however, is not at all.
Increasing the number of pins is a poor solution economically, as it does not reduce the wiring distance between the two parts and would increase the size of both the processor and the memory. In addition, the only advantage would be the ability to reduce power consumption for the same bandwidth, not latency.
Why can’t the RAM speed be increased?
Transmitting information over long distances is a problem, as the wiring adds resistance as the distance grows. This is why transferring data from RAM consumes far more energy than transferring it from the caches. To quantify what a data transfer consumes, we use joules per bit, and since joules per second are equivalent to watts, we can work out the power consumed.
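The joules-per-bit bookkeeping is a one-line multiplication. The pJ/bit figures below are order-of-magnitude assumptions (on-chip cache transfers are often quoted around ~1 pJ/bit, off-chip DRAM around ~20 pJ/bit), and the bus rate is an assumed workload.

```python
# Power = (energy per bit) x (bits per second).
def watts(picojoules_per_bit: float, gigabits_per_second: float) -> float:
    # 1 pJ/bit * 1 Gbit/s = 1e-12 J/bit * 1e9 bit/s = 1e-3 W
    return picojoules_per_bit * 1e-12 * gigabits_per_second * 1e9

BUS_GBPS = 512  # e.g. a 64-byte-per-cycle bus at 1 GHz (assumed)
print(f"cache-like transfer: {watts(1, BUS_GBPS):.3f} W")
print(f"DRAM-like transfer:  {watts(20, BUS_GBPS):.3f} W")
```

With the assumed figures, the same traffic costs roughly 0.5 W from a cache but over 10 W from external RAM, which is why the distance itself is an energy problem.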
Increasing the clock speed of RAM is the other problem: in any clocked semiconductor, dynamic power consumption is the clock speed multiplied by the capacitance and by the voltage squared. Capacitance is a constant that depends on the manufacturing node, whereas voltage increases roughly linearly with clock speed (until it hits the voltage wall). The end result is that raising the clock speed would push RAM power consumption to stratospheric levels, causing the temperature to soar and the memory to shut down.
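The formula above, combined with the linear voltage scaling the article describes, is why clock increases are so punishing: power grows with the cube of frequency. A minimal sketch, with an arbitrary capacitance constant:

```python
# Dynamic power of a clocked CMOS circuit: P = f * C * V^2.
# If voltage must also rise roughly linearly with frequency (the
# assumption in the text, valid until the voltage wall), P grows as f^3.
def dynamic_power(freq_ghz: float, capacitance: float, voltage: float) -> float:
    return freq_ghz * capacitance * voltage ** 2

C = 1.0  # capacitance: a constant fixed by the process node (arbitrary units)
base = dynamic_power(1.0, C, 1.0)
# Doubling the clock while voltage scales with it: 2 * 2^2 = 8x the power.
doubled = dynamic_power(2.0, C, 2.0)
print(doubled / base)  # 8.0
```

Doubling the clock gives at best twice the bandwidth but eight times the power, which is the "stratospheric" consumption the paragraph refers to.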
Recently, new methods such as 3DIC have emerged that allow the RAM to be interconnected tightly with the processor. In these cases the low latency comes from proximity, so there is no need to increase the clock speed. In addition, they rely on a large number of interconnects to keep clock speeds low, so that the interface does not overheat and thermally throttle together with the processor. This removes the bottlenecks in this regard, but creates different ones, such as capacity and cost, which will be mitigated in the future.
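The wide-and-slow trade-off behind these interfaces can be sketched numerically. The widths and per-pin rates below are illustrative assumptions (an HBM-like wide bus versus a GDDR-like narrow one), reusing the P = f * C * V^2 relation with voltage scaling with frequency:

```python
# Same bandwidth from many slow wires instead of few fast ones.
def bandwidth_gbps(width_bits: int, per_pin_gbps: float) -> float:
    return width_bits * per_pin_gbps

narrow = bandwidth_gbps(64, 16.0)   # few pins, high clock (assumed figures)
wide = bandwidth_gbps(1024, 1.0)    # many pins, low clock (assumed figures)
assert narrow == wide  # identical total bandwidth: 1024 Gbit/s

# Energy per bit per line ~ (f * V^2) / f = V^2, and with V scaling
# linearly with f, energy per bit grows as f^2. Relative cost:
energy_ratio = (16 * 16 ** 2 / 16.0) / (1 * 1 ** 2 / 1.0)
print(energy_ratio)
```

Under these assumptions, the narrow fast bus spends 256 times more energy per bit for the same bandwidth, which is why stacked memories spread the traffic over thousands of slow interconnects.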