Every computer fundamentally does two things: it moves data and it processes data. Until now, managing the movement of information across the hardware has received little attention, and performance gains have come mainly from building more powerful processors. That trend is starting to change, and data logistics will become increasingly important in the future.
Accelerators to move data?
Normally, when a CPU or GPU issues a memory request, the mechanism in charge of fetching the data is passive and always behaves the same way. That is sufficient as long as the volume of information involved is low.
The problem stems from the way memory requests work: each delayed request delays the ones queued behind it, creating a cumulative latency that keeps growing and hurts not only the processor's performance but also that of its memory. This is where a class of units originating in the supercomputing world comes in: the so-called SmartNICs, designed for communication across their large networks, but this time implemented as a Northbridge.
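The cumulative effect described above can be seen in a minimal sketch: a FIFO queue in front of a memory controller, with an assumed fixed service time. When requests arrive faster than they can be served, the latency each one sees keeps growing.

```python
# Illustrative sketch only: how each delayed memory request delays the
# requests queued behind it. SERVICE_TIME is an assumed value, not a
# measurement of any real memory controller.
SERVICE_TIME = 10  # ns to serve one request (assumption)

def total_latencies(arrival_times, service_time=SERVICE_TIME):
    """Return the latency seen by each request in a FIFO memory queue."""
    free_at = 0          # time the memory controller becomes free again
    latencies = []
    for t in arrival_times:
        start = max(t, free_at)          # wait while memory is still busy
        free_at = start + service_time   # occupy memory for one service
        latencies.append(free_at - t)    # completion time minus arrival
    return latencies

# Ten requests arriving every 5 ns, served at 10 ns each:
print(total_latencies(range(0, 50, 5)))
# -> [10, 15, 20, 25, 30, 35, 40, 45, 50, 55]
```

The latency of the tenth request is already more than five times that of the first, even though the memory itself never got slower: the queue did.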
With them, data-logistics algorithms can be added: optimization will no longer come only from the main code, but also from how data accesses are handled at any given moment. This means the arrival of a new type of processing unit.
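As one hedged example of what such a data-logistics algorithm might look like, here is a classic stride detector: if recent accesses follow a constant stride, the unit could prefetch the next addresses in the pattern. The function name and parameters are illustrative assumptions, not a real SmartNIC API.

```python
# Hypothetical sketch of one data-logistics policy such a unit might run:
# detect a constant stride in recent memory accesses and predict the
# next addresses to prefetch. Not a real hardware interface.

def stride_prefetch(history, depth=4):
    """If the last accesses form a constant stride, predict the next ones."""
    if len(history) < 3:
        return []
    strides = [b - a for a, b in zip(history, history[1:])]
    if len(set(strides)) != 1:       # no constant stride detected
        return []
    stride = strides[0]
    last = history[-1]
    return [last + stride * i for i in range(1, depth + 1)]

# Accesses walking an array one 64-byte cache line at a time:
print(stride_prefetch([0, 64, 128, 192]))  # -> [256, 320, 384, 448]
```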
Interconnection will have to change
The key to using SmartNICs to control the Northbridge is that processors will stop using the architecture of a conventional SoC and adopt that of a NoC, in which every element of the processor communicates with the SmartNIC as if over a network, with the SmartNIC acting as a central router for the various components.
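The router role described above can be sketched as a toy model, under the assumption that every on-chip block addresses its traffic to a central router, which forwards packets to the destination block's queue. All class and component names here are illustrative.

```python
# Toy model (assumptions only) of a NoC with a central router: on-chip
# components do not talk to each other directly, they exchange packets
# through the router, one inbox queue per component.
from collections import defaultdict

class CentralRouter:
    def __init__(self):
        self.inboxes = defaultdict(list)   # one packet queue per component

    def send(self, src, dst, payload):
        """Route a packet from one on-chip block to another."""
        self.inboxes[dst].append((src, payload))

    def receive(self, component):
        """Drain and return a component's inbox."""
        msgs, self.inboxes[component] = self.inboxes[component], []
        return msgs

router = CentralRouter()
router.send("cpu_core0", "memory_ctrl", "read 0x1000")
router.send("gpu", "memory_ctrl", "read 0x2000")
print(router.receive("memory_ctrl"))
# -> [('cpu_core0', 'read 0x1000'), ('gpu', 'read 0x2000')]
```

Because all traffic passes through one point, that point is also where data-logistics algorithms can observe and reorder requests.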
The move to a NoC, or Network on a Chip, is something that all future processors, GPUs and SoCs from Intel, AMD and NVIDIA will undergo. Not only because of the performance advantages of being able to manage data logistics, but because it is the best way to implement chiplet-based MCM designs.
In the case of AMD and Intel, their SmartNICs could be implemented as eFPGAs using Xilinx and Altera technology respectively. In the case of NVIDIA, there are the Mellanox DPUs, which we know it will integrate into future GPU designs.
Will not affect existing code
So far, data logistics has been handled automatically, and it would be counterproductive for developers to have to think about how it is managed. In reality, adding accelerators to move data will not change compatibility with existing code: they will only take an active role in data logistics when a specific request is made.
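That opt-in model can be sketched as follows. This is purely hypothetical: legacy code keeps using the ordinary copy path unchanged, and only an explicit request hands a transfer to the accelerator. None of these function names correspond to a real API.

```python
# Hypothetical sketch of the opt-in model: existing code is untouched,
# and the data-movement accelerator acts only on explicit request.

def memcpy(dst, src):
    """Ordinary copy path, used by all existing code as before."""
    dst[:len(src)] = src

def memcpy_offload(dst, src, accelerator=None):
    """Opt-in path: use the accelerator only when one is supplied."""
    if accelerator is not None:
        accelerator.copy(dst, src)     # explicit request for offload
    else:
        memcpy(dst, src)               # fall back to the normal path

buf = bytearray(4)
memcpy_offload(buf, b"data")           # no accelerator: legacy behaviour
print(buf)                             # -> bytearray(b'data')
```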
Therefore, compatibility will not be affected at all, and only in certain specific cases will accelerators be used to move data. For ordinary users, a change like this is hard to quantify, since performance is usually discussed in terms of raw power, whereas here we are talking about optimizing a program's memory accesses to get the most out of memory.