CPUs and APUs with heterogeneous cores, that is, cores of different complexity and size, have arrived in the PC. But how do these heterogeneous cores differ in nature and performance? This is the question many ask themselves when reading about the different architectures appearing on the market. Why, after more than a decade of using a single type of core, have processors moved to a mix of large and small cores?
Why use different types of cores?
There are several reasons for this. The best known is the one used in the now classic big.LITTLE design for smartphones, where two clusters of cores with different power and consumption are switched depending on the workload on the smartphone at any given time. This was done to increase the battery life of these devices.
Today this concept has evolved, and it is already possible to use the two types of cores simultaneously rather than in a switched manner. The combined design is thus no longer based on energy saving, but on obtaining the highest possible performance. This is where we run into two different ways of understanding performance with heterogeneous cores.
The most common of these, because it is the easiest to implement, is to assign the lightest threads in terms of workload to the least powerful cores, a task that must be performed by the operating system, the software responsible for managing the use of hardware resources, including the processor's cores. This is the way Intel's Lakefield works, as will its future architectures such as Alder Lake, as well as ARM cores with DynamIQ.
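This scheduling policy can be pictured with a toy model. The sketch below is an assumption for illustration only, not the actual scheduler of any operating system: threads whose estimated load crosses a threshold go to the big cores, the rest to the small ones, round-robin within each cluster.

```python
def assign_threads(threads, big_cores, small_cores, threshold):
    """Toy model of a hybrid-CPU scheduler: heavy threads go to big
    cores, light threads to small cores (round-robin per cluster).
    `threads` is a list of (name, estimated_load) pairs."""
    placement = {}
    big_i = small_i = 0
    for name, load in threads:
        if load >= threshold:
            placement[name] = big_cores[big_i % len(big_cores)]
            big_i += 1
        else:
            placement[name] = small_cores[small_i % len(small_cores)]
            small_i += 1
    return placement

# Hypothetical workload: a game is heavy, background tasks are light.
plan = assign_threads(
    [("game", 0.9), ("mail_sync", 0.1), ("av_scan", 0.3)],
    big_cores=["B0", "B1"],
    small_cores=["S0", "S1"],
    threshold=0.5,
)
# plan == {"game": "B0", "mail_sync": "S0", "av_scan": "S1"}
```

A real scheduler also weighs energy, thermal state, and hints from the hardware, but the basic idea of routing by load is the same.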
In any case, this organization relies on two types of cores with the same set of registers and instructions but with different specifications. So what are the differences between the heterogeneous cores? Let's see.
Big cores versus small cores today
First of all, let's start with the obvious: the first difference between the two types of cores is their size. Large cores are more complex than small cores; they have a more elaborate structure and therefore consist of a greater number of transistors, which makes them larger than the small cores with their much simpler structure. This means that in the same chip area we can fit more small cores than large ones.
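The area trade-off is simple arithmetic. The figures below are hypothetical, chosen only to make the point; real core areas vary by design and process node.

```python
def cores_in_area(area_budget_mm2, core_area_mm2):
    """How many whole cores of a given size fit in a silicon budget."""
    return int(area_budget_mm2 // core_area_mm2)

# Assumed figures: a big core of 4 mm^2 vs a small core of 1 mm^2,
# both competing for the same 16 mm^2 slice of the die.
big_count = cores_in_area(16, 4)    # 4 big cores
small_count = cores_in_area(16, 1)  # 16 small cores
```

The same silicon budget yields four times as many small cores under these assumed areas, which is why mixed designs can raise multithreaded throughput without growing the die.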
Given all this, the first thing you will ask yourself is: what is the performance benefit of combining both types of cores? Bear in mind that on our PCs today, several applications run at the same time, each executing several threads. Adding more cores, even if they are less powerful ones, ends up adding to the overall performance.
In reality, the smaller cores are just one more way to lighten the workload of the larger, more complex cores by taking work off their hands. Not only that: the additional cores can also handle the most common interrupts from the different devices, so that the rest of the cores do not have to stop working to service them constantly.
The architectures of the future rely on heterogeneous configurations
The other method, more complex to implement, differs from the previous one in that it consists of dividing the ISA, the set of registers and instructions, between the two classes of cores. The reason is that not all instructions have the same power consumption: the simplest ones will always consume more on the more complex cores. The idea is therefore not to assign each thread to a corresponding core, but rather to share the execution of a single thread between two or more cores in an interleaved manner.
Consequently, its implementation is much more complex than the current model, since the different cores in charge of the same thread must have the hardware needed to coordinate during the execution of the program code. The advantage of this paradigm is that, in principle, it does not require the operating system to manage the different threads that the CPU must execute. But in this case, as we have already noted, the division between the heterogeneous core types depends on how the instruction set is distributed between them.
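To make the idea concrete, here is a toy dispatcher. The opcode split below is entirely hypothetical, not any real ISA partition: simple operations are routed to the small core and complex ones to the big core, so a single instruction stream interleaves across both.

```python
# Assumed partition of a made-up instruction set (illustration only).
SMALL_CORE_OPS = {"add", "sub", "mov", "cmp"}
BIG_CORE_OPS = {"div", "sqrt", "fma"}

def dispatch(instruction_stream):
    """Route each instruction of one thread to the core class that
    executes it most efficiently, producing an interleaved trace."""
    trace = []
    for op in instruction_stream:
        if op in SMALL_CORE_OPS:
            trace.append((op, "small"))
        elif op in BIG_CORE_OPS:
            trace.append((op, "big"))
        else:
            raise ValueError(f"unknown opcode: {op}")
    return trace

# One thread's stream bounces between the two core classes:
trace = dispatch(["mov", "add", "div", "cmp"])
# trace == [("mov", "small"), ("add", "small"),
#           ("div", "big"), ("cmp", "small")]
```

In real hardware this hand-off would require shared register state and tight synchronization between the cores, which is exactly the coordination hardware the paragraph above refers to.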
How this method performs is related to what is called Amdahl's law and to how programs scale. On the one hand we have sequential parts, which cannot be distributed over several cores because they cannot be executed in parallel, and on the other hand parts which can. In the first case the performance does not depend on the number of cores but on the power of each core, while in the second it scales with the number of cores.
Traditionally, the more complex instructions of a processor are implemented as a sequence of simpler instructions in order to make better use of the hardware. But the new fabrication nodes will allow more complex instructions to be wired directly into the more complex cores, rather than being composed of several simpler ones. This will also serve to increase the overall performance of programs, because these instructions will require far fewer clock cycles to execute.
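The cycle savings can be sketched with assumed numbers. Everything below is hypothetical, chosen only to illustrate the microcoded-versus-hardwired trade-off; real latencies depend on the microarchitecture.

```python
# Assumed decomposition: a fused multiply-add ("fma") emulated as a
# multiply followed by an add (illustrative, not a real microcode).
MICROCODED_SEQUENCE = {"fma": ["mul", "add"]}
SIMPLE_OP_CYCLES = {"mul": 3, "add": 1}

def microcoded_cycles(op):
    """Cycles when the complex op is built from simpler ops."""
    return sum(SIMPLE_OP_CYCLES[o] for o in MICROCODED_SEQUENCE[op])

# Assumed single-cycle dedicated unit in the big core:
HARDWIRED_CYCLES = {"fma": 1}

saved = microcoded_cycles("fma") - HARDWIRED_CYCLES["fma"]  # 4 - 1 = 3
```

Under these assumed latencies, wiring the instruction directly saves three cycles per execution, and in a hot loop that difference compounds into a visible performance gain.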