A processor clearly cannot do the whole job on its own: there are common operations that a computer performs all the time for which a general-purpose processor is not efficient. But before looking at examples, we need to understand what support chips are and why they are needed.
When designing a new architecture, there is a series of parameters that mark the limits engineers must not exceed, including the type of libraries used for the design, the chip's power consumption, its size and, above all, the common problems the new processor seeks to solve. It is at this point that not only the main units are defined, but also the coprocessors and accelerators that will be part of it.
The first support processors placed in an architecture are easy to identify: they are usually those designed for previous architectures from the same brand or, failing that, those licensed from third parties. The rest arise during development, either as a result of customer requests or because the type of problem to be solved requires a new kind of hardware unit.
What is a coprocessor?
Although the term is largely self-explanatory, keep in mind that when several cores work together to solve the same problem, each processing unit is acting in co-processing with the others. And yes, we know what went through your mind: when multiple processor cores tackle a specific problem, the ones not running the main process are acting as coprocessors of the others.
Support chips are traditionally referred to as coprocessors, and the most famous coprocessor in PC history is the math coprocessor, which was nothing more than what would later become the floating point unit (FPU), fully decoupled from the main CPU. A coprocessor usually lacks the ability to fetch instructions from memory itself; it needs another processor to send it the instructions and the data to be processed. The job of the coprocessor? To solve its part of the program and return the result to the host processor as quickly as possible.
While the coprocessor does its job, the main core can use the cycles it has freed up to perform other tasks. But since both are executing parts of the same process, a point will be reached where the main core cannot continue until the coprocessor(s) have completed their assigned task.
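A minimal sketch of this host/coprocessor handshake, using a Python thread as a stand-in for the coprocessor. The queue names and the doubling "computation" are illustrative, not a real hardware interface; the point is the blocking synchronization at the end:

```python
import threading
import queue

# A thread standing in for a coprocessor: it cannot fetch work on its
# own, it only processes what the host sends through its input queue.
def coprocessor(work_in: queue.Queue, results_out: queue.Queue) -> None:
    while True:
        item = work_in.get()
        if item is None:            # host signals shutdown
            break
        # The "offloaded" computation, e.g. a floating-point operation.
        results_out.put(item * 2.0)

work_in: queue.Queue = queue.Queue()
results_out: queue.Queue = queue.Queue()
threading.Thread(target=coprocessor, args=(work_in, results_out)).start()

# Host: send the data to be processed, do other work, then BLOCK on the
# result, because the rest of the program depends on it.
work_in.put(21.0)
# ... the host could run unrelated work here ...
result = results_out.get()  # synchronization point: the host waits
work_in.put(None)           # shut the coprocessor down
print(result)               # 42.0
```

The `results_out.get()` call is the stall described above: the main core has run out of independent work and must wait for the coprocessor.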
What is an accelerator?
Technically, an accelerator is a coprocessor, but with greater independence: it is not handed a fragment of a running process, but rather assigned an entire process of its own, which the CPU ignores completely except to collect the final result or to know that the task has been completed.
Because an accelerator is completely decoupled from the processor, it works completely asynchronously. What do we mean? That an accelerator, as opposed to a coprocessor, does not work in lockstep with the system's main processor. This allows it to speed up its part of the code, that is, complete it at a much higher speed and therefore in less time. Of course, this requires major changes in the architecture.
First, a coprocessor can share parts of the control unit, and even registers, or access common memory with the CPU. When all of these elements are shared, contention can arise over access to them, causing one unit or the other to stall while waiting for those resources. As you will understand, this cannot happen in an accelerator, so its data and instructions, although provided by the processor, are designed to be available to it 100% of the time, which is why many accelerators are full processors with their own local RAM inside.
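The accelerator model can be sketched the same way, again with a thread as an illustrative stand-in. Note the two differences from the coprocessor sketch: the accelerator gets a private copy of its data (its "local RAM", so there is no contention with the host) and the host only checks a completion flag instead of blocking on a shared queue:

```python
import threading
import time

# A thread standing in for an accelerator: the host hands it an entire
# job plus a private copy of the data, then ignores it until it is done.
def accelerator(data: list, done: threading.Event, out: dict) -> None:
    local = list(data)                    # private working copy ("local RAM")
    out["result"] = sum(x * x for x in local)
    done.set()                            # signal completion to the host

done = threading.Event()
out: dict = {}
job = [1.0, 2.0, 3.0]
threading.Thread(target=accelerator, args=(job, done, out)).start()

# Host: free to run the rest of the program; it only touches the
# accelerator to ask whether the job has finished.
while not done.is_set():
    time.sleep(0.001)                     # host does other work meanwhile
print(out["result"])                      # 14.0
```

Here the host never stalls mid-process waiting for a shared resource; the only interaction is the asynchronous completion check, which is exactly the decoupling that lets an accelerator run at full speed.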
If an accelerator is better, why use a coprocessor?
As we said in the introduction to this article, it all comes down to the budget architects have to implement the solution to a problem. One thing that is usually overlooked is the communication infrastructure between the different elements, as well as the units that are part of each processor's instruction cycle but are not responsible for crunching numbers at high speed.
At the marketing level, it is very easy to sell the power of a processor in numbers, which are easily understood by anyone who can make an ordinal or cardinal comparison from that data. The reality is that today the infrastructure of any processor is what takes up the most space, and that is why the decision to implement something as a coprocessor or as an accelerator comes down to these limitations.
An example is NVIDIA’s Tensor Cores and its NVDLA unit, both serving the same purpose. While the former is a coprocessor inside the shader unit that shares registers and the control unit with the rest of the GPU's shader unit, the latter is a processor in its own right. Not surprisingly, the acronym DLA stands for Deep Learning Accelerator.