To understand how the VISC paradigm works, we need to consider two different concepts regarding processor performance. The first is the fact that today's PC processors have an internal instruction set that is even simpler than RISC, because during the decoding stage they translate each instruction into smaller micro-operations. If we are purists, the conclusion is that today's processors are not really RISC; rather, they have a very small instruction set that works internally and from which the rest of the instructions are built. In other words, as soon as an instruction reaches the CPU's control unit, it is decomposed into a list of simpler micro-instructions.
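The decoding idea described above can be pictured with a toy sketch. This is purely illustrative: the instruction and micro-op names are hypothetical, not a real ISA, and a real decoder works in hardware, not with a lookup table.

```python
# Toy sketch of a front-end decoder: a complex (CISC-style) instruction is
# translated into a list of simpler micro-operations, while a simple
# instruction passes through unchanged. All mnemonics here are made up.

MICRO_OP_TABLE = {
    # A memory-to-register add becomes a load followed by a register add.
    "ADD reg, [mem]": ["LOAD tmp, [mem]", "ADD reg, tmp"],
    # A push becomes a stack-pointer update followed by a store.
    "PUSH reg": ["SUB sp, 8", "STORE [sp], reg"],
}

def decode(instruction):
    """Return the list of micro-ops for an instruction."""
    return MICRO_OP_TABLE.get(instruction, [instruction])

print(decode("ADD reg, [mem]"))  # ['LOAD tmp, [mem]', 'ADD reg, tmp']
print(decode("MOV reg, reg2"))   # ['MOV reg, reg2']
```

The control unit's job, in this simplified view, is just that final step: every incoming instruction comes out as a list of the small internal operations the execution units actually understand.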
The war between RISC and CISC was therefore won by the former, but with the twist that x86, the most widely used CISC architecture, behaves internally like a RISC. Today, with the exception of ARM, the remaining RISC ISAs are either gone or on the verge of extinction. Moreover, even ARM has embraced the idea of splitting instructions into simpler ones, so beyond defining a family's common ISA, the two paradigms are effectively extinct.
Amdahl’s law
To understand a program, we first need to understand that it has two different parts:
- The part that can only be executed serially and, therefore, can only be resolved by a single core running it on its own.
- The part that can be executed in parallel, which means it can be resolved by several cores at the same time; the more cores the processor has, the faster this part is solved.
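This split between serial and parallel parts is exactly what Amdahl's law quantifies: the achievable speedup is capped by the fraction of the program that must run serially, no matter how many cores you add. A minimal sketch of the formula:

```python
# Amdahl's law: speedup(n) = 1 / ((1 - p) + p / n),
# where p is the fraction of the program that can run in parallel
# and n is the number of cores.

def amdahl_speedup(parallel_fraction, cores):
    return 1.0 / ((1.0 - parallel_fraction) + parallel_fraction / cores)

# Even with 90% of the code parallelized, 16 cores give nowhere near 16x:
print(round(amdahl_speedup(0.9, 16), 2))  # 6.4
```

The serial 10% dominates: past a certain core count, adding more cores barely helps, which is why making better use of each core matters so much.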
If we take into account what was explained in the previous section, you might conclude that the micro-operations a processor instruction is decomposed into form a sequence that can run serially or in parallel across several cores. In practice, however, most instructions are executed within a single core, and it is only through explicitly shared work that code runs in parallel.
Therefore, whether part of the code is executed by multiple cores depends exclusively on the program's developer, who must explicitly program certain parts to run in parallel.
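That explicit step can be as simple as handing the same work to a pool of workers instead of a plain loop. A minimal Python sketch (the function and data here are arbitrary examples):

```python
# The parallel part of a program does not appear by itself: the developer
# must explicitly split the work across workers.
from concurrent.futures import ThreadPoolExecutor

def square(n):
    return n * n

numbers = [1, 2, 3, 4]

# Serial version: one worker processes everything in order.
serial = [square(n) for n in numbers]

# Parallel version: the same work explicitly distributed to several workers.
with ThreadPoolExecutor(max_workers=4) as pool:
    parallel = list(pool.map(square, numbers))

print(serial == parallel)  # True: same result, different execution strategy
```

The hardware cannot make this decision on its own at the program level, which is precisely the gap the VISC idea tries to attack from the other side, below the ISA.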
VISC and virtual cores
Having explained all of the above, we can now say what the acronym VISC stands for. Its definition is the direct answer to the following question: when the micro-instructions are generated in the decoding stage, why can't they run in parallel across several cores instead of in a single one?
Well, the answer is the VISC architecture, first proposed by a company called Soft Machines in 2015 as a concept for improving processor performance. This small startup was bought by Intel in 2016, and since then Intel has been working on the development of a VISC architecture. How does it work? It can be defined very simply: a single execution thread is sent to the processor's Global Front End, where it is converted into several threads that perform the same function, work in parallel, and run on virtual cores. The conversion is performed at the software level through a translation layer, though we should keep in mind that it can be something as simple as a microcontroller carrying out the instruction translation.
Contrary to what happens when tasks are distributed in a conventional multicore processor, a VISC architecture does not wait for an entire core to become free before executing an instruction; it only requires that the execution units needed for that instruction be available somewhere in the processor. For example, a conventional core's vector unit may be sitting idle, but under this paradigm it can be used to execute part of another thread's instructions.
VISC and performance
When adopting a new architectural paradigm, the first thing to consider is its impact on performance, as it is not worth changing the current paradigm if it does not increase the overall performance of the processor. The most classic way to increase a processor's performance is to raise the number of instructions resolved per clock cycle, which makes the hardware increasingly complex: on top of the added cores, all the infrastructure surrounding them becomes just as complex or more so.
What differentiates VISC from the rest is how processor resources are distributed, so that the execution of the various instructions is carried out in a few clock cycles across 1 to 4 cores. This way, if two instructions are competing for the same resources in one core, they can very quickly be reassigned to another part of the processor where those resources are available.
The current paradigm, out-of-order execution, reorders the execution of instructions according to which resources are free at any given moment, and then restores the original order when the processed data is output. The limit? Resource allocation is done at the level of a single core rather than across multiple cores, and this is the key to the better performance of VISC architectures.
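The difference between the two allocation scopes can be shown with a toy scheduler. This is a deliberately naive sketch (not a real scheduler, and the "vector unit" scenario is a hypothetical example): in the classic model an instruction can only use execution units inside its own core, while in the pooled, VISC-style model it can grab a free unit in any core.

```python
def schedule_per_core(needs_and_core, cores):
    """Classic out-of-order model: an instruction may only use
    execution units inside the core it was assigned to."""
    done = 0
    for unit, core_id in needs_and_core:
        if unit in cores[core_id]:
            cores[core_id].remove(unit)
            done += 1
    return done

def schedule_pooled(needs, cores):
    """VISC-style model: an instruction may use a free unit in ANY core."""
    done = 0
    for unit in needs:
        for core in cores:
            if unit in core:
                core.remove(unit)
                done += 1
                break
    return done

# Two instructions both need a vector unit; both were assigned to core 0.
per_core = schedule_per_core([("vector", 0), ("vector", 0)],
                             [["vector"], ["vector"]])
pooled = schedule_pooled(["vector", "vector"],
                         [["vector"], ["vector"]])
print(per_core, pooled)  # 1 2
```

In the per-core model only one of the two instructions executes this cycle, because core 0's single vector unit is taken; with pooling, core 1's idle vector unit absorbs the second instruction.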
Do these processors exist today?
While the concept looks great on paper, no one has yet released a processor that works under this paradigm. But as we gradually approach the limits of the current paradigm, it is important to keep in mind that there are solutions that could improve the CPU performance of our PCs in the future.
Having a more powerful processor is not only about making it faster or giving it more cores; it also depends on knowing how to take advantage of the available resources. Out-of-order execution was the first step in that direction, but since then, apart from multicore, the changes have generally been minor. VISC is still a concept, but it is not an impossible one, and it is a way to use the resources available in the processor much more efficiently.
So far we know the concept is feasible in a processor, since Soft Machines designed and built one with this paradigm; although it remained at an experimental level, we know such a design can be achieved. A different matter is the difficulty of bringing the x86 instruction set and registers to this paradigm, which is extremely complex by nature.