Domain-specific accelerators: definition, architecture and use

Programs are expected to run faster and faster, yet CPUs and GPUs have grown so complex that it is very difficult to keep increasing their performance in the traditional way. The proposed solution to this problem? Domain-specific accelerators.

First, the concept of the accelerator

Since the early days of computing, support chips have been used to speed up certain tasks. Originally these chips freed the CPU from a repetitive, recurring job. The clearest example is graphics hardware, which kept the CPU from spending most of its time drawing to the screen.

An accelerator is a support chip that goes a step further: it not only frees the processor from the task, it also completes it faster, in a fraction of the time the processor would need. The task is therefore accelerated, and everything that depends on it speeds up as well. Hence the name accelerator.
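How much a whole program gains from one accelerated task can be estimated with Amdahl's law. The sketch below is only an illustration: the function name and the sample numbers are assumptions, not measurements from any real system.

```python
def overall_speedup(accelerated_fraction: float, task_speedup: float) -> float:
    """Amdahl's law: speedup of the whole program when only part of it is accelerated.

    accelerated_fraction: share of the original run time spent in the offloaded task (0..1).
    task_speedup: how many times faster the accelerator completes that task.
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be between 0 and 1")
    if task_speedup <= 0:
        raise ValueError("task_speedup must be positive")
    remaining = 1.0 - accelerated_fraction          # part that still runs on the CPU
    accelerated = accelerated_fraction / task_speedup  # part that runs on the accelerator
    return 1.0 / (remaining + accelerated)


# Hypothetical example: the task is 90% of the run time and becomes 10x faster.
example = overall_speedup(0.9, 10.0)
```

Under these assumed numbers the program as a whole gets a little more than five times faster, not ten, because the part that stays on the CPU still takes the same time.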

Accelerators come in many types and designs, and almost any kind of hardware can act as one: a microcontroller, an FPGA, a combinational or sequential circuit, and so on. In recent years a type of accelerator has emerged that is expected to dominate hardware design in the coming years: the domain-specific accelerator (DSA).

Domain-specific accelerators, general definition

Hardware has long used accelerators for particular kinds of work and particular applications, and therefore for a particular field. Today these fields include graphics, deep learning, bioinformatics, voice processing and real-time image processing. There are many areas in which a domain-specific accelerator can solve the problem better than a processor, that is, in less time and with less energy.

The first question that comes to mind is: is a GPU a domain-specific accelerator? No, it is not. A DSA takes care of one very specific task, so a GPU will contain more than one of these units. To make this clearer, bear in mind that each task can be divided into several smaller ones, each of which can be accelerated independently by this type of unit.

Domain-specific accelerators differ from the other options on the market because their design places them between general-purpose processors and conventional accelerators. In other words, they do not reach the complexity of a CPU, but they are much more complex than traditional solutions, especially fixed-function ones.

Specific domain, specific ISA

The first thing to keep in mind is that a domain-specific accelerator is not a general-purpose processor, even though it also runs a program. Its design is optimized for one specific problem rather than a general one. To that end, an entire ISA is built around the DSA: its instructions, registers and data types are chosen so that operations which would take a CPU many cycles complete in a short time.
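To make the idea of a domain-specific ISA concrete, here is a minimal sketch of an interpreter for an invented three-instruction set aimed at dot products. The instruction names (VLOAD, VMAC, HALT), the vector registers and the accumulator are all hypothetical and do not correspond to any real accelerator.

```python
def run(program, memory):
    """Execute a program written in a made-up dot-product ISA.

    program: list of tuples such as ("VLOAD", "v0", "x").
    memory:  dict mapping names to lists of numbers (stands in for local memory).
    Returns the final accumulator value.
    """
    vregs = {}  # vector registers: the only data type this toy ISA knows
    acc = 0     # single scalar accumulator
    for op, *args in program:
        if op == "VLOAD":      # copy a whole vector from memory into a register
            reg, name = args
            vregs[reg] = list(memory[name])
        elif op == "VMAC":     # multiply two vectors element-wise and accumulate
            a, b = args
            acc += sum(x * y for x, y in zip(vregs[a], vregs[b]))
        elif op == "HALT":
            break
        else:
            raise ValueError(f"unknown instruction: {op}")
    return acc


# Hypothetical program: a three-element dot product in four instructions.
PROGRAM = [("VLOAD", "v0", "x"), ("VLOAD", "v1", "w"), ("VMAC", "v0", "v1"), ("HALT",)]
result = run(PROGRAM, {"x": [1, 2, 3], "w": [4, 5, 6]})
```

The point of the sketch is what is missing: no branches, no scalar arithmetic, no general-purpose registers. The whole instruction set exists to serve one kind of computation.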

Today's processors implement the instructions of their ISAs as sequences of microinstructions that share a common data path throughout the instruction cycle. Because of the complexity of the instruction set, a complex instruction therefore takes many cycles to complete. In a DSA, we can build instruction loops and dedicated data paths for certain instructions so that they execute faster. We can even place units in parallel that do nothing but execute that one instruction over and over.
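The difference can be illustrated with a toy cycle count for a multiply-accumulate loop. Every cycle cost and the number of parallel lanes below are hypothetical values chosen for illustration; real figures depend entirely on the design.

```python
from math import ceil

# Hypothetical cost, in cycles, of each step when a general-purpose core
# runs one multiply-accumulate through a single shared data path.
GENERIC_STEPS = {
    "load_a": 1,
    "load_b": 1,
    "multiply": 3,
    "add": 1,
    "loop_overhead": 2,
}


def generic_cycles(n_elements: int) -> int:
    """Cycles when every element goes through all the steps, one after another."""
    return n_elements * sum(GENERIC_STEPS.values())


def dsa_cycles(n_elements: int, lanes: int = 4, mac_cycles: int = 1) -> int:
    """Cycles when a dedicated data path performs one fused multiply-accumulate
    per lane, with `lanes` identical units working in parallel."""
    return ceil(n_elements / lanes) * mac_cycles


n = 1024
ratio = generic_cycles(n) / dsa_cycles(n)
```

With these assumed costs the dedicated path needs 256 cycles where the shared path needs 8192. The exact ratio means nothing; what matters is that it comes from two separate sources, fewer steps per element and several elements per cycle.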

But the biggest advantage is that, for a given application, we can drop the instructions that a general-purpose unit carries but that our application never needs. Over the years, those accumulated instructions have turned the registers and instruction sets of CPUs and GPUs into behemoths that take up a lot of space.

Domain-specific accelerators and access to memory

Another improvement of the DSA relates to memory: like a microcontroller, it uses memory located inside the accelerator itself. This matters because the physical distance between the memory and the execution units influences the energy cost of each instruction.

This memory configuration is a key advantage of accelerators: each executed instruction consumes much less energy than on a CPU, and contention for system memory is avoided. A DSA does not use system RAM to perform its calculations, so it can run in parallel with the rest of the system at any time.
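A back-of-the-envelope model shows why the location of the data matters. The per-access energy values below are placeholders picked for illustration, not measurements; only the assumption that an off-chip access costs far more than a local one reflects the argument above.

```python
# Placeholder energy cost per access, in picojoules. Illustrative only.
ENERGY_PJ = {
    "local_scratchpad": 5.0,   # memory inside the accelerator (assumed value)
    "off_chip_dram": 500.0,    # system RAM reached over a bus (assumed value)
}


def access_energy_pj(accesses: dict) -> float:
    """Total energy for a workload described as {memory_level: access_count}."""
    return sum(ENERGY_PJ[level] * count for level, count in accesses.items())


# The same 1000 operand fetches, served from two different places.
all_local = access_energy_pj({"local_scratchpad": 1000})
all_dram = access_energy_pj({"off_chip_dram": 1000})
```

Under these assumed costs, fetching the same operands from system RAM takes a hundred times more energy than fetching them from local memory, before a single arithmetic operation has been counted.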

Also, because of how they work, DSAs can be placed in an SoC or a similar structure where the CPU communicates with them directly, without going through RAM to exchange data.

Hardware and software go hand in hand in DSAs

Hardware is not normally designed for specific software; rather, software is tailored to take advantage of the hardware. This is done through specialized APIs: the software talks to a hardware abstraction, and a program called the driver translates between that abstraction and the actual hardware.
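The layering described above can be sketched in a few lines. Everything here is hypothetical: `FakeDevice` stands in for real hardware, and its command names (WRITE, VADD, READ) are invented for the example.

```python
class FakeDevice:
    """Stand-in for the hardware: it only understands low-level commands."""

    def __init__(self):
        self.buffers = {}  # device-side storage, indexed by slot number
        self.log = []      # every command received, in order

    def execute(self, command, *args):
        self.log.append(command)
        if command == "WRITE":
            slot, data = args
            self.buffers[slot] = list(data)
        elif command == "VADD":
            dst, a, b = args
            self.buffers[dst] = [x + y for x, y in zip(self.buffers[a], self.buffers[b])]
        elif command == "READ":
            (slot,) = args
            return list(self.buffers[slot])
        else:
            raise ValueError(f"unknown command: {command}")


class Driver:
    """Translates one abstract operation into the device's own commands."""

    def __init__(self, device):
        self.device = device

    def vector_add(self, a, b):
        self.device.execute("WRITE", 0, a)
        self.device.execute("WRITE", 1, b)
        self.device.execute("VADD", 2, 0, 1)
        return self.device.execute("READ", 2)


def api_add(driver, a, b):
    """The API seen by applications: no slots, no commands, just vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return driver.vector_add(a, b)


device = FakeDevice()
total = api_add(Driver(device), [1, 2, 3], [10, 20, 30])
```

The application only ever calls `api_add`; the four device commands it triggers are visible in `device.log` but never in the application's own code, which is what lets the same software run on different hardware.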

In domain-specific accelerators, the idea is that they run a program themselves, as a processor would. Because their instruction set and architecture are specialized for one specific problem, that program runs faster on them than it would on a general-purpose processor.

Most hardware designs in the future will be DSAs for specialized problems, created in-house by companies and institutions to accelerate specific parts of one or more of the programs they have developed. They will be implemented as custom chips, as blocks inside SoCs, and even on FPGAs using hardware description languages such as Verilog or VHDL.

So the aim is to completely reverse the relationship between hardware and software: we move from designing software that leverages specific hardware to designing hardware for specific software solutions.