iGamesNews

SWAR: how processors and GPUs accelerate AI and multimedia

by The Boss
April 6, 2021
in PC

The performance of a processor can be measured in two ways. The first is how fast it executes serial instructions, which operate on a single piece of data and therefore cannot be parallelized. The second is how fast it executes instructions that work on several pieces of data and can be parallelized. The traditional way of handling the latter on CPUs and GPUs is the SIMD unit, and one subtype of it is widely used in both: the SWAR unit.

ALU and their complexity

Before talking about the SWAR concept, we need to keep in mind that ALUs are the units of a CPU responsible for performing arithmetic and logic operations. They can become complex in two ways. The first is the complexity of the instruction to be executed: the internal circuit of an ALU that can calculate a square root, for example, is not the same as that of one that performs a simple addition.

The second is the precision with which they work, that is, the number of bits they manipulate at once. An ALU can only handle data with as many bits as it was designed for, or fewer. For example, we can't make a 16-bit ALU process a 32-bit number in a single operation, but a 32-bit ALU can process a 16-bit number.

But what happens when we have several pieces of lower-precision data? Normally they are processed at the same speed as full-precision data, but there is a way to speed them up: SIMD within a register, which is also a way to save transistors in a processor.

What is the SWAR concept?

By now, many readers will know what a SIMD unit is, but we are going to go over it so that no one gets lost from the start. A SIMD unit is a type of execution unit in which a single instruction manipulates multiple data at the same time. It therefore contains several ALUs that share the fetching and decoding of the instruction, while each one processes a different piece of data.

SIMD units are usually made up of multiple ALUs, but there are cases where a single ALU is subdivided into simpler ones, along with the accumulator register where it temporarily stores the data it operates on. This is called SIMD within a register, or SWAR for short.

This type of SIMD unit is widely used. It allows an n-bit ALU to execute the same instruction on data of lower precision, usually half or a quarter of n. For example, one 64-bit ALU can act as two 32-bit ALUs, or as four 16-bit ALUs, executing the same instruction in parallel.
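
The idea can be imitated in software. The following is a minimal sketch, not a description of any real hardware: it treats a Python integer as a 64-bit register holding four 16-bit lanes and adds them with one pass of ordinary integer arithmetic. The names `swar_add16x4`, `HIGH16` and `LOW15` are made up for this illustration, and the register width is emulated with masks because Python integers are unbounded.

```python
HIGH16 = 0x8000_8000_8000_8000   # top bit of each 16-bit lane
LOW15 = 0x7FFF_7FFF_7FFF_7FFF    # the remaining 15 bits of each lane


def swar_add16x4(a: int, b: int) -> int:
    """Add four 16-bit lanes packed in two 64-bit words, lane by lane."""
    # Add only the low 15 bits of every lane: a carry can reach the top
    # bit of its own lane but can never spill into the neighbouring lane.
    partial = (a & LOW15) + (b & LOW15)
    # Fold the top bits back in with XOR, which is addition without a
    # carry-out, so each lane wraps around on overflow.
    return partial ^ ((a ^ b) & HIGH16)


a = 0x0001_FFFF_0003_0004
b = 0x0001_0001_0005_0006
print(f"{swar_add16x4(a, b):016x}")   # 000200000008000a
```

The second lane from the left (0xFFFF + 0x0001) wraps to 0x0000 without disturbing its neighbour, which is exactly the behaviour four independent 16-bit ALUs would show.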

More about the SWAR concept

This concept is decades old, but it first appeared on PCs in the late 90s, with the arrival of SIMD units in the various processors of the time. Veterans will remember names like MMX, AMD 3DNow!, SSE and others, which were SIMD units built on the SWAR concept.

Suppose we want to build a 128-bit SIMD unit:

  • In a conventional SIMD unit we have several ALUs operating in parallel, each with its own register or data accumulator. Thus, a 128-bit SIMD unit can be made up of four 32-bit ALUs and four 32-bit registers.
  • A SWAR unit, instead, is a single wide ALU, together with its accumulator register, that can be split into narrower lanes. This allows us to build the SIMD unit using a single 128-bit ALU with SWAR support.
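
The two layouts in the list above can be compared with a small software model. This is only a sketch under simple assumptions: the operation is an addition that wraps around on overflow, the lanes are 32 bits wide, and all the function names (`simd_add_separate_alus`, `swar_add_single_alu`, `pack`, `unpack`) are hypothetical helpers written for this example.

```python
LANES, LANE_BITS = 4, 32
LANE_MASK = (1 << LANE_BITS) - 1


def simd_add_separate_alus(a_regs, b_regs):
    """Conventional layout: one 32-bit ALU and one 32-bit register per lane."""
    return [(a + b) & LANE_MASK for a, b in zip(a_regs, b_regs)]


def pack(lanes):
    """Place four 32-bit values side by side in one 128-bit register."""
    word = 0
    for i, value in enumerate(lanes):
        word |= (value & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 128-bit register back into its four 32-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def swar_add_single_alu(a_word, b_word):
    """SWAR layout: one 128-bit add whose carries are cut at lane borders."""
    high = pack([1 << (LANE_BITS - 1)] * LANES)   # top bit of every lane
    low = pack([LANE_MASK >> 1] * LANES)          # the other 31 bits
    partial = (a_word & low) + (b_word & low)
    return partial ^ ((a_word ^ b_word) & high)


a = [1, 0xFFFF_FFFF, 7, 0x8000_0000]
b = [2, 1, 8, 0x8000_0000]
print(simd_add_separate_alus(a, b))                      # [3, 0, 15, 0]
print(unpack(swar_add_single_alu(pack(a), pack(b))))     # [3, 0, 15, 0]
```

Both layouts return the same four results. The difference lies in how the unit is built: four narrow adders with four registers, or one wide adder with one register.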

The advantage of a SWAR unit over a scalar unit is simple to understand. If an ALU lacks the SWAR mechanism that lets it work as a SIMD unit on lower-precision data, it processes that data at the same speed as data of its full precision. What does this mean? A 32-bit unit without SWAR support that needs to execute an instruction on 16-bit data will do so at the same rate as on 32-bit data. If the ALU supports SWAR, on the other hand, it can execute the same instruction on two 16-bit operands in the same cycle, provided the two operations follow one another.
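
The speed argument can be put into numbers with a deliberately simple model. It assumes one ALU pass per cycle and that every operand receives the same instruction back to back; real pipelines are more complicated. The function `alu_passes` and its parameters are hypothetical and exist only for this illustration.

```python
import math


def alu_passes(n_values: int, alu_bits: int, data_bits: int, swar: bool) -> int:
    """Passes through the ALU needed to apply one instruction to n_values operands."""
    if data_bits > alu_bits:
        raise ValueError("operand is wider than the ALU")
    # Without SWAR the ALU handles one operand per pass, whatever its width.
    # With SWAR it handles as many operands as fit side by side in the ALU.
    lanes = alu_bits // data_bits if swar else 1
    return math.ceil(n_values / lanes)


print(alu_passes(8, 32, 16, swar=False))   # 8
print(alu_passes(8, 32, 16, swar=True))    # 4
print(alu_passes(8, 32, 32, swar=True))    # 8
```

With full-width 32-bit data SWAR brings no gain. The benefit appears only when the operands are narrower than the ALU.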

SWAR as a patch for the AI

Artificial intelligence algorithms have a peculiarity: they tend to work with very low-precision data, while most ALUs today work with 32-bit precision. Speeding up these algorithms would mean adding 16-, 8- and even 4-bit ALUs to a processor, which complicates its design. Engineers avoided that mistake and instead turned to SIMD within a register in a peculiar way, especially on GPUs.

Is it possible to combine a conventional SIMD unit with a SWAR design? Yes, and this is what AMD does, for example, in its RDNA GPUs: each of the 32-bit ALUs that make up the SIMD units supports SIMD within a register and can therefore be subdivided into two 16-bit, four 8-bit or eight 4-bit ALUs.

In the case of NVIDIA, the job of speeding up AI algorithms falls to the Tensor Cores, which are systolic arrays made up of 16-bit floating-point ALUs interconnected in a three-axis matrix, hence the name Tensor. They are not SIMD units, but each of their ALUs supports SIMD within a register, performing twice as many operations at 8-bit precision and four times as many at 4-bit precision. Either way, Tensor units are important because they are designed to perform matrix-by-matrix operations much faster than a SIMD unit can.
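
To make the subdivision concrete, here is a small software sketch of one 32-bit word added as a single 32-bit lane, two 16-bit lanes, four 8-bit lanes or eight 4-bit lanes. It illustrates the general SWAR principle only and makes no claim about how AMD implements it in hardware. `lane_masks` and `swar_add` are names invented for this example.

```python
def lane_masks(word_bits: int, lane_bits: int):
    """Return (high, low): the top bit of every lane, and all the other bits."""
    high = 0
    for i in range(word_bits // lane_bits):
        high |= 1 << (i * lane_bits + lane_bits - 1)
    return high, ((1 << word_bits) - 1) ^ high


def swar_add(a: int, b: int, lane_bits: int, word_bits: int = 32) -> int:
    """Add two words lane by lane; each lane wraps around independently."""
    high, low = lane_masks(word_bits, lane_bits)
    return ((a & low) + (b & low)) ^ ((a ^ b) & high)


a, b = 0x0000_FFFF, 0x0000_0001
for bits in (32, 16, 8, 4):
    print(bits, f"{swar_add(a, b, bits):08x}")
# 32 00010000
# 16 00000000
# 8 0000ff00
# 4 0000fff0
```

The same two inputs give four different results because the lane width decides where the carry chain is cut: the narrower the lanes, the sooner the carry from the lowest position is discarded.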

© 2021 IgamesNews - The Best Video Game Website in English.
