The first ALU integrated on a single chip was not part of a CPU. It was the 74181, a 7400-series TTL chip from Texas Instruments. It was only 4 bits wide and was used in various minicomputers during the 1970s, marking an important transition in computing.
When the first complete CPUs were built in the 1970s, with all the elements needed to perform a full instruction cycle, they naturally had to integrate an ALU on the chip to execute the logic and arithmetic instructions.
Types of ALU
ALUs can be classified in two different ways. The first is by the type of number they operate on: integers or floating-point numbers, the latter being used for numbers with a fractional part. Floating-point formats, such as those in the IEEE 754 standard, specify how many of the number's bits are used for the sign, how many for the exponent and how many for the mantissa (or significand).
In both cases, the format also specifies whether the first (most significant) bit marks the sign. For example, an 8-bit integer can represent values from 0 to 255 if it is unsigned, or from -128 to 127 if it is signed in two's complement (from -127 to 127 in sign-magnitude format).
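As a sketch of how the same bits can be read in different formats, the following Python snippet interprets one byte as unsigned and as two's complement, and splits a value into the sign, exponent and mantissa fields of the IEEE 754 single-precision (binary32) format using the standard `struct` module. The function names are illustrative assumptions, not part of any library.

```python
import struct

def as_unsigned8(byte: int) -> int:
    """Read an 8-bit pattern as an unsigned integer (0 to 255)."""
    return byte & 0xFF

def as_signed8(byte: int) -> int:
    """Read an 8-bit pattern as a two's complement integer (-128 to 127)."""
    byte &= 0xFF
    # If the top bit is set, the value is negative: subtract 2^8.
    return byte - 0x100 if byte & 0x80 else byte

def float32_fields(x: float) -> tuple[int, int, int]:
    """Split x into the sign, biased exponent and mantissa bits of IEEE 754 binary32."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                # 1 bit
    exponent = (bits >> 23) & 0xFF   # 8 bits, stored with a bias of 127
    mantissa = bits & 0x7FFFFF       # 23 bits, the leading 1 is implicit
    return sign, exponent, mantissa

print(as_unsigned8(0b11111111), as_signed8(0b11111111))  # 255 -1
print(float32_fields(-6.5))  # (1, 129, 5242880)
```

Here -6.5 is -1.625 x 2^2, so the sign bit is 1, the stored exponent is 2 + 127 = 129, and the mantissa holds the fractional part 0.625.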
The second classification refers to how many data items and instructions an ALU processes at the same time. The simplest form is the scalar ALU, which executes one operation on a single set of operands at a time. There are also SIMD or vector units, which execute the same instruction on several sets of operands at the same time.
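To contrast the two models, here is a minimal Python sketch. It does not use real vector hardware; it relies on the SWAR ("SIMD within a register") trick to add four 8-bit lanes packed into one 32-bit word with a single addition, keeping carries from spilling into the neighbouring lane. The lane layout and helper names are assumptions made for the illustration.

```python
LANE_BITS = 8
LANES = 4
LANE_MASK = (1 << LANE_BITS) - 1
HIGH_BITS = 0x80808080   # top bit of each 8-bit lane
LOW_BITS = 0x7F7F7F7F    # remaining 7 bits of each lane

def pack(values):
    """Pack four 8-bit values into one 32-bit word (lane 0 in the lowest byte)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word

def unpack(word):
    """Split a 32-bit word back into its four 8-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]

def scalar_add(a, b):
    """Scalar: one addition on one pair of 8-bit operands."""
    return (a + b) & LANE_MASK

def simd_add(a_word, b_word):
    """SIMD-style: the same addition applied to all four lanes at once.

    The low 7 bits of every lane are added together; the top bit of each
    lane is then fixed up with XOR, so no carry crosses into the next lane.
    """
    low_sum = (a_word & LOW_BITS) + (b_word & LOW_BITS)
    return low_sum ^ ((a_word ^ b_word) & HIGH_BITS)

a = pack([1, 2, 3, 250])
b = pack([1, 2, 3, 10])
print(unpack(simd_add(a, b)))  # [2, 4, 6, 4]  (250 + 10 wraps around to 4)
print(scalar_add(250, 10))     # 4
```

A scalar ALU would need four separate additions to produce the same four results.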
Types of operations with an ALU
First of all, keep in mind that an ALU cannot work on its own: it needs a control unit to tell it which instruction to execute and on which data. In this explanation we will therefore assume that our ALU is accompanied by a control unit.
The ALU is what allows any type of processor, be it a CPU or a GPU, to perform mathematical operations on binary numbers. It is therefore nothing more than a binary calculator. The simplest possible ALU adds two numbers of 1 bit each, which works as follows:
Operation | Result | Carry |
---|---|---|
0 + 0 | 0 | 0 |
0 + 1 | 1 | 0 |
1 + 0 | 1 | 0 |
1 + 1 | 0 | 1 |
In the first three rows, the Result column behaves like an OR logic gate, but an OR gate fails when adding 1 + 1: in binary, 1 + 1 is 10, not 1. The result bit is 0 and a 1 must be carried to the next position. The result bit therefore comes from an XOR gate and the carry from an AND gate, a circuit known as a half adder. To add numbers with many bits, the carry of each position has to be fed into the next one, which is why an ALU that works with higher bit precision is considerably more complex.
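The logic described above can be modelled in a few lines of Python. This is a sketch of a textbook half adder, a full adder built from two half adders, and a ripple-carry adder that chains full adders bit by bit; real ALUs often use faster carry schemes, and the 8-bit default width is an assumption for the example.

```python
def half_adder(a, b):
    """Add two bits: the sum is XOR, the carry is AND."""
    return a ^ b, a & b

def full_adder(a, b, carry_in):
    """Add two bits plus an incoming carry, built from two half adders."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2

def ripple_carry_add(x, y, width=8, carry_in=0):
    """Add two unsigned width-bit numbers one bit position at a time.

    Returns (result, carry_out); the carry of each position feeds the next one.
    """
    result = 0
    carry = carry_in
    for i in range(width):
        bit, carry = full_adder((x >> i) & 1, (y >> i) & 1, carry)
        result |= bit << i
    return result, carry

print(half_adder(1, 1))            # (0, 1): 1 + 1 = 10 in binary
print(ripple_carry_add(25, 25))    # (50, 0)
print(ripple_carry_add(200, 100))  # (44, 1): 300 does not fit in 8 bits
```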
Binary subtraction in an ALU
Subtraction can be derived from addition with the following formula:
A - B = A + NOT(B) + 1
The trick is very simple and relies on the two's complement representation of binary integers, so it does not work with floating-point numbers. We can reuse the same circuit we used to add two numbers: all we have to do is invert every bit of the second operand with a set of NOT gates and add 1, which is typically done by setting the adder's initial carry input to 1. Thanks to this, the same hardware that performs addition can also perform subtraction.
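A minimal sketch of this idea, assuming an 8-bit adder modelled with Python's `+` and a mask: the subtractor inverts B with XOR against all ones (equivalent to a NOT on each bit) and supplies the +1 through the adder's carry input. The names `adder`, `subtract` and `to_signed` are hypothetical helpers for the illustration.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1

def adder(a, b, carry_in=0):
    """Stand-in for a WIDTH-bit hardware adder: returns (sum, carry_out)."""
    total = a + b + carry_in
    return total & MASK, total >> WIDTH

def subtract(a, b):
    """Compute a - b as a + NOT(b) + 1, reusing the adder.

    NOT(b) is done bit by bit (XOR with all ones), and the +1 enters
    through the adder's carry input.
    """
    not_b = b ^ MASK
    difference, _ = adder(a, not_b, carry_in=1)
    return difference

def to_signed(value):
    """Interpret a WIDTH-bit result as a two's complement number."""
    return value - (1 << WIDTH) if value & (1 << (WIDTH - 1)) else value

print(subtract(7, 5))             # 2
print(subtract(5, 7))             # 254, i.e. the bit pattern 11111110
print(to_signed(subtract(5, 7)))  # -2
```

The final carry out is simply discarded, which is what makes the result wrap around correctly in two's complement.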
Binary multiplication and division by powers of 2
The simplest form of multiplication in a binary system is multiplication by a power of 2. Since the system is binary, we only have to implement a mechanism that shifts the input data a number of positions to the left if we are multiplying, or to the right if we are dividing. How many positions? That depends on the exponent of the power of 2: to multiply by 8, which is 2^3, we shift the number 3 places to the left, and to divide by 8 we shift it 3 places to the right, discarding any remainder. This is why ALUs also include bit shift operations, which are the basis for multiplying or dividing by powers of 2.
To multiply by other numbers, however, it is best to go back to how we learned to multiply at school.
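As a small sketch, assuming unsigned 8-bit values, multiplying and dividing by 2^k reduce to a single left or right shift by k positions:

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1

def multiply_by_power_of_two(x, k):
    """x * 2**k as a left shift by k positions (bits pushed past WIDTH are lost)."""
    return (x << k) & MASK

def divide_by_power_of_two(x, k):
    """x // 2**k for unsigned x as a logical right shift by k positions."""
    return x >> k

print(multiply_by_power_of_two(25, 3))  # 200 = 25 * 8
print(divide_by_power_of_two(200, 3))   # 25 = 200 / 8
print(divide_by_power_of_two(27, 3))    # 3: the fractional part is discarded
```

For signed numbers, division needs an arithmetic right shift, which copies the sign bit into the vacated positions instead of filling them with zeros.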
Multiplication by numbers that are not powers of 2
For many years, ALUs were very simple and could only add, since processors had no hardware dedicated to multiplication. How did they multiply, then? By running a series of chained additions and shifts, which took many cycles. As a historical curiosity, the Intel 8086 was one of the first home-computer processors to offer a multiply instruction, although it executed it in microcode rather than with a dedicated multiplier circuit.
Suppose we want to multiply 25 x 25. At school, we did the following:
- First, we multiply 25 x 5 and write down the result, which is 125.
- Second, we multiply 25 x 2, which gives us 50, and we write the result shifted one position to the left.
- Finally, we add the two numbers. Because the second one is shifted one position to the left (so it really stands for 500), the sum is not 175 but 625, which is the result of 25 x 25 in decimal.
In binary the process is the same. The number 25 is 11001 in binary, a 5-bit number, so we are going to multiply 11001 x 11001. Multiplying by a single bit is the same as an AND operation, so the partial products are generated with AND gates. We go through the bits of the multiplier from right to left:
- First, we multiply 11001 x 1 = 11001.
- Second, we multiply 11001 x 0 = 00000 and write the result one place to the left.
- Third, we multiply 11001 x 0 = 00000 and write the result two places to the left.
- Fourth, we multiply 11001 x 1 = 11001 and write the result three places to the left.
- Fifth, we multiply 11001 x 1 = 11001 and write the result four places to the left.
- Finally, taking into account the position of each partial product, we add them all, which gives 1001110001, that is, 625 in decimal.
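The steps above correspond to the classic shift-and-add multiplication algorithm. The following Python function is a sketch of it for unsigned 5-bit operands (the width is an assumption matching the example); it returns the shifted partial products so they can be compared with the list.

```python
def shift_and_add_multiply(multiplicand, multiplier, width=5):
    """Multiply two unsigned width-bit numbers the way it is done on paper.

    Returns the product and the list of shifted partial products.
    """
    partials = []
    for i in range(width):
        bit = (multiplier >> i) & 1
        # -bit is all ones when bit is 1 and zero when bit is 0, so this AND
        # yields either the multiplicand itself or 0, like a row of AND gates.
        partial = multiplicand & -bit
        partials.append(partial << i)  # shift i places to the left
    return sum(partials), partials

product, partials = shift_and_add_multiply(0b11001, 0b11001)
print([format(p, "b") for p in partials])  # ['11001', '0', '0', '11001000', '110010000']
print(format(product, "b"), product)       # 1001110001 625
```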
More complex mathematical operations
Building on these basic operations, units can be designed to perform much more complex mathematical operations such as division, square roots, powers, and so on. Naturally, the more complex the operation, the more transistors are needed. In fact, each operation has its own mechanism, and when the control unit tells the ALU which type of operation to perform, it is telling it to use the specific mechanism for that operation.
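The selection step can be sketched as follows. In many simple ALUs, each functional unit computes its result and a multiplexer controlled by the opcode chooses which one reaches the output; in this Python model, the dictionary lookup plays the role of that multiplexer. The opcode names and values are hypothetical, chosen only for the illustration.

```python
from enum import Enum

WIDTH = 8
MASK = (1 << WIDTH) - 1

class Op(Enum):
    # Hypothetical opcodes, not taken from any real instruction set.
    ADD = 0
    SUB = 1
    AND = 2
    OR = 3
    SHL = 4

# One "mechanism" per operation; the opcode picks which one drives the output.
UNITS = {
    Op.ADD: lambda a, b: (a + b) & MASK,
    Op.SUB: lambda a, b: (a + (b ^ MASK) + 1) & MASK,  # A + NOT(B) + 1
    Op.AND: lambda a, b: a & b,
    Op.OR: lambda a, b: a | b,
    Op.SHL: lambda a, b: (a << b) & MASK,
}

def alu(op: Op, a: int, b: int) -> int:
    """The control unit supplies op; the ALU routes a and b to that unit."""
    return UNITS[op](a, b)

print(alu(Op.ADD, 25, 25))  # 50
print(alu(Op.SUB, 5, 7))    # 254
print(alu(Op.SHL, 25, 3))   # 200
```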
Since saving transistors is a priority, the most complex operations are usually implemented as a sequence of simpler ones so that the same hardware can be reused. As a consequence, these complex operations take more clock cycles. Some designs do implement complete dedicated circuits that perform these operations in far fewer cycles, in many cases even in a single cycle, but they are not common in CPUs.
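Division is a good example of a complex operation built from simpler ones. The sketch below is the schoolbook long-division algorithm (equivalent to restoring division) in Python: it uses only shifts, comparisons and subtractions, and produces one quotient bit per loop iteration. Treating each iteration as roughly one clock cycle is an assumption about a simple sequential design, not a description of any specific processor.

```python
def long_divide(dividend, divisor, width=8):
    """Unsigned division built only from shifts, comparisons and subtractions.

    Each loop iteration produces one quotient bit, so a width-bit division
    takes width steps.
    """
    if divisor == 0:
        raise ZeroDivisionError("division by zero")
    quotient = 0
    remainder = 0
    for i in range(width - 1, -1, -1):
        # Bring down the next bit of the dividend.
        remainder = (remainder << 1) | ((dividend >> i) & 1)
        if remainder >= divisor:
            remainder -= divisor   # reuse the subtractor
            quotient |= 1 << i     # this quotient bit is 1
    return quotient, remainder

print(long_divide(100, 7))             # (14, 2)
print(long_divide(625, 25, width=10))  # (25, 0)
```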
Where they are used is in GPUs, which include a type of unit called the Special Function Unit (SFU), responsible for what are known as transcendental operations, such as the trigonometric functions used in geometry calculations.
Where does the ALU get the data to operate?
First of all, keep in mind that an ALU does not work directly on data in memory. During the fetch and decode stages, the data it has to work with is loaded into a register called the accumulator, on which the operations are performed.
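To make the role of the accumulator concrete, here is a Python sketch of a tiny accumulator machine. The instruction names and the (opcode, address) program format are hypothetical. Only LOAD and STORE move data between memory and the accumulator; ADD and SUB fetch one operand from memory, and the ALU always combines it with the accumulator, leaving the result there.

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1

def run(program, memory):
    """Execute a tiny, hypothetical 8-bit accumulator machine.

    Each instruction is an (opcode, address) pair. The accumulator is the
    only register, and every arithmetic result is written back to it.
    """
    acc = 0
    for opcode, address in program:
        if opcode == "LOAD":
            acc = memory[address]
        elif opcode == "ADD":
            acc = (acc + memory[address]) & MASK
        elif opcode == "SUB":
            acc = (acc - memory[address]) & MASK
        elif opcode == "STORE":
            memory[address] = acc
        else:
            raise ValueError(f"unknown opcode {opcode}")
    return acc

memory = [25, 25, 10, 0]
program = [("LOAD", 0), ("ADD", 1), ("SUB", 2), ("STORE", 3)]
print(run(program, memory), memory)  # 40 [25, 25, 10, 40]
```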
In more complex systems, several registers are used for arithmetic operations, and in some cases there are even special registers for certain instructions. These are documented most of the time, but some of them, because they are only used by certain instructions, are often left undocumented.
Registers are used because they sit right next to the ALU: if the ALU had to work with data in RAM, even a simple operation would take much longer. The other reason is that accessing RAM for every operation would consume much more energy.
This covers how an ALU works, at least in basic terms.