A processor, whether it is a CPU or a GPU, only processes data, which means it needs memory to feed it. Unfortunately, over time the gap between memory speed and CPU speed has grown larger, which led to the adoption of techniques such as caching. Nor can we forget the latency between processor and memory, which arises when the interface between RAM and the processor cannot deliver data quickly enough.
However, we cannot measure performance in a general way, because each program, or rather each algorithm within each program, has a different computational load. This is where the term arithmetic intensity comes in. Let's see what it is, along with other concepts related to computer performance.
What is arithmetic intensity?
Arithmetic intensity is a measure of performance: it counts the floating point operations a processor performs in a specific section of code relative to the memory traffic that section generates. It is obtained by dividing the number of floating point operations by the number of bytes the algorithm transfers while executing.
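As a sketch, this ratio can be worked out by hand for a simple kernel. The example below assumes a SAXPY-style loop (y = a·x + y) over single-precision floats; the function and the operation/byte counts are illustrative, not taken from any particular library:

```python
def arithmetic_intensity(flops: float, bytes_moved: float) -> float:
    """Arithmetic intensity: floating point operations per byte of memory traffic."""
    return flops / bytes_moved

n = 1_000_000
flops = 2 * n            # one multiply and one add per element
bytes_moved = 3 * n * 4  # read x, read y, write y; 4-byte floats
ai = arithmetic_intensity(flops, bytes_moved)
print(f"SAXPY arithmetic intensity: {ai:.3f} FLOP/byte")  # about 0.167
```

Note that the result does not depend on n: SAXPY moves 12 bytes for every 2 operations no matter how long the vector is, which is why it is a classic example of a memory-bound kernel.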
How is it useful? In areas of computing where very powerful machines are needed for specific tasks, it makes it possible to pick the hardware best suited to run the algorithms under the best conditions. This model is mainly used in scientific computing, although it also serves to optimize performance on closed systems such as video game consoles.
Highly parallel hardware architectures require high arithmetic intensity, that is, a low ratio of memory traffic to computation, because the ratio between the compute capability of their processors and the available memory bandwidth is high. Many applications, and graphics in particular, process the same data many times over, so they demand a great deal of compute power relative to the data they move.
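This trade-off is often summarized with the roofline model, in which attainable performance is capped either by the processor's compute peak or by memory bandwidth times arithmetic intensity. A minimal sketch, using made-up hardware figures for an accelerator:

```python
def attainable_gflops(peak_gflops: float, bw_gbs: float, ai: float) -> float:
    """Roofline model: performance is limited either by the compute peak
    or by memory bandwidth (GB/s) times arithmetic intensity (FLOP/byte)."""
    return min(peak_gflops, bw_gbs * ai)

# Hypothetical accelerator: 10 TFLOP/s peak, 500 GB/s memory bandwidth.
print(attainable_gflops(10_000, 500, 0.2))   # low intensity: memory bound, 100 GFLOP/s
print(attainable_gflops(10_000, 500, 50.0))  # high intensity: compute bound, 10000 GFLOP/s
```

With low arithmetic intensity the hypothetical chip delivers only 1% of its peak, which is exactly why highly parallel processors favor code with a high FLOP-per-byte ratio.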
Algorithm performance and its relation to arithmetic intensity
When writing an algorithm, programmers take into account the performance of the algorithms in their programs, which is expressed with Big O notation: it describes how the number of operations grows with the size of the input. Big O notation is not obtained from benchmarks; programmers work it out by hand to get a rough idea of a program's workload.
- O(1): the algorithm does not depend on the size of the data to be processed. An algorithm with O(1) performance is considered ideal and cannot be beaten.
- O(n): the execution time is directly proportional to the size of the data; the cost grows linearly.
- O(log n): typical of algorithms that split a problem and only need to solve one part at each step, such as binary search.
- O(n log n): an evolution of the previous one, where the problem is divided but every part must still be processed; efficient sorting algorithms such as mergesort fall into this class.
- O(n²): some algorithms iterate multiple times because they have to traverse the data multiple times. They are usually highly repetitive algorithms and therefore carry a quadratic computational load.
- O(n!): an algorithm with this complexity is completely flawed in terms of performance and needs rewriting.
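To make the O(log n) case concrete, here is a small sketch of binary search that counts its comparisons; searching a million sorted elements takes on the order of 20 steps, whereas a linear O(n) scan could need up to a million:

```python
def binary_search(data, target):
    """O(log n) search in a sorted list.
    Returns (index or None, number of comparisons performed)."""
    lo, hi, comparisons = 0, len(data) - 1, 0
    while lo <= hi:
        mid = (lo + hi) // 2
        comparisons += 1
        if data[mid] == target:
            return mid, comparisons
        if data[mid] < target:
            lo = mid + 1  # discard the lower half
        else:
            hi = mid - 1  # discard the upper half
    return None, comparisons

data = list(range(1_000_000))
idx, steps = binary_search(data, 999_999)
print(f"found at index {idx} after {steps} comparisons")
```

Each comparison halves the remaining search space, which is where the logarithm comes from: doubling the data adds only one extra step.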
Not all algorithms can achieve complexity level O (1), and some of them perform much better on one type of hardware than another. This is why domain-specific accelerators or processors have been developed in recent years to speed up one type of algorithm over others. The general idea is to divide the algorithms into parts and to process each of them with the processing unit most suited to its arithmetic intensity.
Relationship between communication and computation
The reverse case is the communication-to-computation ratio, which is the inverse of arithmetic intensity and is therefore obtained by dividing the number of bytes by the number of floating point operations. It is used to measure the bandwidth required to execute that part of the code. The problem when measuring is that the data is not always in the same place, so RAM bandwidth is used as the reference.
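As a rough sketch, this ratio directly yields the bandwidth a processor must be fed to sustain a target compute rate; the figures below reuse the earlier SAXPY counts (12 bytes per 2 operations) and a hypothetical 100 GFLOP/s target:

```python
def bytes_per_flop(bytes_moved: float, flops: float) -> float:
    """Communication-to-computation ratio: the inverse of arithmetic intensity."""
    return bytes_moved / flops

def required_bandwidth_gbs(target_gflops: float, ratio: float) -> float:
    """Bandwidth (GB/s) needed to sustain a target rate (GFLOP/s)
    at a given bytes-per-FLOP ratio."""
    return target_gflops * ratio

ratio = bytes_per_flop(12, 2)                 # SAXPY: 6 bytes per FLOP
print(required_bandwidth_gbs(100, ratio))     # 600 GB/s to sustain 100 GFLOP/s
```

The caveat in the text applies here too: this assumes every byte comes from RAM, so caches and latency can make the real requirement quite different.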
Bear in mind that this is not a completely reliable measurement, not only because the cache hierarchy brings data closer to the processor, but also because of latency: each type of RAM has different advantages and disadvantages, and the result can vary depending on the memory used.
Today, when choosing memory for a system, not only bandwidth is considered but also power consumption, because the energy cost of moving data exceeds the cost of processing it. Designers therefore opt for specific types of memory in certain applications, always within the cost limits of building the system, which are not the same for a supercomputer as for a home PC.