It would be ideal for the same graphics architecture to be scalable from the simplest smartphone to the most advanced workstation, but unfortunately this is not the case.
Graphics architectures for smartphones, such as PowerVR, Mali and Adreno, do not scale up gracefully in performance: they reach a point where they lose their power efficiency if we enlarge them in order to place them in a PC. On the other hand, PC GPUs like NVIDIA’s GeForce and AMD’s Radeon shrink very badly and are not viable in devices with lower power consumption than a conventional PC, such as post-PC devices.
If you’ve ever wondered why we don’t see Radeon in a smartphone or PowerVR in high-end gaming PCs, we hope that after reading this article you will have an answer to most of that question.
Performance per watt or how to do more with less energy
Performance per watt is used as a metric in any computer architecture to describe its energy efficiency, that is, its computing power under a given power consumption. It is obtained with a simple division: the performance achieved divided by the power consumed at that moment. If we want to compare two GPUs, we just have to make sure that both consume the same amount of power, i.e. the same watts, and compare the two running the same program.
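The division described above can be sketched in a few lines. The GFLOPS and wattage figures here are purely illustrative, not measurements of any real GPU:

```python
# Hypothetical figures for two GPUs running the same benchmark at the
# same power limit; the numbers are illustrative, not measured data.
def perf_per_watt(gflops: float, watts: float) -> float:
    """Performance per watt: computing power divided by power draw."""
    return gflops / watts

gpu_a = perf_per_watt(gflops=10_000, watts=250)  # 40.0 GFLOPS per watt
gpu_b = perf_per_watt(gflops=12_000, watts=250)  # 48.0 GFLOPS per watt

# At an equal 250 W budget, GPU B does more work per joule.
assert gpu_b > gpu_a
```

The key detail is holding the wattage constant: only then does the comparison isolate efficiency rather than raw power budget.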
Performance per watt is today the greatest obsession of architects when designing new hardware. The reason is none other than being able to achieve the expected performance with less energy consumption, which has a series of positive consequences, among which the following stand out:
- They may sell a much faster version of the chip than what they initially offered, which is a business advantage over the competition.
- It is easier to achieve target speed in post-manufacturing testing.
- It allows for form factors beyond the usual. NVIDIA’s higher performance per watt with Maxwell and Pascal compared to AMD’s Polaris and Vega is what made gaming laptops with NVIDIA GPUs viable, while equivalents with those AMD architectures did not appear.
- It allows new and improved iterations of an architecture to be released later, which will be faster.
The most common way to increase performance per watt is to move to new manufacturing nodes: each new node brings a reduction in consumption, which translates into a higher clock speed at the same consumption or a lower consumption at the same clock speed.
But where the changes really take place is at the architecture level, since architects make optimizations so that new designs consume less power when executing the different instructions, both when designing a CPU and a GPU.
Even in a minor iteration of an architecture, engineers are always thinking of new ways to improve its performance per watt. Once the first version of a design is completed, that does not mean they forget about it; throughout its commercial life it will receive optimizations and improvements in order to be more efficient without increasing energy consumption and, if possible, while reducing it.
The importance of keeping data close to the GPU
At a conference, Bill Dally, currently NVIDIA’s chief scientist, told an anecdote in which he recalled that NVIDIA engineers were puzzled by a simple question he asked them: did you measure the energy consumed when moving data?
The point is that data is not only processed, it is also moved around the whole processor, and depending on where that data sits, more or less energy is consumed. This is why designs attempt to ensure that the greatest number of instructions work with data already inside the processor; the idea is to reduce the power consumption of these operations.
Generation after generation, GPUs are increasingly optimized to avoid operations on VRAM as much as possible, and the architectural improvements made in recent years have had this objective: keep data as local as possible to reduce the power consumption of the instructions.
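The scale of the difference can be sketched with a toy energy model. The picojoule figures below are rough order-of-magnitude assumptions in the spirit of numbers often quoted in architecture talks, not measurements:

```python
# Illustrative order-of-magnitude energy costs per access; the values
# are assumptions for the sketch, not data from any real GPU.
ENERGY_PJ = {
    "register": 0.5,   # operand already inside the execution unit
    "l1_cache": 5.0,   # on-chip SRAM
    "vram": 500.0,     # off-chip DRAM (GDDR)
}

def kernel_energy_pj(accesses: dict) -> float:
    """Total data-movement energy for a kernel, in picojoules."""
    return sum(ENERGY_PJ[level] * n for level, n in accesses.items())

# Same 1000 operands: mostly-local reuse vs fetching everything from VRAM.
local = kernel_energy_pj({"register": 900, "l1_cache": 90, "vram": 10})
remote = kernel_energy_pj({"vram": 1000})

# Under these assumed costs, the all-VRAM version spends over 50x
# more energy just moving data.
assert remote > 50 * local
```

This is why cache hierarchies, register reuse and tiling strategies are framed as power optimizations, not just latency optimizations.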
big.LITTLE on GPUs in the future to increase performance per watt
Although big.LITTLE currently only affects CPUs, in the case of GPUs there is no doubt that we will see this concept appear in architectures in the short term, because at the end of the day compute units are still processors in themselves. They are more focused on thread-level parallelism than on instruction-level parallelism, but big.LITTLE can also be applied in the case of GPUs.
What does this consist of?
- Simpler instructions consume much less than complex instructions.
- The clock speed of a processor is traditionally limited by the instructions that consume the most power.
- The compute units would use two types of cores: one for very simple instructions, which can reach higher clock speeds by taking advantage of their lower power consumption; the other for more complex instructions, which would reach lower clock speeds.
The idea is to take advantage of the higher performance per watt of the simpler cores.
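The scheme described in the list above can be sketched as a toy dispatcher. The instruction categories and clock figures are invented for illustration; no real GPU documents such a split today:

```python
# Toy sketch of big.LITTLE applied to a GPU compute unit: simple
# instructions go to small cores that tolerate a high clock, complex
# instructions to big cores capped at a lower clock. The op lists and
# clock values are assumptions made up for this example.
SIMPLE_OPS = {"add", "mul", "fma"}
COMPLEX_OPS = {"sqrt", "div", "sin"}

LITTLE_CLOCK_GHZ = 2.5  # cheap ALU ops allow a higher clock
BIG_CLOCK_GHZ = 1.8     # power-hungry ops cap the achievable clock

def dispatch(op: str) -> str:
    """Route an instruction to the core type that should execute it."""
    if op in SIMPLE_OPS:
        return "little"
    if op in COMPLEX_OPS:
        return "big"
    raise ValueError(f"unknown op: {op}")

assert dispatch("fma") == "little"   # common case runs at 2.5 GHz
assert dispatch("sqrt") == "big"     # rare, expensive case at 1.8 GHz
```

Since simple arithmetic dominates most shader workloads, most instructions would land on the faster, more efficient cores, which is exactly where the performance-per-watt gain comes from.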
Performance per watt and fixed function units
Fixed-function or hard-wired units are units that do not execute a program; from certain input data, they always execute the same operations.
Their usefulness lies in performing repetitive and recursive tasks. An example of this are the texture filtering units: it is true that it would be possible to do this work with the units that run the shaders, but that series of operations would end up consuming several orders of magnitude more power.
The latest unit added to GPUs is the RT Core or Ray Accelerator, which frees the shaders from calculating ray intersections in ray tracing, a task which, to be performed at the same speed, would require enormous power consumption, even exceeding what a GPU can afford.
The memory chosen also affects performance per watt
Not all types of memory have the same energy efficiency; some consume more than others, and this is one of the controversies of recent years between GDDR-type and HBM-type memories.
- GDDR is very cheap, but in return it consumes much more, leaving less power for the GPU, which reaches much lower clock speeds by having less power for itself.
- HBM takes up very little space and consumes very little, but it is very expensive to implement.
In the consumer market, GDDR has dominated for years, but in markets where cost is less of a concern, HBM dominates, because its low power consumption when transferring data makes it ideal for getting the best possible performance.
There will come a point where GDDR-type memories cannot achieve any more; the problem is that a memory with the advantages of HBM and a reasonably low price has not yet appeared.
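The power-budget argument behind the GDDR-versus-HBM trade-off reduces to simple subtraction. The wattages below are assumptions for the sketch, not vendor figures:

```python
# Illustrative split of a fixed board power budget between memory and
# GPU cores; all wattages are made-up assumptions, not real products.
BOARD_TDP_W = 300

def gpu_budget(memory_power_w: float) -> float:
    """Power left for the GPU cores once memory takes its share."""
    return BOARD_TDP_W - memory_power_w

gddr_gpu_w = gpu_budget(memory_power_w=60)  # GDDR: cheap but hungry -> 240 W
hbm_gpu_w = gpu_budget(memory_power_w=25)   # HBM: frugal but costly -> 275 W

# Within the same board TDP, the HBM configuration can feed its GPU
# more power, which is what enables higher clocks.
assert hbm_gpu_w > gddr_gpu_w
```

This is why, at a fixed TDP, the memory choice indirectly sets the GPU's clock ceiling.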