However, roadmaps are often partially misleading or change along the way, whether through product cancellations or delays. That’s why we decided not only to review the Intel GPU roadmap, but also to reconcile it with the various pieces of information coming our way.
The Intel ARC GPU roadmap and its architecture
Intel’s ARC Alchemist architecture doesn’t differ much from what the competition offers. As with AMD and NVIDIA, each vendor simply uses different names for units with the same or similar functionality. So we’re going to be concise and focus on what’s important.
Intel’s objective is to capture as much market share as possible, and to do that they are not going for the jugular of their CPU rival; what they want is to go up against NVIDIA, which is why Alchemist is designed to compete head-to-head with the RTX 30 series. If we compare unit by unit, we find things like a ray tracing unit similar to, and superior to, AMD’s, with the ability to traverse the BVH tree in hardware. The equivalent of the Tensor Cores, absent from the Radeons, is present in ARC Alchemist, and the number of ALUs per shader unit is 128 instead of 64.
So, the first-generation Intel ARC is more of a first step that says little or nothing about Intel’s ambition and future plans. A simple calling card in a market hitherto dominated by the AMD and NVIDIA duopoly.
The development of Ponte Vecchio is essential
We know two things about the high-performance computing GPU called Ponte Vecchio. The first is that it won’t appear in PCs, since it’s a design for supercomputers and high-performance computing systems. The second is that several of its concepts will reappear throughout the Intel GPU roadmap. Most important of all, the knowledge accumulated during its development is what will allow Intel to deploy the next generations very quickly compared to the competition. According to its chief architect, Raja Koduri, we can expect the same units to be used in both the CPU and the GPU:
Meteor Lake will be a new architecture that will integrate tile GPUs (or chiplets) into a 3D package. This is something very exciting that will allow us to offer the performance of dedicated graphics cards with the efficiency of integrated graphics.
One of the things Ponte Vecchio makes use of is Intel’s new 3D packaging and silicon-bridge technologies, Foveros and EMIB, which will be key to delivering the Intel ARC GPU roadmap using multiple tiles or chiplets instead of a single monolithic chip. Intel won’t be the only one doing this, but it has progressed further than the competition.
The Importance of TSMC’s 3nm Node in Intel’s Roadmap
The agreement between Intel and TSMC, under which the latter will build the GPU tiles for Intel’s graphics cards and processors, will allow Pat Gelsinger’s team to take advantage of the Taiwanese 3 nm node well before NVIDIA or AMD. The reason? The small size of each tile is essential to deploying the different generations of ARC GPUs quickly. However, the GPU tile developed for Ponte Vecchio is not powerful enough to compete with the RTX 4090 in FP32 ALU count.
Intel has therefore decided to take advantage of its privileged access to the 3 nm node to create a GPU tile with higher computing power than Ponte Vecchio’s, in order to offer a high-end GPU with much more power than NVIDIA can achieve with the RTX 4090 Ti. To do this, they will use the same GPU tiles as Meteor Lake and Arrow Lake. The difference is that the dedicated GPUs will use configurations of 2, 4 and, who knows, perhaps even 6 or 8 tiles on the same GPU.
We can’t give official numbers, but there is talk of 320-EU configurations per GPU tile at 3 nm in Intel’s roadmap, which translates into 2,560 FP32 ALUs in the single-tile configuration alone. That would allow Intel to field a GPU with more than 20,000 “cores” at the high end; however, at this time we don’t know whether it will arrive as Battlemage or Celestial. In any case, the name is the least important thing of all.
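The back-of-the-envelope math behind those figures can be sketched as follows. Note that both inputs are rumors, not confirmed specifications: 320 EUs per tile comes from the roadmap talk above, and 8 FP32 ALUs per EU is the ratio of Intel’s current Xe designs.

```python
# Rumored figures, not confirmed by Intel: 320 EUs per 3 nm tile,
# 8 FP32 ALUs per EU (the ratio used in current Xe architectures).
EUS_PER_TILE = 320
FP32_ALUS_PER_EU = 8

alus_per_tile = EUS_PER_TILE * FP32_ALUS_PER_EU  # 2,560 FP32 ALUs per tile

# The multi-tile configurations mentioned in the article.
for tiles in (1, 2, 4, 6, 8):
    print(f"{tiles} tile(s): {tiles * alus_per_tile:,} FP32 ALUs")
```

An 8-tile part would land at 20,480 FP32 ALUs, which is where the “more than 20,000 cores” figure comes from.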
How does Intel plan to combine several different GPUs into one?
Here we enter an extremely interesting subject. Desktop GPUs generally render each frame from a single command list, so if we use several GPUs, we have the following possible solutions:
- Alternate frame rendering, which requires the CPU to have the command lists for the next frames already prepared. At some point this is no longer feasible, and performance stops scaling with the number of GPUs.
- Split frame rendering, dividing the frame into several regions of the screen. The problem is that everything before rasterization happens not in screen space but in world-space geometry, so that part of the work still falls on a single GPU.
Intel’s idea for its future GPUs is easy to understand: first, the scene is processed by a single GPU tile, but without applying shader programs to any primitives or textures. This establishes where each polygon of the scene sits from the start, identifies which ones will not be visible and can be discarded, and, in particular, builds the command lists needed to render each region of the screen, so that each GPU tile renders its own part of the scene.
It is neither more nor less than adopting the same solution as tile-based rendering, with the difference that the binning of the geometry is done before the final rendering of the scene. This pre-pass that builds the per-region lists runs through the compute pipeline, allowing multiple GPU tiles to be used in parallel from the start.