The reveal comes from version 39 of Intel’s official ISA extensions reference documentation, which is where the user InstLatX64 spotted it. The document contains a section detailing the newly added instructions and features of the Intel 64 and IA-32 architectures, where each set is listed alongside the series of architectures that will support it.
Among the new additions is an interesting set of features that heavily affects the cache and its different levels, and that will be supported by Tremont, Alder Lake and Sapphire Rapids, at which point we may have to raise the RAM frequency to get more performance out of the CPU.
Intel CLDEMOTE, a new set of cache extension instructions
As Intel has already described, its new architectures will bring a series of changes intended to improve performance and raise the IPC of its processors. One of these features, it seems, has been on the list for a long time; “hidden” among hundreds of whitepaper pages, nobody had noticed its presence, since preliminary data suggest it has been in plain view since October 2018.
In any case, until now it could not be assigned to one or more specific architectures, whereas we now know when it should arrive on the market according to the latest documentation from Intel itself. The instruction set in question is called CLDEMOTE, and it arrives as what is known as a “cache line hint instruction.”
It looks like an important change in Intel’s vision for the design of its instructions, and it will have a significant impact on performance. Its primary function is very easy to understand but much harder to exploit in practice: it attempts to move certain cache lines from the caches closest to the cores to those furthest from them.
In other words, it tries to extend cache-to-cache functionality by moving data to L3 instead of L2 or L1, freeing up the latter two levels in the process.
The requirements, advantages and disadvantages of this instruction set
As with any instruction set, a series of requirements must be met for it to be effective:
- A workload whose write demand justifies its use.
- Writing the cache contents back to volatile memory (RAM).
This obviously creates a small problem if we want to extract high performance from the set: it widens the existing gap between processor and memory, making the former heavily dependent on the performance of its cache.
Conversely, by demoting cache lines from the much smaller L1 and L2 levels to L3, more data movement between cores becomes possible, and this is more efficient, since the L3 is shared among them.
In addition, from a cache management perspective, complexity increases, and with it the cost of the processors may rise due to higher R&D costs. Ultimately, RAM access times and transfer rates will be far more decisive than before.
We could see performance jumps similar to those seen with Ryzen, where RAM frequency and latency matter greatly for overall system performance. The advantage is that there would be no access problems for AMD, but it would also require very fast memory to reap the benefits of this instruction set, which could translate into a big jump in IPC on Intel’s part.