The first PC CPU with an on-chip cache was the Intel 80486. Since then, every processor has organized its cache into a multilevel hierarchy, and this applies to GPUs as well as CPUs.
The main purposes of the cache are well known. First, it reduces the huge latency between the processor and RAM. Second, it reduces the power consumption of the different instructions. Third, it reduces memory access conflicts, which can lead to higher latency, especially when several cores access the same memory address.
Cache performance can also be measured, and that measurement is taken into account when designing new processors. Read on to find out how cache affects your CPU and GPU performance.
Cache summary
First of all, keep in mind that the cache is not a space addressable by the CPU or GPU. By addressable space we mean memory in which the CPU or GPU can point to the specific address where the next data or instruction is located. The processor therefore cannot point to the cache directly and cannot manage it.
Cache mechanisms work automatically: when a processor core looks for data, it goes through the different cache levels to find it. When a core changes a piece of data, all copies of that data in the other caches are automatically updated as well. Likewise, it is the caching mechanisms themselves that decide which copies of the contents of RAM are kept in the cache and which are not.
What the cache does is store copies of the memory contents closest to the lines of code currently being executed. This is because code is primarily sequential, so most of the time the next instruction to process is the one immediately following.
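To make the benefit of sequential access concrete, here is a minimal sketch of a toy direct-mapped cache. The line size, the number of lines, and the function name count_misses are assumptions chosen for illustration and do not describe any real processor. It counts misses for a sequential run of addresses and for a run that jumps far ahead on every access.

```python
LINE_SIZE = 64  # bytes per cache line (assumed for illustration)
NUM_LINES = 8   # slots in this toy direct-mapped cache (assumed)


def count_misses(addresses):
    """Count cache misses for a sequence of byte addresses."""
    tags = [None] * NUM_LINES  # memory line currently held by each slot
    misses = 0
    for addr in addresses:
        line = addr // LINE_SIZE   # memory line containing this address
        slot = line % NUM_LINES    # the only slot that line may occupy
        if tags[slot] != line:     # miss: the whole line is fetched from RAM
            tags[slot] = line
            misses += 1
    return misses


# 256 sequential 4-byte fetches: only the first access to each 64-byte line misses.
sequential = [i * 4 for i in range(256)]

# 256 fetches that each jump a full cache span: every access evicts the previous line.
strided = [i * LINE_SIZE * NUM_LINES for i in range(256)]

sequential_misses = count_misses(sequential)
strided_misses = count_misses(strided)
print(sequential_misses, strided_misses)
```

In this toy model the sequential run misses once per cache line, while the strided run misses on every access, which is why keeping the memory near the current instruction pays off.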
How is cache performance measured?
When the CPU or GPU needs to access data in memory, it first looks through the levels of the cache hierarchy. It searches the first-level cache, the one closest to the core; if the data is not there, it moves to the next level, and so on until it finds the data it is looking for.
When the data is not in a cache level, what we call a “cache miss” occurs, and it is necessary to go down to the next level of the hierarchy. This adds extra latency. On the other hand, if the data is found in that level, we have a “cache hit”.
To illustrate, let’s say we have a CPU or GPU with 3 levels of cache and the data is found at the third level. The search time in this case can be summarized as follows:
Cached data search time = first-level cache search time + second-level cache jump time + second-level cache search time + third-level cache jump time + third-level cache search time
Keep in mind that if the cache search time exceeds the direct RAM access time, the processor’s cache is poorly designed, because there is no justification for taking longer to find data in the cache than in memory. This is why we usually don’t see additional cache levels: each level adds its own latency to the total access time.
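The sum above can be sketched in a few lines of Python. All cycle counts below are hypothetical placeholders, not measurements of any real CPU or GPU, and the names SEARCH_CYCLES, JUMP_CYCLES, RAM_CYCLES, and lookup_cycles are introduced only for this illustration.

```python
# Hypothetical latencies in clock cycles; real values vary by processor.
SEARCH_CYCLES = {"L1": 4, "L2": 10, "L3": 30}  # time to search each level
JUMP_CYCLES = {"L2": 2, "L3": 6}               # time to move down to a level
RAM_CYCLES = 200                               # assumed direct RAM access time


def lookup_cycles(found_at):
    """Total cycles to find data at the given level, searching from L1 down."""
    total = 0
    for level in ("L1", "L2", "L3"):
        total += JUMP_CYCLES.get(level, 0)  # L1 has no jump cost
        total += SEARCH_CYCLES[level]
        if level == found_at:
            return total
    raise ValueError(f"unknown cache level: {found_at}")


# Data found at the third level: 4 + (2 + 10) + (6 + 30) cycles.
l3_cycles = lookup_cycles("L3")

# The design check from the text: a full cache search must still beat RAM.
cache_is_worthwhile = l3_cycles < RAM_CYCLES
print(l3_cycles, cache_is_worthwhile)
```

With these assumed numbers a third-level hit costs 52 cycles, well under the assumed RAM latency. Adding a fourth level would add another jump and search term to every access that reaches it, which is the cost the paragraph above warns about.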
Measure AMAT
AMAT stands for Average Memory Access Time. It is an average because not all instructions on a CPU or GPU have the same latency, nor do they depend on memory in the same way. Even so, it helps us measure the cache performance of a CPU or GPU.
To calculate the AMAT of any CPU or GPU the following formula is used:
AMAT = Hit Time + Miss Rate * Miss Penalty
We want AMAT to be low, since it measures the time the CPU takes to access data, and therefore the latency of retrieving it. The values in the AMAT formula are as follows:
- The first value is the Hit Time, the time the CPU or GPU takes to find the data in the cache. Here it is important that the cache is small so that it can be searched more quickly, since the larger the cache, the longer a search takes. This is why the cache levels closest to the core are very small.
- The second value is the Miss Rate, the percentage of accesses for which the data is not in the cache. Lowering it conflicts with the Hit Time, since the best way to find data in the cache more often is to increase its storage capacity. The cache should also have mechanisms for deciding which data to keep and which to evict, to make room for data the CPU or GPU will access sooner.
- The third value is the Miss Penalty, the latency of accessing data that is in RAM and not cached. It is a huge amount of time in terms of clock cycles. Needless to say, when the data is in RAM and not in the cache, the time spent searching the cache hierarchy before reaching RAM must be added.
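As a worked example of the formula, the sketch below computes AMAT for a single cache level and then for a three-level hierarchy. The multi-level form, in which the miss penalty of one level is the AMAT of the level below it, is a common extension and not something the text above states. The function names and all latencies and miss rates are assumptions for illustration, and the miss rates are local to each level.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """AMAT = Hit Time + Miss Rate * Miss Penalty (miss_rate as a fraction)."""
    return hit_time + miss_rate * miss_penalty


# Single level: 4-cycle hit, 5% misses, 200-cycle penalty -> 4 + 0.05 * 200.
single_level = amat(hit_time=4, miss_rate=0.05, miss_penalty=200)


def amat_hierarchy(levels, ram_cycles):
    """AMAT for a list of (hit_time, local_miss_rate) pairs, closest level first."""
    penalty = ram_cycles  # a miss in the last level goes to RAM
    for hit_time, miss_rate in reversed(levels):
        # The AMAT of this level becomes the miss penalty of the level above it.
        penalty = amat(hit_time, miss_rate, penalty)
    return penalty


# Hypothetical three-level hierarchy: (hit time in cycles, local miss rate).
levels = [(4, 0.10), (12, 0.40), (40, 0.50)]
three_level = amat_hierarchy(levels, ram_cycles=200)
print(single_level, three_level)
```

Working from the bottom up with these assumed values: the third level averages 40 + 0.5 * 200 = 140 cycles, the second 12 + 0.4 * 140 = 68, and the first 4 + 0.1 * 68 = 10.8 cycles.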
The performance of the cache will therefore depend on how the Hit Time and Miss Rate are optimized. Because optimizing one harms the other, architects must decide which value matters most when designing a new CPU or GPU.
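The trade-off can be illustrated by comparing two hypothetical designs with the AMAT formula: a small cache with a short hit time but more misses, and a large cache with a longer hit time but fewer misses. The design names and every number here are assumptions chosen for illustration only.

```python
def amat(hit_time, miss_rate, miss_penalty):
    """AMAT = Hit Time + Miss Rate * Miss Penalty (miss_rate as a fraction)."""
    return hit_time + miss_rate * miss_penalty


# Two hypothetical designs: hit time in cycles, miss rate as a fraction.
designs = {
    "small_fast": {"hit_time": 2, "miss_rate": 0.10},
    "large_slow": {"hit_time": 6, "miss_rate": 0.03},
}


def best_design(miss_penalty):
    """Return the name of the design with the lowest AMAT for this penalty."""
    return min(
        designs,
        key=lambda name: amat(miss_penalty=miss_penalty, **designs[name]),
    )


# With an expensive miss (100 cycles) the lower miss rate wins;
# with a cheap miss (30 cycles) the shorter hit time wins.
for penalty in (100, 30):
    print(penalty, best_design(penalty))
```

Under these assumptions, neither design is better in absolute terms: the right choice depends on how costly a miss is, which is the decision the architects face.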