If you have ever seen a diagram of a multi-core processor, regardless of designer, manufacturer, or even type of system, you will almost always have come across the same layout: a last-level cache shared by all the cores.
The reason is related to what we call cache consistency, more commonly known as cache coherence.
To understand the concept, imagine the following: several people, each in front of a terminal, are all editing the same file stored on a server. If one of them changes part of the document, the change is reflected on everyone else's screen.
For that to work, a system is needed that, whenever a change is made to the document, transmits it to the rest of the screens so that it can be seen in real time. Suppose that one day this system fails and everything falls apart: although to their eyes they are all editing the same document, in reality each person has their own version of it, and any change they make is independent of what the others are doing.
Apply this to a computer program and the result is absolute chaos: a small change made by one core to data in RAM, of which the other cores are unaware, can lead to a critical system failure or to execution with bad data.
Cache hierarchy and consistency
Caches store a copy of a portion of RAM, the portion closest to the code currently being executed by the CPU. A cache therefore does not hold memory itself but a copy of it, and nothing guarantees by itself that the copy is up to date. Think of it like the terminal screens in the comparison above.
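To make the problem concrete, here is a minimal sketch of two cores with private caches and no mechanism to keep them in sync. The `PrivateCache` class, the dictionary standing in for RAM, and the address `0x10` are all hypothetical names chosen for illustration; real caches work on fixed-size lines, not single values.

```python
# Hypothetical model: RAM is a dict of address -> value, and each core
# has a private cache that is just another dict holding copies.
ram = {0x10: 1}


class PrivateCache:
    """A cache with no consistency mechanism at all (illustrative only)."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}  # address -> cached copy

    def read(self, addr):
        # On a miss, copy the value from RAM; on a hit, reuse the copy.
        if addr not in self.lines:
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        # Only the local copy changes; nobody else is told.
        self.lines[addr] = value


core0 = PrivateCache(ram)
core1 = PrivateCache(ram)

core0.read(0x10)       # both cores now hold a copy of the value 1
core1.read(0x10)
core0.write(0x10, 42)  # core 0 changes its own copy only

stale = core1.read(0x10)  # core 1 still sees the old value
print(core0.read(0x10), stale, ram[0x10])  # prints: 42 1 1
```

Core 1 and RAM both keep the old value, which is exactly the situation of the editors whose screens stopped receiving updates.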
In multicore systems, however, an extra level of cache is added between the memory controller and the earlier cache levels, and all of those earlier levels connect to it.
This is because having multiple cores access RAM at the same time would lead to a series of issues and conflicts. With a global last-level cache, it is only necessary to update the line in this cache; the caches closest to the cores are then updated in turn, since they hold copies of parts of the caches at the later levels.
The cores therefore access memory one at a time, and the caches are used to reduce the number of accesses to it. Of course, if a copy of a line of RAM sits in the caches and is modified, there must be a mechanism that updates the copies of that line held by the other cores of the processor, and even the memory itself.
If the shared last-level cache did not exist, each core would have to access RAM over and over to check the consistency of its cache contents against it. The shared last-level cache partially avoids this issue because, although not everything held at the level farthest from the cores is also present in the caches closest to them, the farthest levels do keep a copy of everything held in the closer ones.
So, to maintain cache consistency, it is much less expensive in terms of design not to change every line in the private caches of each core directly, but rather to update the last-level cache and let that update reach the caches closest to the processor.
The cache update process
But what if two cores want to access the same data? This is where two methods appear.
- The first method, usually called write-invalidate, is to invalidate every copy of the same memory line in the rest of the caches, leaving only the copy of the core that is currently writing to it.
- The second method, usually called write-update, is to make sure that when a core changes a line in its cache, the copies of that same line in the rest of the CPU's caches are changed as well.
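The difference between the two methods can be sketched in a few lines. This is a simplified model under stated assumptions, not how any particular processor implements them: the `Core` class and the functions `write_invalidate` and `write_update` are hypothetical names, and each cache is reduced to a dictionary of address to value.

```python
class Core:
    """A core with a private cache, modelled as address -> cached copy."""

    def __init__(self, name):
        self.name = name
        self.lines = {}


def write_invalidate(writer, others, addr, value):
    """First method: the writer keeps the line, every other copy is dropped."""
    writer.lines[addr] = value
    for core in others:
        core.lines.pop(addr, None)  # the next read by this core will miss


def write_update(writer, others, addr, value):
    """Second method: every existing copy is overwritten with the new value."""
    writer.lines[addr] = value
    for core in others:
        if addr in core.lines:
            core.lines[addr] = value


def make_cores():
    a, b, c = Core("A"), Core("B"), Core("C")
    a.lines[0x40] = 7
    b.lines[0x40] = 7  # B shares the line; C does not hold it
    return a, b, c


a, b, c = make_cores()
write_invalidate(a, [b, c], 0x40, 8)
invalidate_result = (a.lines.get(0x40), b.lines.get(0x40), c.lines.get(0x40))

a, b, c = make_cores()
write_update(a, [b, c], 0x40, 8)
update_result = (a.lines.get(0x40), b.lines.get(0x40), c.lines.get(0x40))

print(invalidate_result)  # (8, None, None)
print(update_result)      # (8, 8, None)
```

With invalidation, core B loses its copy and has to fetch the line again on its next read. With updating, B keeps a fresh copy, at the cost of sending the new data to every sharer on every write.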
[…] is cached; the second bit indicates whether the contents of this cache line and those of the RAM from which it was copied are identical.
So when the first core writes to its cache line and changes its contents, it sets the memory consistency bit. When this happens, every cache line pointing to that memory address is marked as “reserved”, and then the memory line to which all these cache lines point is updated with the new contents.
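The role of the two status bits can be illustrated with a small sketch. Several things here are assumptions rather than something the text spells out: the first bit is modelled as a `valid` flag, the second as a `dirty` flag (copy differs from RAM), and marking the other copies as “reserved” is modelled simply as clearing their `valid` flag. All names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class CacheLine:
    value: int = 0
    valid: bool = False  # assumed meaning of the first bit: line holds a usable copy
    dirty: bool = False  # second bit: the copy no longer matches RAM


ram = {0x80: 5}


def load(line, addr):
    # Fill the line from RAM: the copy is usable and identical to memory.
    line.value, line.valid, line.dirty = ram[addr], True, False


def store(line, others, value):
    # The writing core changes its copy and flags it as different from RAM.
    line.value = value
    line.dirty = True
    for other in others:
        other.valid = False  # assumption: stands in for marking them "reserved"


def write_back(line, addr):
    # Memory receives the new contents and the line matches RAM again.
    ram[addr] = line.value
    line.dirty = False


line0, line1 = CacheLine(), CacheLine()
load(line0, 0x80)
load(line1, 0x80)

store(line0, [line1], 9)
state_after_store = (line0.dirty, line1.valid, ram[0x80])

write_back(line0, 0x80)
state_after_write_back = (line0.dirty, ram[0x80])

print(state_after_store)       # (True, False, 5)
print(state_after_write_back)  # (False, 9)
```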
Cache hierarchy and consistency
Checking every cache level of the CPU (or GPU) against memory would be a titanic job that would make the processor extremely slow and complex, so the consistency check between cache contents and RAM is done only between the last level of the CPU (or GPU) cache and memory. This works because the earlier cache levels, the ones closer to the processor, are not connected directly to RAM but to the next cache level, so their contents are checked not against RAM but against that next level of cache.
Keep in mind that the contents of the different cache levels of a processor are nested like a Russian doll. With a three-level cache, for example, the third level holds its own content plus that of the first two levels, the second level holds its own content plus that of the first, and the first level holds only its own. This is why the cache-memory consistency system only needs to change the last level of the cache: any change there carries over to the earlier levels.
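The Russian-doll arrangement can be sketched as three nested dictionaries. This is a toy model under stated assumptions: the `InclusiveHierarchy` class and its method names are hypothetical, capacities and replacement policies are ignored, and each level maps an address to a value.

```python
class InclusiveHierarchy:
    """Three levels where each outer level also holds what the inner ones hold."""

    def __init__(self):
        self.l1, self.l2, self.l3 = {}, {}, {}

    def fill(self, addr, value):
        # Bringing a line into L1 also places it in L2 and L3.
        for level in (self.l3, self.l2, self.l1):
            level[addr] = value

    def is_inclusive(self):
        # L1 must be a subset of L2, and L2 a subset of L3.
        return self.l1.keys() <= self.l2.keys() <= self.l3.keys()

    def update_from_last_level(self, addr, value):
        # A change arriving at L3 is pushed to the inner levels holding the line.
        self.l3[addr] = value
        for level in (self.l2, self.l1):
            if addr in level:
                level[addr] = value

    def evict_from_l3(self, addr):
        # Removing a line from L3 must remove it from L2 and L1 too,
        # otherwise the nesting would be broken.
        for level in (self.l3, self.l2, self.l1):
            level.pop(addr, None)


h = InclusiveHierarchy()
h.fill(0x100, 1)   # present in all three levels
h.l3[0x200] = 2    # present only in the last level

h.update_from_last_level(0x100, 3)
print(h.l1[0x100], h.l2[0x100], h.l3[0x100])  # 3 3 3
print(h.is_inclusive())                       # True

h.evict_from_l3(0x100)
print(0x100 in h.l1, h.is_inclusive())        # False True
```

Because every line in L1 and L2 is guaranteed to exist in L3, checking or updating L3 is enough to know which inner copies are affected.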