One of the problems with spec sheets is that they tend to quote figures that hold only under perfect conditions. Memory is a case in point: not all data has the same latency, and the bandwidth never runs at 100%.
More bandwidth doesn’t mean less latency
By latency between a processing unit and its associated memory we mean the time it takes to receive requested information, or to receive confirmation that a change has been made in memory. Latency is therefore a measure of time.
Bandwidth, on the other hand, is the amount of data transmitted every second, so it is a rate. By straightforward logic we might then assume that, at a higher rate, the processor, GPU or any other processing unit will get its data in less time.
The reality is that this is not the case; in fact, the more bandwidth a memory has, the higher its latency generally is compared to other memories. This phenomenon has an explanation, which is what we will cover in the following sections of this article.
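As a back-of-the-envelope illustration, consider the time to fetch a single 64-byte cache line: it is dominated by the fixed latency, not by the bandwidth. The sketch below uses made-up figures, not values from any real datasheet.

```python
# Hypothetical figures, for illustration only.
LINE_SIZE = 64  # bytes in one cache line

def fetch_time_ns(latency_ns: float, bandwidth_gbps: float) -> float:
    """Time to fetch one cache line = fixed latency + transfer time."""
    transfer_ns = LINE_SIZE / bandwidth_gbps  # 1 GB/s == 1 byte per ns
    return latency_ns + transfer_ns

# Doubling the bandwidth barely changes the fetch time for a single line,
# because the fixed latency dominates.
print(fetch_time_ns(latency_ns=80, bandwidth_gbps=25))  # 82.56 ns
print(fetch_time_ns(latency_ns=80, bandwidth_gbps=50))  # 81.28 ns
```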
Searching for data adds latency
Almost all processing units today have a hierarchy of caches, and the processor will query each of them in turn before accessing RAM. This is because the direct latency between the processor and RAM is large enough to cause a loss of performance compared to an ideal processor.
Imagine you are looking for a specific product: the first thing you do is look in the local store, then in a slightly larger store and finally in a department store. Visiting each establishment is not instantaneous; it requires travel time. The same thing happens in the cache hierarchy, where each failed lookup is called a “cache miss”, so we can summarize the time as follows:
Search time = first cache search time + penalty of each cache miss + … + last cache search time.
If the combined cache search time is longer than the time it takes to access main RAM directly, then the cache system is poorly designed, because it defeats the purpose for which the caches were created.
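A minimal sketch of how those search times accumulate level by level, assuming invented per-level timings (real values vary from design to design):

```python
# Hypothetical per-level search times in nanoseconds, smallest cache first.
# These numbers are illustrative, not measurements of any real CPU.
SEARCH_TIME_NS = {"L1": 1, "L2": 4, "L3": 12, "RAM": 80}

def lookup_time_ns(found_in: str) -> float:
    """Sum the search time of every level visited until the data is found.

    Each miss adds that level's full search time before moving on,
    which is exactly the accumulation in the formula above.
    """
    total = 0.0
    for level, cost in SEARCH_TIME_NS.items():
        total += cost
        if level == found_in:
            return total
    raise ValueError(f"unknown level: {found_in}")

print(lookup_time_ns("L1"))   # 1.0 ns: hit in the first cache
print(lookup_time_ns("RAM"))  # 97.0 ns: every cache missed first
```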
The latency issue is actually more complex, because on top of the access time added by searching the caches, we have to add the latency of looking up the data in RAM when it is not found in any cache. What problems can we run into there? For example, all memory channels may be busy and a conflict is created, which happens when the RAM’s channels are already occupied receiving or providing other data.
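A toy model of such a conflict, in which a request must wait for its channel to become free, makes the added latency concrete; the channel count and timings below are assumptions for the example:

```python
# Toy model: each memory channel is busy until a given time (in ns).
# A new request must wait for its channel to free up before being served.
# All numbers are invented for illustration.
channel_busy_until = [0.0, 0.0]  # two channels, both initially idle
RAM_ACCESS_NS = 80.0

def issue_request(channel: int, now_ns: float) -> float:
    """Return when the request completes; waiting on a busy channel adds latency."""
    start = max(now_ns, channel_busy_until[channel])
    done = start + RAM_ACCESS_NS
    channel_busy_until[channel] = done
    return done

# Two requests hitting the same channel back to back: the second one
# pays the conflict penalty and finishes at 160 ns instead of 80 ns.
print(issue_request(channel=0, now_ns=0.0))  # 80.0
print(issue_request(channel=0, now_ns=0.0))  # 160.0
```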
How Does Latency Affect Bandwidth?
As the graph shows, latency is not constant across the whole range of memory bandwidth; three regions can be distinguished (and are modeled in the sketch after this list):
- Constant region: Latency remains constant up to 40% of the sustained bandwidth.
- Linear region: Between 40% and 80% of the sustained bandwidth, latency increases linearly. This happens because requests start to over-saturate the memory, accumulating at the back of the queue whenever there is a conflict.
- Exponential region: In the last 20% of the bandwidth, latency grows exponentially; all the memory requests that could not be resolved in the earlier regions pile up in this part, creating conflicts between them.
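That three-region curve can be sketched as a simple piecewise function; the base latency and growth factors below are purely illustrative, since the real shape depends on the memory controller:

```python
# Illustrative piecewise model of latency vs. bandwidth utilization.
# Base latency and growth factors are invented; only the shape matters.
BASE_NS = 80.0

def latency_ns(utilization: float) -> float:
    """Latency as a function of sustained-bandwidth utilization (0..1)."""
    if utilization <= 0.40:          # constant region
        return BASE_NS
    if utilization <= 0.80:          # linear region
        return BASE_NS * (1 + 2.0 * (utilization - 0.40))
    # exponential region: unresolved requests compound on each other
    return BASE_NS * 1.8 * 2.0 ** ((utilization - 0.80) * 25)

for u in (0.2, 0.5, 0.8, 0.95):
    print(f"{u:.0%}: {latency_ns(u):.0f} ns")
# 20%: 80 ns, 50%: 96 ns, 80%: 144 ns, 95%: ~1937 ns
```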
This phenomenon has a very simple explanation: the first memory requests answered are those whose data is found first, most of them in a cache when a copy exists there, while the ones that are not in any cache accumulate. One of the differences between caches and RAM is that the former can support multiple concurrent accesses, but when the search for data has to go to RAM, the latency is much higher.
We tend to think of RAM as a kind of torrent in which data never stops flowing at the specified speed, when in reality RAM does not move any data unless there is a request for it. In other words, latency affects throughput and therefore the bandwidth actually achieved.
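A classic way to see why latency explodes as the bandwidth limit is approached is a basic queueing model. The sketch below uses a textbook M/M/1 approximation as an assumption; it is not how any particular memory controller actually behaves:

```python
# M/M/1 queueing approximation: average response time grows without bound
# as utilization approaches 100% of the service capacity.
SERVICE_NS = 80.0  # hypothetical time to serve one request in isolation

def avg_latency_ns(utilization: float) -> float:
    """Average request latency under an M/M/1 model: T = S / (1 - rho)."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return SERVICE_NS / (1.0 - utilization)

for rho in (0.40, 0.80, 0.95, 0.99):
    print(f"{rho:.0%} utilization -> {avg_latency_ns(rho):.0f} ns")
# 40% -> 133 ns, 80% -> 400 ns, 95% -> 1600 ns, 99% -> 8000 ns
```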
Ways to reduce latency
Once we know that data access conflicts create latency, and that this in turn affects bandwidth, we need to think about solutions. The clearest one is to increase the number of memory channels to the RAM; this is precisely one of the reasons why HBM memory has a lower access latency than GDDR6, since its 8 memory channels allow less contention than the 2 channels of a GDDR6 chip.
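A rough sketch of why more channels mean less contention, assuming the same stream of requests is spread evenly across them; the 8-versus-2 channel counts come from the text above, everything else is invented:

```python
# Spreading the same request stream over more channels shortens each queue.
# Request rate and service time are invented for illustration.
SERVICE_NS = 80.0
TOTAL_RATE = 0.02  # requests per ns arriving at the whole memory

def per_channel_utilization(channels: int) -> float:
    """Utilization of each channel when requests are spread evenly."""
    return (TOTAL_RATE / channels) * SERVICE_NS

for channels in (2, 8):  # GDDR6-style vs HBM-style channel counts
    rho = per_channel_utilization(channels)
    latency = SERVICE_NS / (1.0 - rho)  # same M/M/1 shape as above
    print(f"{channels} channels: {rho:.0%} busy, ~{latency:.0f} ns per request")
# 2 channels: 80% busy, ~400 ns; 8 channels: 20% busy, ~100 ns
```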
The best way to reduce latency would be to place memory as close to the processor as a cache, but it is impossible to build RAM that close with enough storage capacity to be fully functional. A memory chip can be stacked on top and connected via TSVs, but with the memory that close, its clock speed has to be kept down to avoid thermal throttling, and with it part of the effective bandwidth is lost.
In exchange, since latency affects bandwidth, the proximity between memory and processor means the impact of latency on the memory would be much smaller. The trade-off of implementing a CPU or GPU as a 3DIC? It would double the cost of the PC, and the more complex manufacturing process would mean fewer units reaching the market, hence more scarcity and therefore even higher prices.