UMA, NUMA and the differences between the two

When designing a system, one of the first things placed on the design table is how the RAM will be organized, as this determines not only what the architecture of the system will look like, but also its performance, its manufacturing cost and its extensibility.

RAM memory organization: UMA

UMA stands for Uniform Memory Access and refers to all systems where RAM is a single shared pool that the CPU and the rest of the processors in the system all access. This type of configuration is mostly used in today's SoCs, where the different components share memory access.

UMA is also the organization used in video game consoles. More generally, it is used in any system whose components are all mounted on a common board, where routing two different types of memory pools would complicate the traces and communication lines that cross the board.

This is therefore the easiest way to build a memory system in any type of computer, but it leads to a series of problems. Chief among them is contention: because memory access is shared, a "waiting list" forms for accessing data, which can only be alleviated by using types of RAM with several access channels.
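The "waiting list" effect can be sketched with a toy model. The function name `average_wait` and its simplifications (all requests arrive at the same cycle, every access takes one cycle, and requests are spread evenly across channels) are assumptions made for illustration, not a description of any real memory controller.

```python
def average_wait(requests, channels, service_time=1):
    """Average number of cycles a request waits before its access starts.

    Toy model: every request arrives at cycle 0 and requests are
    distributed round-robin over the available memory channels.
    """
    waits = []
    for i in range(requests):
        # Number of requests already queued ahead on the same channel.
        ahead = i // channels
        waits.append(ahead * service_time)
    return sum(waits) / len(waits)


# Eight components asking for memory at once.
one_channel = average_wait(8, channels=1)
two_channels = average_wait(8, channels=2)
four_channels = average_wait(8, channels=4)
```

In this model, adding channels shortens the queue on each one, which is the sense in which multi-channel RAM alleviates contention without removing it.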

RAM memory organization: NUMA

NUMA, or Non-Uniform Memory Access, refers to systems in which several different memory pools are used in the same system. This is the case of the PC where, for example, graphics cards have their own memory, separate from the main RAM of the system.

NUMA systems do not suffer from the contention problem that affects memory access in UMA systems, but getting the different components of the system to communicate with each other makes the overall design very complex. The reason is that each component must have a mechanism for accessing main RAM in order to communicate with the CPU. For example, GPUs have DMA units that allow them to access the main RAM of the system and copy data from RAM to VRAM.
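As a rough illustration of separate pools joined by a copy engine, the sketch below models main RAM and VRAM as two independent byte arrays. `MemoryPool` and `dma_copy` are hypothetical names, and a real DMA unit works on physical addresses through a driver rather than on Python objects.

```python
class MemoryPool:
    """A separate block of memory, such as main RAM or a card's VRAM."""

    def __init__(self, name, size):
        self.name = name
        self.cells = bytearray(size)


def dma_copy(src, src_offset, dst, dst_offset, length):
    """Copy a block from one pool to another, as a DMA unit would."""
    if src_offset + length > len(src.cells):
        raise ValueError("source range is outside " + src.name)
    if dst_offset + length > len(dst.cells):
        raise ValueError("destination range is outside " + dst.name)
    dst.cells[dst_offset:dst_offset + length] = \
        src.cells[src_offset:src_offset + length]


main_ram = MemoryPool("RAM", 64)
vram = MemoryPool("VRAM", 32)

# The CPU prepares data in main RAM; the GPU cannot see it yet.
main_ram.cells[0:4] = b"\x01\x02\x03\x04"

# The data only becomes visible to the GPU after an explicit copy.
dma_copy(main_ram, 0, vram, 8, 4)
```

The point of the sketch is that each pool has its own address range, so data has to be moved explicitly before the other component can use it.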

This type of memory organization is used when we want to build a system with expansion capabilities. For that, it is necessary to provide so-called expansion ports, which the system processor uses to communicate with the memory of each component that is part of the system.

Addressing vs physical organization

One of the ideals of the PC is a totally coherent memory system, in which the addressing is common to all the components it contains. This means that when any component goes to a memory address, say F4, every other component of the PC that uses address F4 should be referring to that same memory location.

One might think from the outset that because UMA systems always share their memory at the physical level, they also share it at the addressing level, since we are talking about the same physical memory pool. The reality is very different, because the various components must also be coherent in terms of memory. Taking the preceding example, if one component writes the value 30 to address F4, then all the components must know that there is a value of 30 there.

The way to ensure that all the components of an SoC are completely coherent is therefore not simply to use the same memory controller, but to add a last level of cache just before said controller. This cache sits beyond the CPU, the GPU and the other components, and all of them treat it as their last level of cache.
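The difference can be sketched with the F4 example. The `Cache` class below is a hypothetical, heavily simplified write-back cache (no eviction, no coherence protocol), so it only shows why a level shared by every component gives them the same view of an address.

```python
class Cache:
    """Simplified write-back cache in front of a backing memory (a dict)."""

    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def read(self, addr):
        # On a miss, fetch the value from the backing memory (0 if unset).
        if addr not in self.lines:
            self.lines[addr] = self.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        # Write-back: the new value stays in this cache for now.
        self.lines[addr] = value


ADDR = 0xF4

# Case 1: CPU and GPU each have a private cache and share only the RAM.
ram = {}
cpu_cache, gpu_cache = Cache(ram), Cache(ram)
gpu_cache.read(ADDR)          # the GPU caches the old value
cpu_cache.write(ADDR, 30)     # the CPU's write stays in its own cache
stale = gpu_cache.read(ADDR)  # the GPU still sees the old value

# Case 2: CPU and GPU both go through one shared last-level cache.
ram2 = {}
shared_llc = Cache(ram2)
shared_llc.read(ADDR)          # GPU access
shared_llc.write(ADDR, 30)     # CPU access
fresh = shared_llc.read(ADDR)  # the GPU sees the CPU's write
```

In the first case the shared physical RAM is not enough for the GPU to learn about the write. In the second, both components observe the value 30 because they consult the same cache level. Real hardware also has to keep any private caches above that level in sync, which this sketch leaves out.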

Adding a last-level cache before the memory controller is typical of post-PC systems: since they were all designed around SoCs from the start, they have no programs that make copies of data from one memory space to another. On the PC, on the other hand, this is not common. Although Intel and AMD have been launching SoCs for years in which all components are unified on a single chip, memory access by the different elements of the SoC is not unified, and parts of RAM are set aside exclusively for a specific component. For example, when we have integrated graphics and we allocate an amount of memory to it, what we are doing is telling the CPU that it cannot touch that space because it is outside its allocation.
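The carve-out for integrated graphics can be pictured as removing one address range from what the CPU is allowed to use. The sizes below (8192 MiB of RAM, 512 MiB reserved at the top) and the function names are assumptions chosen for the example, and firmware does not necessarily place the reserved region there.

```python
MIB = 1 << 20


def cpu_visible_ranges(total, reserved_start, reserved_size):
    """Return the half-open address ranges left to the CPU after a carve-out."""
    reserved_end = reserved_start + reserved_size
    if reserved_start < 0 or reserved_end > total:
        raise ValueError("reserved region does not fit in RAM")
    ranges = []
    if reserved_start > 0:
        ranges.append((0, reserved_start))
    if reserved_end < total:
        ranges.append((reserved_end, total))
    return ranges


def cpu_may_access(addr, ranges):
    """True if the address falls inside one of the CPU's ranges."""
    return any(start <= addr < end for start, end in ranges)


total_ram = 8192 * MIB
gpu_reserved = 512 * MIB
gpu_start = total_ram - gpu_reserved

ranges = cpu_visible_ranges(total_ram, gpu_start, gpu_reserved)
```

Here `cpu_may_access(gpu_start, ranges)` is false: the memory is physically the same RAM, but the reserved region is outside the CPU's allocation.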