Here is Samsung's HBM-PIM to accelerate artificial intelligence

The first thing to understand is that, at the time of writing, HBM-PIM is not a standard approved by JEDEC, the committee of some 300 companies in charge of creating the various memory standards, whether volatile or persistent. For now it is a proposal and a design by Samsung, which could become a new type of HBM memory manufactured by third parties or, failing that, remain an exclusive product of the South Korean foundry.

Whether or not it becomes a standard, HBM-PIM will be manufactured for the Alveo AI accelerator from Xilinx, a company that, remember, has been acquired in full by AMD. So this is neither a concept on paper nor a lab product: this type of HBM memory can be manufactured in large quantities. That said, the Xilinx Alveo is an FPGA-based accelerator board used in data centers. It is not a mass-market product, and we have to keep in mind that this is only a variant of HBM memory, which is itself very expensive and scarce, and that limits its use in commercial products such as gaming graphics cards or processors.

The concept of in-memory computing

The programs we run on our PCs depend on a marriage of RAM and CPU, which would be perfect if we could put both on a single chip. Unfortunately, that is not possible, and the separation leads to a series of bottlenecks inherent in the architecture of any computer, all a product of the latency between system memory and the CPU:

Because the data has to travel a greater distance, it arrives more slowly.
Energy consumption rises with the distance between the processing unit that executes the program and the memory where the program is stored. The transfer speed, or bandwidth, is also lower than the processing speed.
The usual way to solve this problem is to add a cache hierarchy to the CPU, GPU, or APU, which copies data from RAM onto the chip for faster access to the information needed.
Other architectures use what is called scratchpad RAM, an on-chip RAM that does not work automatically: its contents must be managed by the program.
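
The difference between a cache and a scratchpad can be sketched in a few lines. The Python model below is only an illustration, and every name in it (SCRATCHPAD_WORDS, sum_with_scratchpad) is hypothetical: the program itself decides which slice of main memory is copied into the small on-chip buffer before working on it, a job a cache would do automatically.

```python
# Toy model of a software-managed scratchpad (all names are hypothetical).
SCRATCHPAD_WORDS = 4  # assumed tiny on-chip capacity, in words


def sum_with_scratchpad(main_memory):
    """Sum a list by staging it tile by tile in a small explicit buffer."""
    total = 0
    copies = 0  # number of explicit RAM -> scratchpad transfers
    for start in range(0, len(main_memory), SCRATCHPAD_WORDS):
        # The program, not the hardware, chooses what is brought on chip.
        scratchpad = main_memory[start:start + SCRATCHPAD_WORDS]
        copies += 1
        # The arithmetic below only touches the on-chip copy.
        for word in scratchpad:
            total += word
    return total, copies


if __name__ == "__main__":
    print(sum_with_scratchpad(list(range(10))))
```

The small capacity constant is what forces the data to be processed in tiles, which is the limitation discussed next.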

The RAM integrated into the processor has a problem of its own, however: capacity. It can hold very little data because physical space is limited and the vast majority of transistors are dedicated to processing instructions, not to storage.

In-memory computing works the other way around from on-chip DRAM or SRAM: here we are talking about RAM to which logic is added, and in which the bit cells still take up most of the chip. It is therefore not a question of integrating a complex processor, but rather a domain-specific processor or even hard-wired, fixed-function accelerators.

And what are the advantages of this type of memory? When a program runs on any processor, the RAM assigned to that CPU or GPU is accessed at least once for each instruction. The idea of in-memory computing is simply to store a program in the PIM memory so that the CPU or GPU only has to issue a single call instruction and wait while the processing unit in the memory executes the program and returns the final result. In the meantime, the CPU is free for other tasks.
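
To make the contrast concrete, here is a minimal sketch that counts how many times data crosses the memory bus in each case. It is a toy model, not Samsung's interface: PimMemory, run_sum, host_side_sum and pim_side_sum are hypothetical names, and counting one bus transfer per operand is a simplifying assumption.

```python
# Toy comparison of bus traffic (hypothetical model, not Samsung's interface).

def host_side_sum(memory):
    """Conventional path: every operand crosses the memory bus to the CPU."""
    bus_transfers = 0
    total = 0
    for word in memory:
        bus_transfers += 1  # one read per operand (simplifying assumption)
        total += word
    return total, bus_transfers


class PimMemory:
    """Memory with a small built-in unit that can run a stored routine."""

    def __init__(self, data):
        self.data = list(data)

    def run_sum(self):
        # Executed next to the cells: operands never leave the memory.
        return sum(self.data)


def pim_side_sum(pim):
    """PIM path: one call goes out and one result comes back."""
    bus_transfers = 1  # the single call instruction
    result = pim.run_sum()
    bus_transfers += 1  # the final answer returned to the host
    return result, bus_transfers


if __name__ == "__main__":
    data = list(range(1000))
    print(host_side_sum(data))
    print(pim_side_sum(PimMemory(data)))
```

Both paths return the same sum. In this model the host-side traffic grows with the size of the data, while the PIM path stays at two transfers.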

The processor of the Samsung HBM-PIM

A small processor has been built into each of the chips in an HBM-PIM stack. Storage capacity suffers as a result, since transistors that would otherwise go to memory cells are assigned to the logic gates that make up the integrated processor. As mentioned earlier, this processor is very simple:

It does not use any known ISA but its own, with very few instructions: nine in total.
It has two sets of 16 floating point units with 16-bit precision each. The first set has the ability to perform addition and the second to perform multiplication.
It is a SIMD-type execution unit and therefore a vector processor.
Its arithmetic abilities are: A + B, A * B, (A + B) * C and (A * C) + B.
The power consumption per operation is 70% lower than if the CPU were doing the same task. Here we have to take into account the relationship between power consumption and distance to the data.
Samsung has named this small processor the PCU.
Each processor can operate on the memory chip of which it is part or on the entire stack. The units of the HBM-PIM can also work together to speed up algorithms or programs that require it.
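
The four arithmetic forms listed above can be emulated in plain Python to get a feel for what a 16-lane, 16-bit unit computes. This is a sketch under assumptions, not the PCU's real instruction set: the function names are hypothetical, and rounding the intermediate result to half precision before the second operation is an assumption the article does not confirm.

```python
import struct

LANES = 16  # from the text: 16 floating-point units per set


def fp16(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


def vadd(a, b):
    """A + B, lane by lane, rounded to 16 bits."""
    return [fp16(x + y) for x, y in zip(a, b)]


def vmul(a, b):
    """A * B, lane by lane, rounded to 16 bits."""
    return [fp16(x * y) for x, y in zip(a, b)]


def add_then_mul(a, b, c):
    """(A + B) * C; the intermediate sum is rounded first (assumption)."""
    return vmul(vadd(a, b), c)


def mul_then_add(a, b, c):
    """(A * C) + B; the intermediate product is rounded first (assumption)."""
    return vadd(vmul(a, c), b)


if __name__ == "__main__":
    a = [1.0] * LANES
    b = [2.0] * LANES
    c = [0.5] * LANES
    print(add_then_mul(a, b, c)[0], mul_then_add(a, b, c)[0])
```

The last form, (A * C) + B, is the multiply-accumulate pattern at the heart of dot products and matrix-vector products, which is why such a small set of operations is still useful for machine learning workloads.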

As its simplicity suggests, it is not suited to running complex programs. Instead, Samsung is promoting it as a unit that speeds up machine learning algorithms, but it cannot handle complex systems either, because it is a vector processor and not a tensor processor. Its capabilities in this area are therefore very limited and focus on tasks that do not require much computing power, such as speech recognition or text and audio translation. Let's not forget that its computing capacity is 1.2 TFLOPS.
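
A figure like 1.2 TFLOPS can be related to the unit described above with simple arithmetic: peak throughput is the number of units, times the lanes per unit, times the operations each lane completes per cycle, times the clock. The sketch below takes the 16 lanes from the text and assumes two operations per lane per cycle (one addition and one multiplication, both sets busy). The unit count and clock in the example are placeholder assumptions, not figures from this article.

```python
def peak_flops(units, clock_hz, lanes=16, ops_per_lane_per_cycle=2):
    """Peak floating-point operations per second for a group of SIMD units."""
    return units * lanes * ops_per_lane_per_cycle * clock_hz


def required_unit_hz(target_flops, lanes=16, ops_per_lane_per_cycle=2):
    """Product of (unit count x clock in Hz) needed to reach target_flops."""
    return target_flops / (lanes * ops_per_lane_per_cycle)


if __name__ == "__main__":
    # Placeholder values (assumptions): 128 units clocked at 300 MHz.
    print(peak_flops(units=128, clock_hz=300_000_000))
    # Unit-count x clock product needed for 1.2 TFLOPS.
    print(required_unit_hz(1.2e12))
```

Any combination of unit count and clock whose product is about 37.5 billion unit-cycles per second reaches the quoted figure, so modest clocks are enough when many small units work in parallel.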

Are we going to see the HBM-PIM on our PCs?

The applications that Samsung cites as examples of the advantages of HBM-PIM are already accelerated, and faster, by other components of our PCs. Moreover, the high cost of manufacturing this type of memory already rules out its use in a home computer. If you are a programmer specializing in artificial intelligence, you most likely already have hardware in your computer with a much higher processing capacity than Samsung's HBM-PIM.

The reality is that talking about AI looks like a poor choice by the South Korean giant's marketing department. Yes, we realize that it is the buzzword on everyone's lips, but we believe that HBM-PIM has other markets where it can exploit its capabilities.

What are these applications? For example, it can speed up searches in the large databases that hundreds of businesses use daily, and we believe this is a huge market that moves millions of dollars a year. In any case, we do not see it being used in home computers or in scientific computing, although it is possible that the still-unfinished HBM3 will inherit some of the ideas of HBM-PIM.