One of the biggest issues with the multi-core CPUs in our PCs is that they are based on the Von Neumann model, i.e. all cores share a single memory. As the number of cores, threads and other units running in parallel in a processor increases, more and more conflicts arise between them. These affect not only access to the data, but also the contents of the various memory addresses, and therefore the values of the variables that programs use. There are many methods to avoid these conflicts. One of them is transactional memory, which we describe in this article.
An introduction to the problem
A program is written as a series of instructions that appear to execute sequentially. But even on a single core, instruction-level parallelism means that several execution units can be working at the same time. On top of this, out-of-order execution adds the complication that memory and data accesses at run time do not happen in program order either.
When there is a large number of requests, contention builds up for access to the same memory. Requests are delayed longer and longer, which increases the memory latency the processor sees on certain instructions and reduces the effective bandwidth. To deal with this, there are mechanisms that avoid these memory access conflicts as much as possible, so that processes access memory in an orderly way. This avoids conflicts when data is modified anywhere in the memory hierarchy, and it also reduces contention and therefore access latency.
The easiest way to do this is to use locks, which mark sections of code that must not be executed simultaneously by different threads. That is, only one core at a time can execute that part of the code. The remaining cores are locked out and can only enter the section once the instruction that releases the lock is reached, which happens when the core holding the lock has finished the protected code.
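To make the idea of a lock concrete, here is a minimal sketch in Python. It assumes that Python threads stand in for the processor's cores, and the names `run_counter` and `worker` are ours, chosen only for illustration. Several threads increment the same variable, and the lock ensures that only one of them is inside the marked section at any time.

```python
import threading


def run_counter(n_threads=4, n_increments=10_000):
    """Increment one shared counter from several threads under a lock."""
    counter = 0
    lock = threading.Lock()

    def worker():
        nonlocal counter
        for _ in range(n_increments):
            # Critical section: only one thread at a time may execute it.
            # The others wait here until the lock is released.
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


if __name__ == "__main__":
    print(run_counter())
```

Every increment is serialized, so the result is always the number of threads multiplied by the number of increments. The price is that threads spend time waiting for each other even when they would not have conflicted.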
What is transactional memory?
One method to avoid the problems described in the previous section is to use transactional memory. Despite its name, it is not a type of memory or storage, so we are not talking about a separate piece of hardware. The idea originates in database transactions, and in a processor it takes the form of a type of instruction executed in the load-store units.
The transaction system in a processor works as follows:
- A copy of the part of memory that multiple cores want to access is created, one for each instance.
- Each instance modifies its private copy independently of the rest of the private copies.
- If a piece of data has been modified in one private copy and not in the rest, then the modification is also copied to the rest of the private copies.
- If two instances modify the same data at the same time and this creates an inconsistency in the data, both private copies are discarded and replaced with the private copies of the rest.
The fourth point is important, because it is where this part of the code has to be serialized. The other instances stop modifying their private copies and the modifications are made by only one of the instances. When it finishes, the changes are copied to the rest of the private copies. Once the section of code marked as transactional has been executed and all the private copies contain the same information, the result is written to the cache lines and the corresponding memory addresses.
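The steps above can be approximated in software. The following is a simplified sketch rather than a description of how any processor does it: `VersionedCell`, `atomically` and `run_transactions` are hypothetical names, and a version counter is our assumption for detecting that two instances touched the same data. Each thread works on a private copy and only publishes it if nobody else committed in the meantime. On a conflict, the private copy is discarded and the work is repeated, which is a common simplification of the serialization step.

```python
import threading


class VersionedCell:
    """A shared value plus a version number that grows on every commit."""

    def __init__(self, value):
        self.value = value
        self.version = 0
        self._commit_lock = threading.Lock()  # guards only the short commit step

    def read(self):
        with self._commit_lock:
            return self.value, self.version

    def try_commit(self, new_value, seen_version):
        with self._commit_lock:
            if self.version != seen_version:
                return False  # someone else committed first: conflict
            self.value = new_value
            self.version += 1
            return True


def atomically(cell, update):
    """Apply update to a private copy and retry until the commit succeeds."""
    retries = 0
    while True:
        snapshot, version = cell.read()
        private_copy = update(snapshot)  # work happens outside the lock
        if cell.try_commit(private_copy, version):
            return retries
        retries += 1  # conflict: discard the private copy and start over


def run_transactions(n_threads=4, n_updates=1000):
    cell = VersionedCell(0)

    def worker():
        for _ in range(n_updates):
            atomically(cell, lambda v: v + 1)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return cell.value


if __name__ == "__main__":
    print(run_transactions())
```

Unlike the lock-based approach, threads here never wait before doing their work. They only pay a cost when a conflict actually happens, which is the main attraction of the transactional model.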
Transactional memory systems, the Intel TSX
TSX stands for Transactional Synchronization Extensions, a set of additional instructions for the x86 ISA that add support for transactional memory to Intel processors. These instructions and the mechanisms associated with them allow specific sections of code to be marked as transactional, so that the Intel processor can carry out the process we described in the previous section. The Intel implementation is a bit more complex, however, because, as we have seen, when two accesses to the same data conflict, the whole process is interrupted by one of the running instances.
Its hardware implementation relies on a new type of cache, called a transactional cache, in which the different operations are performed on the different data. Keep in mind that what transactional memory seeks is to reduce conflicts when accessing memory. Although caches can generally handle more requests than RAM, they are also limited, particularly at the levels furthest from the cores. All this is combined with internal memories and private registers that hold the private copies used by the various cores.
The Intel TSX instructions are not a complex set. The XBEGIN instruction marks where a transactional section begins, the XEND instruction marks where it ends, and XABORT is used to exit the transaction when an exceptional situation arises.
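The roles of the three instructions can be imitated with a toy model. This is a software illustration only, not real TSX: the class `ToyTransaction`, its methods named after the instructions, the `transfer` function and the abort code `0x01` are all hypothetical. Writes between `xbegin` and `xend` go to a private buffer, `xend` publishes them all at once, and `xabort` throws them away. The Python exception stands in for the jump to the fallback path that a real transaction takes when it aborts.

```python
class TransactionAborted(Exception):
    """Raised by xabort() to leave the transactional region early."""

    def __init__(self, code):
        super().__init__(code)
        self.code = code


class ToyTransaction:
    """Software stand-in for a hardware transaction over a dict."""

    def __init__(self, memory):
        self.memory = memory
        self.buffer = None

    def xbegin(self):
        # From here on, writes are buffered instead of touching memory.
        self.buffer = {}

    def read(self, key):
        # A transaction sees its own buffered writes first.
        if key in self.buffer:
            return self.buffer[key]
        return self.memory[key]

    def write(self, key, value):
        self.buffer[key] = value

    def xabort(self, code):
        # Discard every buffered write and report why we gave up.
        self.buffer = None
        raise TransactionAborted(code)

    def xend(self):
        # Publish all buffered writes in one step.
        self.memory.update(self.buffer)
        self.buffer = None


def transfer(memory, src, dst, amount):
    """Move amount from src to dst, all or nothing."""
    tx = ToyTransaction(memory)
    try:
        tx.xbegin()
        if tx.read(src) < amount:
            tx.xabort(0x01)  # hypothetical code: not enough balance
        tx.write(src, tx.read(src) - amount)
        tx.write(dst, tx.read(dst) + amount)
        tx.xend()
        return True
    except TransactionAborted:
        return False  # fallback path: memory was left untouched


if __name__ == "__main__":
    accounts = {"a": 100, "b": 0}
    print(transfer(accounts, "a", "b", 30), accounts)
    print(transfer(accounts, "a", "b", 500), accounts)
```

The point of the sketch is the all-or-nothing behaviour: either both writes become visible at `xend`, or neither does.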
The end of Intel TSX instructions?
Today, CPU control units are effectively full microcontrollers, which means that the way they decode instructions, and the list of instructions they support, can be updated. Intel first implemented TSX on the Haswell architecture and it remained in Intel processors until recently, when it was disabled via firmware on sixth, seventh and eighth generation Intel Core processors.
From time to time, Intel updates the microcode of its processors in the field, typically through firmware or operating system updates that are applied without most users noticing. These updates are infrequent, but they can include optimizations in how some instructions execute or even remove support for others. The stated reason for removing Intel TSX from these Intel Core processors is that, with the latest changes to the internal microcode of the control unit, it conflicts with the operation of software, so the CPU does not work as it should.
But the real reason is that Intel TSX allows malicious code to run under the radar of traditional security systems, especially those that protect the operating system, because the private copies do not belong to the user's environment or to the operating system. This makes it a problem similar to that of speculative execution.