A few months ago, Microsoft announced the arrival of Copilot+, a set of AI features for Windows 11. The company’s commitment doesn’t stop at bringing those features to its latest operating system; it goes further. During Hot Chips 2024, Microsoft presented the details of Maia 100, its first chip designed specifically for AI.
To use Copilot+ features in Windows 11, a processor with NPU cores is required. For a large number of users, this effectively means being “obliged” to upgrade their computer if they want to use AI.
What’s interesting is that these NPU cores are arguably unnecessary: GPUs can handle the same workloads, and often better, thanks to their ability to parallelize tasks. This suggests that NPUs act as an element of forced obsolescence for a technology that is, to say the least, still quite immature.
Microsoft announces its first AI-specific processor
During the Hot Chips 2024 event, Microsoft gave full details of its Maia 100 processor, which was designed to handle AI workloads end to end. The goal of this solution is to deliver the best possible performance while keeping costs as low as possible.
The system includes specially designed server boards, custom racks and a software stack. The aim is to offer a highly efficient and robust platform for cutting-edge AI services, such as Azure OpenAI.
Maia 100 is one of the largest chips manufactured by TSMC on its 5 nm node: a purpose-built solution for intensive AI tasks within the Azure platform. Concretely, the Maia 100 chip has the following specifications:
- Die size of 820 mm².
- Designed to support a TDP of up to 700 W.
- Provisioned at a TDP of 500 W.
- Manufactured using TSMC’s N5 process with CoWoS-S interposer technology.
- A capacity of 64 GB of HBM2E memory with a bandwidth of 1.8 TB/s.
- Peak dense tensor performance of 3 POPS at 6-bit, 1.5 POPS at 9-bit and 0.8 POPS at BF16 (put in perspective by the quick calculation after this list).
- Back-end network bandwidth of 600 GB/s (12 × 400 GbE).
- PCIe Gen5 ×8 host bandwidth of 32 GB/s.
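To put the compute and memory figures in perspective, a quick roofline-style calculation (a sketch using only the numbers listed above) shows how compute-heavy a workload must be before the HBM bandwidth stops being the bottleneck:

```python
# Back-of-the-envelope arithmetic intensity for Maia 100, using the
# figures from the spec list above; the calculation itself is illustrative.

peak_bf16_ops = 0.8e15        # 0.8 POPS at BF16 (dense peak)
hbm_bandwidth = 1.8e12        # 1.8 TB/s of HBM2E bandwidth

# Operations the chip can execute per byte fetched from HBM. A kernel needs
# at least this many ops per byte to be compute-bound rather than
# memory-bound (classic roofline reasoning).
ops_per_byte = peak_bf16_ops / hbm_bandwidth
print(f"Balance point: ~{ops_per_byte:.0f} BF16 ops per byte of HBM traffic")
# -> roughly 444 ops/byte, typical of accelerators tuned for large matmuls
```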
The chip features a high-speed tensor unit (16xRx16) to provide fast processing for both training and inference. It also supports a wide range of data types, including low-precision types in the MX data format.
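MX (microscaling) formats group values into small blocks that share a single scale factor, which is how very low-bit tensors stay usable. Here is a toy sketch of that idea; the block size and bit width are illustrative, and this is not Maia’s actual encoding:

```python
import numpy as np

def mx_quantize(x, block=32, bits=6):
    """Toy microscaling (MX-style) quantization: each block of `block`
    values shares one power-of-two scale, and elements are stored as
    low-bit integers. A simplified illustration of the concept only."""
    x = x.reshape(-1, block)  # length must be divisible by `block`
    # Shared per-block scale: a power of two covering the block's max magnitude.
    max_mag = np.maximum(np.abs(x).max(axis=1, keepdims=True), 1e-8)
    scale = 2.0 ** np.ceil(np.log2(max_mag / (2 ** (bits - 1) - 1)))
    q = np.clip(np.round(x / scale), -(2 ** (bits - 1)), 2 ** (bits - 1) - 1)
    return q.astype(np.int8), scale

def mx_dequantize(q, scale):
    return q.astype(np.float32) * scale

vals = np.random.randn(128).astype(np.float32)
q, s = mx_quantize(vals)
err = np.abs(vals - mx_dequantize(q, s).ravel())
print(f"max abs quantization error: {err.max():.4f}")
```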
It integrates a vector processor: a loosely coupled superscalar engine built with a custom instruction set architecture (ISA) to support a wide range of data types, including FP32 and BF16.
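BF16 is essentially FP32 with the low 16 mantissa bits dropped, which is why supporting both formats in one engine is cheap. A small sketch of the conversion (illustrative, not Maia’s hardware path):

```python
import numpy as np

def fp32_to_bf16_bits(x):
    """Convert FP32 to BF16 bit patterns with round-to-nearest-even.
    BF16 keeps FP32's 8-bit exponent but only 7 mantissa bits, so the
    conversion is just a rounded truncation of the low 16 bits."""
    bits = x.astype(np.float32).view(np.uint32)
    rounding = ((bits >> 16) & 1) + 0x7FFF  # round-to-nearest-even increment
    return ((bits + rounding) >> 16).astype(np.uint16)

def bf16_bits_to_fp32(b):
    # Widening back is free: pad the dropped mantissa bits with zeros.
    return (b.astype(np.uint32) << 16).view(np.float32)

x = np.array([3.14159, -0.001, 1e20], dtype=np.float32)
roundtrip = bf16_bits_to_fp32(fp32_to_bf16_bits(x))
print(roundtrip)  # close to the inputs, with ~3 decimal digits of precision
```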
A direct memory access (DMA) engine supports different tensor sharding schemes, and hardware semaphores enable asynchronous programming within Maia.
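The role of those semaphores can be pictured with a software analogy (this is not Maia’s API): a DMA engine signals a semaphore when a tile of data has landed, so the compute side blocks only when data genuinely isn’t ready yet.

```python
import threading, queue, time

tiles_ready = threading.Semaphore(0)   # signaled by the "DMA" per tile
tile_queue = queue.Queue()

def dma_engine(num_tiles):
    for i in range(num_tiles):
        time.sleep(0.01)               # pretend memory-transfer latency
        tile_queue.put(f"tile-{i}")
        tiles_ready.release()          # semaphore signal: tile i is resident

def compute_unit(num_tiles):
    for _ in range(num_tiles):
        tiles_ready.acquire()          # wait only if the tile isn't in yet
        tile = tile_queue.get()
        print(f"computing on {tile}")

n = 4
t = threading.Thread(target=dma_engine, args=(n,))
t.start()
compute_unit(n)                        # transfer and compute overlap
t.join()
```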
What is interesting about this chip is that it integrates an Ethernet-based interconnect with a custom protocol similar to RoCE, enabling very fast data movement: up to 4,800 Gbps for all-gather and scatter-reduce operations and up to 1,200 Gbps for all-to-all communication.
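Those line rates translate into concrete time budgets for collective operations. A quick sketch (the payload size is hypothetical, and protocol overhead is ignored):

```python
# Rough timing math using the interconnect figures above (illustrative only).

def transfer_time(bytes_, gbps):
    """Seconds to move `bytes_` at a line rate of `gbps` gigabits/second,
    ignoring protocol overhead and congestion."""
    return bytes_ * 8 / (gbps * 1e9)

payload = 16 * 2**30  # a hypothetical 16 GiB of activation/gradient data
print(f"all-gather path (4800 Gbps): {transfer_time(payload, 4800) * 1e3:6.1f} ms")
print(f"all-to-all path (1200 Gbps): {transfer_time(payload, 1200) * 1e3:6.1f} ms")
```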
Microsoft also created a software development kit (SDK) so that programmers can adapt their PyTorch and Triton models for Maia. The SDK includes various tools to simplify the use of these models with Azure OpenAI Services.
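Maia-specific SDK details aren’t fully public, but the kind of portable kernel code such a toolchain consumes looks like a standard Triton kernel. The vector-add below is plain, backend-agnostic Triton and implies nothing about Maia-specific APIs:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK: tl.constexpr):
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK + tl.arange(0, BLOCK)
    mask = offsets < n_elements          # guard the ragged last block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = x.numel()
    grid = (triton.cdiv(n, 1024),)       # one program instance per block
    add_kernel[grid](x, y, out, n, BLOCK=1024)
    return out
```

Because Triton kernels describe computation at the block level rather than in hardware-specific terms, an accelerator vendor can retarget them by supplying its own compiler backend, which is presumably the point of supporting Triton in the SDK.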