According to the publication, Google researchers were able to use a reinforcement learning technique to design the next generation of TPUs (Tensor Processing Units), Google’s specialized artificial intelligence processors. This means that the Internet giant’s next AI chips will be designed by the company’s previous generation of AI chips.
Artificial intelligence that designs chips for AI
The use of software in chip design is not new, but according to Google researchers, the new reinforcement learning model “automatically generates chip floorplans that are superior or comparable to those produced by humans in all key metrics, including power consumption, performance and chip area”. And best of all, it does so in a fraction of the time it would take a human.
The superiority of AI over human performance has attracted a lot of attention; one media outlet described it as “artificial intelligence software capable of designing chips faster than humans” and wrote that “a chip that would take humans months to design can be designed by Google’s AI in less than six hours”. But on reading the Nature article, what is surprising is not the complexity of the system used to design the chips, but the synergy between humans and artificial intelligence.
Analogies, intuition and rewards
The article describes the problem as follows: “Chip floorplanning involves placing netlists onto chip canvases (two-dimensional grids) so that performance metrics (such as power consumption, timing, area and wirelength) are optimized, while adhering to hard constraints on density and routing congestion.”
Essentially, the goal is to position the components in the most efficient way. Yet, as in many other combinatorial problems, finding an optimal layout becomes harder as the number of components on the chip increases. This is precisely what the AI seems to have solved, or at least mitigated.
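To make the combinatorial explosion concrete, here is a toy illustration (not the paper’s method): four connected components are placed on a 2×2 grid and each layout is scored by total Manhattan wirelength. Brute force works at this scale, but the number of layouts grows factorially with component count, which is why heuristics or learned policies are needed for real chips with millions of components.

```python
from itertools import permutations

# Toy placement problem: four components, four slots, four wires in a ring.
SLOTS = [(0, 0), (0, 1), (1, 0), (1, 1)]
NAMES = ["A", "B", "C", "D"]
NETS = [("A", "B"), ("B", "C"), ("C", "D"), ("A", "D")]

def wirelength(placement):
    """Total Manhattan distance across all connected pairs."""
    return sum(abs(placement[a][0] - placement[b][0]) +
               abs(placement[a][1] - placement[b][1])
               for a, b in NETS)

# Exhaustive search over all 4! = 24 possible layouts. For n components
# and n slots there are n! layouts, which quickly becomes intractable.
best = min((dict(zip(NAMES, p)) for p in permutations(SLOTS)),
           key=wirelength)
cost = wirelength(best)
print(cost)  # every wire in the best layout spans adjacent cells, so 4
```

With just a few dozen components this exhaustive approach already becomes infeasible, which is where learned search strategies come in.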
Existing software helps speed up the process of discovering chip arrangements, but falls short as chips become more complex. The researchers decided to draw on the way reinforcement learning has solved other complex problems. This is the manifestation of one of the most important and complex aspects of human intelligence: analogy.
Humans can extract abstractions from one problem we solve and then apply them to other problems. While we take these skills for granted, they are what make us very good at transfer learning, and they are why the researchers were able to rethink the problem of chip floorplanning as if it were a board game.
Deep reinforcement learning models can be particularly effective at searching very large spaces, a feat that is physically impossible with the computing power of the brain. The scientists tackled the problem with an artificial neural network capable of encoding chip designs as vector representations, making it easier to explore the problem space. According to the paper: “Our intuition was that a policy capable of the general task of chip placement should also be able to encode the state associated with a new unseen chip into a meaningful signal at inference time. We therefore trained a neural network architecture capable of predicting reward on placements of new netlists, with the ultimate goal of using this architecture as the encoder layer of our policy.”
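The idea of training a network to predict a placement’s reward and reusing it as an encoder can be sketched in miniature. This is an illustrative sketch only, not Google’s architecture: in the paper, a graph neural network over the netlist produces the embedding, whereas here two made-up features (wirelength and congestion) stand in for that learned encoding, and a linear “value head” is fit to predict reward.

```python
def train_reward_predictor(data, lr=0.01, epochs=500):
    """Fit weights w so that dot(w, features) approximates the reward,
    using plain stochastic gradient descent on squared error."""
    w = [0.0] * len(data[0][0])
    for _ in range(epochs):
        for feats, reward in data:
            err = sum(wi * xi for wi, xi in zip(w, feats)) - reward
            w = [wi - lr * err * xi for wi, xi in zip(w, feats)]
    return w

# Synthetic training placements (wirelength, congestion), with a made-up
# ground-truth reward of -(wirelength + 0.5 * congestion).
placements = [(4, 1), (6, 2), (8, 3), (5, 0), (7, 2)]
data = [([wl, cg], -(wl + 0.5 * cg)) for wl, cg in placements]
w = train_reward_predictor(data)

# Predict the reward of a new, unseen placement from its features.
pred = sum(wi * xi for wi, xi in zip(w, [5, 1]))
print(round(pred, 2))  # close to -(5 + 0.5 * 1) = -5.5
```

The point of the sketch is the transfer step: once a model can score placements it has never seen, its internal representation can serve as the encoder for a policy network, which is the role the paper assigns to its reward-prediction architecture.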
The term intuition is often used loosely, but it is actually a very complex and poorly understood process that involves experience, unconscious knowledge, pattern recognition, and many other factors. Our intuitions come from years of working in one field, but they can also be drawn from experience in other fields. Fortunately, testing these intuitions becomes easier with the help of powerful computing and machine learning tools.
It should also be noted that reinforcement learning systems need a well-designed reward. In fact, some scientists believe that with the right reward function, reinforcement learning would be enough to reach artificial general intelligence. Yet without the right reward, an RL agent can get stuck in endless loops.
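A reward in the spirit of the one described above can be sketched as a negative weighted sum of placement metrics, with a hard constraint that invalidates over-dense layouts. The weights and the density cutoff here are made-up illustrations, not the paper’s values:

```python
def placement_reward(wirelength, congestion, density,
                     max_density=0.6, w_cong=0.5, w_dens=1.0):
    """Hypothetical reward: better (less negative) for shorter wires,
    lower congestion and lower density; -inf for infeasible layouts."""
    if density > max_density:          # hard constraint: invalid layout
        return float("-inf")
    return -(wirelength + w_cong * congestion + w_dens * density)

print(placement_reward(100.0, 2.0, 0.5))   # -(100 + 1.0 + 0.5) = -101.5
print(placement_reward(100.0, 2.0, 0.9))   # violates density -> -inf
```

Getting the weights wrong is exactly the failure mode the text warns about: a reward that, say, ignores congestion would let the agent optimize wirelength into layouts that can never be routed.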