For now, there is no way around the fact that hard drives fail. They are less reliable than SSDs (assuming the SSDs are not pushed to their limits by tasks like Chia mining), but they offer higher capacities at lower prices, which is why they are so widely used in data centers. This matters to companies like Google Cloud, which have to handle huge amounts of data, so an artificial intelligence system that helps them predict drive failures and react in advance is a very important step.
Artificial intelligence to predict hard drive failures
“At Google Cloud, we know firsthand how critical it is to manage hard drives in operations and preemptively identify potential failures,” the company said in a recent blog post. “We are responsible for the management of some of the largest data centers in the world; any mistake in identifying these faults at the right time can cause serious disruption to our many products and services.”
The problem is that manually identifying a faulty drive (which Google defined as “broken or suffered more than 3 failures in 30 days”) is a time-consuming process that also requires a technician to have physical access to the device. Google Cloud and Seagate wanted to use machine learning to speed up this work.
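The quoted definition of a faulty drive can be read as a simple labeling rule. The sketch below is only an illustration of that reading, not Google's implementation: the function name `is_faulty`, its parameters, and the way the 30-day window is counted back from a reference date are all assumptions.

```python
from datetime import date, timedelta

def is_faulty(broken, failure_dates, as_of, window_days=30, max_failures=3):
    """Hypothetical labeling rule: a drive is faulty if it is broken
    or logged more than `max_failures` failures in the last `window_days` days."""
    window_start = as_of - timedelta(days=window_days)
    # Keep only failures that fall inside the window ending at `as_of`.
    recent = [d for d in failure_dates if window_start < d <= as_of]
    return broken or len(recent) > max_failures

# Illustrative data: four failures during June, checked on 30 June.
june_failures = [date(2021, 6, 3), date(2021, 6, 10), date(2021, 6, 17), date(2021, 6, 24)]
print(is_faulty(False, june_failures, as_of=date(2021, 6, 30)))
```

A rule like this only labels drives after the fact; the point of the machine learning effort described here is to flag them before they reach that state.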
Google Cloud said it has “millions of disks deployed in production systems generating terabytes (TB) of raw telemetry data”, including billions of rows of SMART data per hour and host metadata such as repair logs, Online Vendor Diagnostics (OVD) or Field Accessible Reliability Metrics (FARM) logs, as well as hard drive manufacturing data.
This means the company has an enormous number of hard drives running in production machines, generating hundreds of parameters and factors that engineers need to track and monitor. However, this data can also be processed by an artificial intelligence system.
So far, the companies have tested two models: one based on AutoML Tables and the other custom-developed for this project. The first has been the more reliable so far, with 98% precision and 35% recall, versus 70-80% precision and 20-25% recall for the second. The experiment has therefore also served to demonstrate the benefits of using AutoML rather than a custom solution.
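The two percentages quoted for each model are most naturally read as precision (the share of flagged drives that really fail) and recall (the share of failing drives that get flagged). As a minimal sketch of how those metrics are computed, the counts below are hypothetical, chosen only so that the result matches the figures reported for the AutoML model; they are not data from Google or Seagate.

```python
def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) from confusion-matrix counts."""
    flagged = true_pos + false_pos   # drives the model predicted would fail
    actual = true_pos + false_neg    # drives that really did fail
    precision = true_pos / flagged if flagged else 0.0
    recall = true_pos / actual if actual else 0.0
    return precision, recall

# Hypothetical counts: 1,000 drives fail; the model flags 357, of which 350 are correct.
p, r = precision_recall(true_pos=350, false_pos=7, false_neg=650)
print(f"precision={p:.2f} recall={r:.2f}")  # prints "precision=0.98 recall=0.35"
```

Read this way, the model is almost always right when it raises an alarm, but it still misses about two thirds of the failures.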
Google Cloud has announced plans to extend the system to support all Seagate hard drives, and to all the other brands of hard drives used in its data centers.