How Microsoft will use AI to remove background noise and barking from video calls

One of the most annoying things during video calls, and practically any type of call, is background noise: other voices, mouse and keyboard sounds, or animal sounds such as barking. Microsoft is aware of this and wants to get rid of these sounds by using Artificial Intelligence that will intervene in real time, filtering them out.

The project was announced a few days ago by the North American company, but now we know more about how it will work.

What Microsoft wants to avoid is exactly those awkward moments that occur in many video conferences when someone is asked to mute their microphone because, perhaps, they are opening a food package or their dog is barking. However, non-stationary sounds must be properly distinguished from stationary ones, since the latter are already removed by the company's current noise suppression system.

For now, what is done is to use the pauses between speakers to work out which sound is the speaker's voice and which is background noise, such as the sound of the computer or similar sounds. So, this new feature in Microsoft's video calling services will focus on sounds that are very difficult to identify and classify: non-stationary sounds, which may also occur at the same time as speech during the call.

A dog barking, someone opening a food package, a glass falling and breaking, or a knock on the door can all be non-stationary sounds that are very difficult to recognize as noise. However, according to a Microsoft spokesperson, some sounds cannot be eliminated: the sound produced by metals, someone laughing, screaming, or singing, or other people speaking at the same time, because these sounds cannot be distinguished from speech.

How Microsoft is training its AI to distinguish background noise

"We trained the model to understand the difference between noise and speech, and then the model tries to keep just the speech," Robert Aichner, program manager for Microsoft Teams, explained to VentureBeat. This is done with a large number of videos of people talking over background noise, where, thanks to the transcript, the Artificial Intelligence is able to follow the conversation and, in this way, tell what is a word and what is noise.

"We took thousands of different speakers and more than 100 different noise types. And then what we do is mix the clean speech, without noise, with the noise. So we simulate a microphone signal. And then we also give the model the clean speech as the ground truth." As easy as it may seem, Microsoft is actually facing a number of problems. The first is to obtain sufficient representative data. How can you artificially produce those background sounds?
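The mixing step Aichner describes can be sketched roughly as follows. This is only a minimal illustration in Python, not Microsoft's actual pipeline: the function names `rms` and `mix_at_snr`, the speech-to-noise ratio parameter `snr_db`, and the synthetic tone and random noise standing in for real recordings are all assumptions made for the example.

```python
import math
import random


def rms(signal):
    """Root-mean-square level of a list of samples."""
    return math.sqrt(sum(s * s for s in signal) / len(signal))


def mix_at_snr(clean, noise, snr_db):
    """Add `noise` to `clean` at a chosen speech-to-noise ratio (in dB).

    Returns (noisy, clean): the simulated microphone signal and the
    clean speech that serves as the ground-truth training target.
    """
    if len(clean) != len(noise):
        raise ValueError("clean and noise must have the same length")
    noise_rms = rms(noise)
    if noise_rms == 0:
        raise ValueError("noise is silent")
    # Gain that places the noise snr_db decibels below the speech level.
    gain = rms(clean) / (noise_rms * 10 ** (snr_db / 20))
    noisy = [c + gain * n for c, n in zip(clean, noise)]
    return noisy, clean


# Stand-ins for real recordings (assumptions): a 220 Hz tone plays the
# role of clean speech, Gaussian noise plays the role of a background sound.
SAMPLE_RATE = 16000
rng = random.Random(0)
clean_speech = [
    math.sin(2 * math.pi * 220 * t / SAMPLE_RATE) for t in range(SAMPLE_RATE)
]
background = [rng.gauss(0.0, 1.0) for _ in range(SAMPLE_RATE)]

# One training pair: the noisy input and the clean target.
noisy_input, target = mix_at_snr(clean_speech, background, snr_db=5.0)
```

A model trained on many such pairs sees the noisy signal as input and is asked to reproduce the clean one, which is why the clean speech has to be kept alongside each mixture.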

At first, both audiobooks and labeled YouTube data sets were used, but these sources are very different from real video calls, especially audiobooks. For this reason, it was decided to create videos directly to include them in the program, so that the Artificial Intelligence could be trained on real situations.

The problem is that users' video calls cannot be recorded for this purpose, for obvious privacy reasons. And even if it were done with, for example, the company's own employees' video calls, someone would still have to record the background sounds.
