Artificial intelligence (AI) is making an ever-larger mark on today's society, with new solutions that are disrupting all kinds of industries. Now Microsoft has created a voice model called VALL-E that can imitate any voice from a 3-second audio sample.
AI in the human voice with VALL-E
This artificial intelligence is able to imitate anyone's voice
If in art it is already impossible to know whether a work was made by an artist's hand (someone whose illustrations resemble AI-generated ones has even been blocked on networks such as Reddit), the future that awaits us is completely uncertain.
The project's GitHub page explains how this neural voice model, named VALL-E, works: it uses discrete codes derived from a neural audio codec model.
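To give a rough idea of what "discrete codes" means, here is a minimal sketch in Python. It is a toy illustration of nearest-neighbor quantization, not VALL-E's or Microsoft's actual code: the codebook values, the two-dimensional "frames" and the function names are all made up for the example, and a real neural audio codec learns its codebooks from data.

```python
# Toy illustration only: a hypothetical 4-entry codebook of 2-D vectors.
# A real neural audio codec learns its codebooks; these values are invented.
CODEBOOK = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)]


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(frames, codebook=CODEBOOK):
    """Replace each continuous frame with the index of its nearest codebook entry."""
    return [
        min(range(len(codebook)), key=lambda i: squared_distance(frame, codebook[i]))
        for frame in frames
    ]


def dequantize(codes, codebook=CODEBOOK):
    """Turn a list of discrete codes back into approximate continuous frames."""
    return [codebook[i] for i in codes]


# Made-up "audio frames" standing in for a codec's continuous features.
frames = [(0.1, 0.2), (0.9, 0.1), (0.2, 0.8), (0.7, 0.9)]
codes = quantize(frames)
print(codes)              # [0, 1, 2, 3]
print(dequantize(codes))  # the nearest codebook vectors, an approximation of the input
```

The point of the sketch is only that continuous sound features become a short sequence of integers, which a model can then predict much as a text model predicts words.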
The researchers used 60,000 hours of English speech data to train this voice model, hundreds of times more than existing systems use.
Thanks to this training, VALL-E has in-context learning capabilities and can synthesize a high-quality personalized voice from only a 3-second recording of a person's voice.
This voice model does not stop at imitating the voice: it also preserves the speaker's emotion and even the acoustic environment of the recording; in other words, it is almost a copy-paste of someone's voice.
Several examples of how VALL-E works can be played on GitHub, and the truth is that the model's ability to imitate any person's timbre is surprising.