Losing Trust in Our Voice Because of AI

I recently watched a video by Wings of Pegasus titled “Nobody knows what music is anymore”. In it, you can hear, and even see in a graphical analysis, a human singing voice that is automatically tuned and corrected by a computer program.

The truth is that I haven’t given much thought to this part of the technological development in music. It is also true that, as a musician, a musicologist, and a music critic, I have always appreciated singers whose tuning imperfections were as minimal as possible, so that my eyes wouldn’t water when listening to some out-of-tune “singing”. But it looks like we have gone too far in our striving for perfection and may be entering a dangerous zone.

This is about how Auto-Tune and pitch-correction software, and the way they are used, are changing our perception of what singing is supposed to sound like. More importantly, it is about how we are getting used to an artificially altered singing voice.

Perhaps this is preparing us for Artificial Intelligence (AI), which is doing more and more of the singing, the entertaining, and even … the talking?

The same is happening to narrators and voice-over artists, who are losing their livelihoods to AI. The technology can clone and replicate their voices for movies, audiobooks, and video games, sometimes even without their consent. The text-to-speech technology currently in use is just a step in this direction; machine learning is now capable of truly replicating any individual voice. And those who can save a lot of money by using computers instead of humans are hardly concerned that these machines have “no soul”.

This way, we will soon be unable to recognize a real, natural human voice. Even worse, we will simply accept AI, along with the music and singing it produces. And that will probably happen in the very near future.

This trend is obviously worrisome. Not only because we are being trained to expect and accept nothing but “pure perfection”, in this case perfect tuning to 440 Hz, but also because we will no longer be able to recognize and appreciate real human expression.

Music is about emotions. So when these are copied and emulated by technology, what impact will this have on our own ability to express our emotions and to recognize them in others?

First, we were forced to cover our faces with masks, which prevented others from recognizing our facial expressions. This is already having a negative impact, especially on young children, who need to see the faces of their parents and caregivers to learn social skills.

Now we are able to correct the pitch of the songs we listen to, and thus largely “correct” the emotional part of what we hear.

Soon we will be told by an AI-generated voice whatever someone wants us to know or do, and we will not be able to recognize the unnatural origin of that sonic information.

Even worse, AI is already capable of copying someone else’s voice and mimicking their speech as if it were delivered by a real person. So, for example, if the president of a country asks the nation’s citizens to do something (to mobilize, to go to war, to prevent the spread of a disease …), how much trust will people have in that announcement?

Maybe you think I am pushing this too far? That this sounds like pure theory bordering on the conspiratorial? But the way humans communicate has been changing rapidly over the last few decades. What will happen when we lose trust in our own voices? When our laughter, our expressions of love, or our calls for help are “moderated” not only for correctness and meaning, but also for expression and emotion?

Scientists still haven’t figured out how human speech developed. There are theories that the ability to sing predates the ability to speak: that in the beginning, hundreds of thousands of years ago, there were mothers singing to their newborn children, which gave birth to music and subsequently to human speech.

But how will children in the near future learn to recognize their mothers and their mothers’ voices? How will they acquire social skills if that soothing lullaby or nursery rhyme is perfected and perfectly delivered by Artificial Intelligence?

I believe this is no longer about music. It is about being, and remaining, human. Humans are not perfect and are not supposed to be. Each of us is imperfect in a different way, and that is what makes each of us unique. I just hope that, in spite of rapid technological development, we retain and further develop our ability to be human. Humans with all our imperfections, peculiarities, differences, and emotions … even in our singing!
