The Prospect of SVC (Singing Voice Conversion) Algorithms
- Yair Hashachar
- Dec 27, 2023
Updated: Feb 5, 2024
SVC (Singing Voice Conversion) has been making headlines lately as the technology behind the abundance of musical deepfakes we encounter. This technology allows a singer's voice to be replicated in a new song, sparking an essential ethical and legal debate. While discussions naturally revolve around monetization and royalties, let's take a moment to explore the potential of these technologies to reshape our listening experience.
As someone for whom music occupies a central place in life, I've always indulged in imaginative "what if" games. Most of these musings revolve around scenarios like, "What if singer X immigrated to country Y and started singing in that country's language?" or "How would song X by singer Y sound if sung as a duet with singer Z?"
Until recently, bringing these imaginative scenarios to life was limited to expert music producers using advanced production techniques, and even then, the possibilities were somewhat limited. However, just yesterday, I trained a model of one of my all-time favorite singers, Joni Mitchell, and began experimenting with letting her 'sing' some of the cherished Hebrew songs from my upbringing. The result was astonishingly powerful. It was an uncanny feeling, hearing Joni Mitchell breathe new life into these songs in Hebrew, casting them in an entirely different light.
Now, sharing this with the world isn't something I feel comfortable with, partly due to ethical considerations, but mostly because I cherished it as a personal experience. However, this experience got me thinking about the future of music consumption. Currently, as music fans, we're primarily limited to listening to our favorite artists' songs in the same way as every other fan. The songs remain closed, confined entities that flow unchanged from the mixing and mastering engineers' desks to the streaming platforms and ultimately reach our ears. But let's imagine a future audio streaming service where the possibilities for personalized listening experiences are endless.
Many already speculate that source separation (also known as de-mixing) will dominate the future of interactive listening experiences. This technology would empower listeners to create new mixes of their favorite songs. But let's take it a step further and imagine a world where you can place any singer on any song, forging sophisticated collaborations across generations and geographies. This won't be limited to renowned singers alone. Imagine the ability to incorporate the voices of your loved ones or even bring your late relatives back to life, serenading your own children with their renditions of beloved children's songs. The possibilities are truly limitless.
The future holds incredible potential for a deeply enriched and personalized listening experience that, when approached ethically, can amplify our emotional connection to music.