We just used voiceline compilations extracted from other games, mostly WoW and HOTS, to generate the voice for a specific character. The results... varied. Sometimes 2 minutes of voicelines were enough to get a realistic-sounding voice, but we mostly used compilations 10-20 minutes long. Generally you'll get better results with samples that are longer and free of background noise like environment sounds or electronic beeps.
When generating a particular voice, usually 3-5 attempts were enough to get good enough results, but there were a few lines and some characters that took us much longer. We mostly manipulated one parameter - stability - where lower stability made the results more variable, so they were bad more often, but after more attempts you could get some true pearls. For longer texts it's obviously better to use higher stability to get fewer surprises where the character says everything fine and then screws up the last word.
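To make the "several attempts, stability depends on text length" routine concrete, here is a rough sketch of how the requests could be prepared. It only builds the request bodies and sends nothing. The `voice_settings` / `stability` / `similarity_boost` field names are how I remember the ElevenLabs text-to-speech API, so check them against the current docs. The helper names, thresholds and stability values are made up for illustration and are not the settings we actually used.

```python
# Hypothetical helpers: names, thresholds and values are illustrative only.

def pick_stability(text, short_value=0.35, long_value=0.7, long_threshold=200):
    """Lower stability for short lines (more variety per attempt),
    higher stability for long texts (fewer botched endings)."""
    return long_value if len(text) >= long_threshold else short_value


def build_attempts(text, attempts=4, similarity_boost=0.75):
    """Return one request body per attempt.

    The bodies are identical on purpose: generation is not deterministic,
    so sending the same request several times gives different takes.
    Field names are assumed from the ElevenLabs API; verify before use.
    """
    stability = pick_stability(text)
    return [
        {
            "text": text,
            "voice_settings": {
                "stability": stability,
                "similarity_boost": similarity_boost,
            },
        }
        for _ in range(attempts)
    ]
```

Each body would then be sent to the text-to-speech endpoint for the cloned voice, and you keep whichever take sounds best.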
In the end we did some postprocessing in Audacity: applying the "normalize volume" filter (I believe our magic number was -0.17 or something like that) so that everything had similar loudness, and sometimes also slightly changing the speed.
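For anyone who wants to script the normalization step instead of clicking through Audacity, below is a minimal sketch of peak normalization. It assumes the -0.17 is a peak level in dB, which is what Audacity's Normalize effect takes, and that the samples are floats between -1 and 1. The function name is hypothetical. Note that this evens out peaks, not perceived loudness.

```python
def normalize_peak(samples, target_db=-0.17):
    """Scale samples (floats in [-1, 1]) so the loudest peak lands at
    target_db dB relative to full scale. The -0.17 default is only the
    half-remembered value from the post, assumed to be in dB."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)  # pure silence or empty input: nothing to scale
    target_amplitude = 10 ** (target_db / 20)  # dB -> linear amplitude
    gain = target_amplitude / peak
    return [s * gain for s in samples]
```

Running every generated line through the same target keeps the peaks consistent between characters, which was the point of the Audacity step.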
We are using ElevenLabs - Generative AI Text to Speech & Voice Cloning. Pretty simple to use, for a small monthly fee. To paraphrase Milton Friedman: there's no such thing as a free (and good) service.