Antares
Introduction
AI voice generation is a powerful tool that has the potential to revolutionize the way we create our maps. Are you working on a custom campaign focused around Illidan or Arthas? You can now add their voices to your cinematics, have the player lean back, enjoy the scene, and not have to read text after text. You can, of course, also use AI voice generation to add voices to new characters you create. This can greatly increase the immersion and provide an entirely new level of polish to your maps.
However, this new technology also raises many issues, especially regarding copyright. The legal issues surrounding the use of AI voice generation will likely be hotly debated and figured out in the coming years. While the use of AI to generate custom voice lines for existing Warcraft 3 characters is a harmless application of this technology, it gets more problematic when we're dealing with voices from other franchises or those of real people, where the cloning of their voices could raise serious ethical concerns, and is something I do not encourage.
Finally, this technology is rapidly evolving. Many of these tips may no longer be up-to-date at the time you're reading this.
With that out of the way, let's explore the possibilities of using AI voice generation in custom maps!
Available AI Tools
Available AI voice generation tools include:
elevenlabs.io - Upload sound files to train the AI or choose a voice from a library, then have it read a text you enter or change your voice to a target voice.
uberduck.ai - Works similarly to elevenlabs.
voice.ai - Works by changing your voice instead of writing a prompt. You need at least 5 minutes of training data to start building voices, which will be difficult with only Warcraft 3 voice lines, but users can share generated voices.
Courtesy of @The Nightmare Book:
PlayHT - Free.
FakeYou - Has lots of game character voices.
replicastudios.com - Free.
readloud.net - Provides more-or-less robotic voicelines.
tortoise-tts (GitHub: neonbjb/tortoise-tts) - A multi-voice text-to-speech system trained with an emphasis on quality. Requires a good GPU.
Elevenlabs is widely regarded as the most advanced AI voice generation tool on the market right now. All voice samples in this tutorial were created with it. You have a limited amount of text that you can generate per month - more if you pay. If you're looking for just a few voice lines or are fine with a not-so-perfect delivery, the character budget of the free account should suffice. However, if you're trying to get a really natural-sounding delivery, you may have to retry the generation many times, which can quickly deplete your allowance.
Update: Voice cloning on elevenlabs is now behind a paywall, but the first month of the subscription costs only $1. You can still use the Voice Design feature to generate voices for your new characters for free.
As an example, here are a custom "What" and a "Pissed" sound I created for my character "Space Jaina" using elevenlabs:
I don't have any experience with the other tools, so I cannot give any tips on how to achieve the best results using them. However, the tips included in this tutorial, other than those in the elevenlabs-specific sections, should apply to all voice generation tools equally.
Preparing Training Files
Skip this section if you don't want to clone existing Warcraft 3 characters. If you want to create new voices for new characters, you might simply try out the presets offered within the AI tools.
You can find a repository with training files for many different voices here:
The Great AI Voice Training Samples Repository
The AI needs a good amount of high-quality training data to produce satisfying results. To this end, we want to compile all the voice lines of our character from both the unit command responses and the campaign dialogues. The campaign dialogues are located under Sound\Dialogue and can be directly extracted using the Sound Editor. A faster way, however, is to use Casc View and extract the audio files from the Warcraft 3 archives. To do this, click Open Storage, then select the Warcraft 3 parent directory. Once you have opened the archive, navigate to war3.w3mod\_locales\enus.w3mod\sound\dialogue\.
If your character has more voice lines in the campaign than the tool you're using allows (25 for elevenlabs), you should either merge samples or use only the longest and best-quality samples. Speculation: Merging audio files makes for worse results, because the AI gets confused if the speech is not uninterrupted.
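A minimal sketch of the "use only the longest samples" option, assuming the extracted voice lines have already been converted to uncompressed WAV files (Python's standard wave module cannot read FLAC or MP3). The function names are made up for illustration, and the default limit of 25 is simply the elevenlabs figure mentioned above. Judging the quality of each sample still has to be done by ear.

```python
import wave
from pathlib import Path


def wav_duration(path):
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def longest_samples(folder, limit=25):
    """Return up to `limit` WAV files from `folder`, longest first."""
    files = sorted(Path(folder).glob("*.wav"))
    files.sort(key=wav_duration, reverse=True)
    return files[:limit]
```

Calling longest_samples("jaina_samples") on a (hypothetical) folder of extracted lines returns the paths to upload, sorted from longest to shortest.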
Listen through your training set again to make sure there are no other sounds appearing, for example the growling of Tyrande's tiger, which would confuse the AI. You might also want to remove voice lines in which the character talks in a very odd way that only makes sense in that specific situation.
You can try to clone voices not appearing in the campaign from just the unit response sounds, but you should not expect any mind-blowing results. Additionally, the cloning of human-like voices will obviously work much better than that of very distorted voices like demons or undead. You should not even attempt to make custom Lich voice lines. It just won't work.
Here are some samples to give you an idea of the quality you can expect from different voices:
Arthas (good quality):
Shandris (low amount of training data):
Anub'Arak (very distorted):
World of Warcraft Voice Lines
You can also train the AI with sound files from World of Warcraft. Note, however, that many of the voice actors have changed from Warcraft 3 to WoW. The easiest way to find the sound files is to go to wowhead.com, then to Database -> Sound. There you can search for sound files for your character.
Notably, the voice actress for Tyrande is Elisa Gabrielli in both Warcraft 3 and WoW. Because there is an ambient sound in all of Tyrande's voice files in Warcraft 3, you will get a much better result by training the AI with WoW sound files instead.
Starcraft 2 Voice Lines
Courtesy of @Daratrix
To extract sound files from SC2 (applies to all files)
- Open CASC View (Game storage: Starcraft II)
- Click the Tools tab => Search File(s)
- In the "File Mask" field, enter your filter/search term. You need to use wildcards (*) around the keywords.
- Click search and wait for it to finish.
- Despair and try to fix your search terms for 10 minutes.
- Highlight all the files that you want to extract in the results (you can Ctrl+A to select everything, Shift+Click to select a range, or Ctrl+Click to select specific files).
- Right click > Extract (F5).
- I recommend ticking "Extract plain names, ignore storage directory structure" in the extraction options.
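To see why the wildcards in the File Mask step matter, here is a small sketch using Python's fnmatch module. It assumes that CASC View's file mask behaves like ordinary shell-style wildcards, which is an assumption and not something confirmed by the tool's documentation. The file names are invented for the example.

```python
from fnmatch import fnmatchcase

# Invented file names, only for illustration.
files = [
    "Assets/Sounds/Kerrigan_What00.ogg",
    "Assets/Sounds/Kerrigan_Yes01.ogg",
    "Assets/Sounds/Raynor_What00.ogg",
]


def search(mask, names):
    """Return the names matching a shell-style mask, ignoring case."""
    return [n for n in names if fnmatchcase(n.lower(), mask.lower())]


print(search("kerrigan", files))    # no wildcards: nothing matches
print(search("*kerrigan*", files))  # matches both Kerrigan files
```

Without the asterisks, the mask has to match the whole path, so a bare keyword finds nothing.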
Heroes of the Storm Voice Lines
You can find all voice lines for heroes in Heroes of the Storm on the Heroes of the Storm Wiki. Simply go to the article of the hero of your choice and find the voice lines under Quotes. Download the audio files by right-clicking on the play-icon and then "Save Audio As..."
Getting the Best Output (elevenlabs)
This section is specific to the elevenlabs tool. It is based solely on my personal experience and involves a lot of speculation.
In general, the AI is much better at generating convincing speech for longer texts than for shorter texts. However, if you enter texts that are too long, a single "hiccup" can spoil the entire output, so aim for a good middle ground when you're trying to synthesize campaign dialogues.
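One way to find that middle ground is to split a long dialogue into chunks of whole sentences before generating. This is a minimal sketch: the function name chunk_dialogue and the default limit of 300 characters are arbitrary assumptions for illustration, not an elevenlabs recommendation.

```python
import re


def chunk_dialogue(text, max_chars=300):
    """Group whole sentences into chunks of at most `max_chars` characters.

    A single sentence longer than the limit is kept as its own chunk.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > max_chars:
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be generated separately, so a hiccup only forces you to redo that chunk instead of the whole dialogue.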
As an example for this tutorial, I generated a speech for Jaina that could appear in a campaign cutscene. This was my input:
"I have deployed all the troops from Theramore I can spare for our assault on the gates of Ahn'Qiraj. I am confident that the tauren will join our expedition. However, I'm not so sure about the orcs."
The delivery is a bit sleepy, but still quite good, considering this was my very first attempt. The added sigh at the end is also a nice touch. Now let's try to generate the last sentence by itself:
In this example, the AI lacks the context and cannot infer the right speed and sentence melody that is appropriate for this sentence. The delivery is far too fast, which is often the case for short snippets. You will not get a satisfying result, even if you try countless times. However, we have a bit of control over the delivery by using punctuation. Here is the same line, but this time with the prompt: "However... I'm not so sure about the orcs."
The same line spoken by Tyrande:
Ways to change the delivery:
- "...", "⸺", or line breaks to add a pause.
- Exclamation marks make the delivery more forceful.
- "?!?" or similar combos can be used for good effect, but are not always reliable.
- Putting words in ALL CAPS seems to nudge the AI into putting the emphasis on a specific word, but it is not reliable.
You can also put the sentence into a longer text to give the AI hints about the context of the text it's supposed to voice, then cut the surrounding text from the voice file using audio editing software (but the surrounding text should be delimited by pauses). You can also add words like "However" or expressions like "Oh my god" to the beginning of sentences.
Here is a recreation of "O07Jaina46" from the Orc campaign using the prompt:
"Wait!... this is INSANE!... You can't possibly expect me to ally with THEM?!?" (cutting the audio after "to", I should have put a pause there)
There is now also an elevenlabs discord server, where you can find additional tips and tricks about getting the best output.
Speech-to-Speech (elevenlabs)
Elevenlabs now also offers a speech-to-speech feature, where, instead of using a text prompt, you provide an audio file and the AI changes the voice into the target voice, while keeping the rhythm and melody exactly the same. The audio file can be one you record yourself or an already existing audio file.
This feature allows you to exert much more control over the dialogue's delivery. However, using your own recordings might not produce good results for everyone, because there are several caveats. The AI will pick up your accent as well as any audio problems with the recording, such as background noise, pops and clicks, muffled speech and so on. Unless you have a good recording setup and are able to speak accent-free, the result of the voice transfer will probably sound off.
There are, however, different ways to make use of the speech-to-speech feature, by using already existing audio files or by feeding the audio files created with elevenlabs back into the AI. The utility of the first approach is easy to see. Maybe you want to make a morphed demon hunter voice for a different character or you're working on an alternate reality campaign in which Jaina is a dreadlord?
(Tichondrius voice with bass enhanced and slight reverb added in Audacity)
Feeding an AI-generated audio file back into the AI is a useful method when you want to make a character speak in a certain way that's not accessible with the default text-to-speech method. For example, it will be very hard to make Tyrande speak in a casual way, because there is always a lot of pathos in her deliveries. However, by generating the voice lines for a different character first, you can then use those audio files to change the delivery for the Tyrande voice.
Here is Uther speaking with the melody of GLaDOS:
Make sure to slide the "Exaggerate Style" option to full when using this method.
Editing and Importing
Now it's time to import the voice files into your map. If you're working on custom unit response sounds, you can replace any of the existing soundsets. Find one that has the right number of What, Yes, YesAttack, and Pissed sounds. For example, I wanted a sixth Pissed sound for Space Jaina, so I replaced the Knight sounds instead of those from the original Jaina.
To get the correct import paths, go into the Sound Editor (F5) and search for the name of the unit you want to replace. The import path for Knight for example is:
"Units\Human\Knight\SoundFileName". The "Sounds" and "Internal" folders are not part of the import path. Then, go to the Asset Manager (F12), import the sound files, then change the subfolder from "war3mapImported" to the correct path.
Listen to the voice lines next to each other and make sure the audio levels are consistent. You might have to increase the volume on most of them. You might also find that a voice line that sounded great when you listened to it by itself sounds unnatural when put into the ensemble. You might have to then recreate it.
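If you want numbers to go with your ears, the level comparison can be sketched as below. This assumes the voice lines are 16-bit PCM WAV files; the function names are made up for illustration, and RMS level is only a rough stand-in for perceived loudness.

```python
import math
import sys
import wave
from array import array


def rms_dbfs(path):
    """Return the RMS level of a 16-bit PCM WAV file in dB relative to full scale."""
    with wave.open(str(path), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("only 16-bit PCM WAV files are supported")
        samples = array("h", wav.readframes(wav.getnframes()))
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    if not samples:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # pure silence
    return 10 * math.log10(mean_square / 32768 ** 2)


def level_report(paths):
    """Return (path, level, offset from the loudest file) for each file."""
    levels = [(p, rms_dbfs(p)) for p in paths]
    loudest = max(level for _, level in levels)
    return [(p, level, level - loudest) for p, level in levels]
```

Files with a large negative offset are the ones that most likely need their volume increased in your audio editor.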
As convincing as these AI-generated voices can be, if you put them next to those spoken by the original voice actor, you will notice a difference. Therefore, if you're working on custom unit responses, I recommend recreating them completely, and not using a mix of old and new sounds.
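For the import step described in this section, building the full paths can be sketched as follows. The folder is the Knight example from above. The file names and the .wav extension are placeholders, so use whatever names the Sound Editor actually lists for the soundset you are replacing.

```python
from pathlib import PureWindowsPath

# Folder from the Knight example; "Sounds" and "Internal" are not part of it.
SOUNDSET_FOLDER = PureWindowsPath(r"Units\Human\Knight")


def import_path(sound_file_name, folder=SOUNDSET_FOLDER):
    """Return the full import path for one replacement sound file."""
    return str(folder / sound_file_name)


# Placeholder file names, only for illustration.
for name in ["KnightWhat1.wav", "KnightPissed6.wav"]:
    print(import_path(name))
```

The printed paths are what you would enter in the Asset Manager in place of the default "war3mapImported" path.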
Final Thoughts
I hope this tutorial has been of interest and I hope you have success with bringing our beloved Warcraft characters to life in your maps. Thank you to @Wareditor for helping me get started with the AI tool.