How to create voice over dialogue
Create a Speech Block by pressing the big Plus icon.
Type in what you want to say.
Pick a character from the Character Menu.
You can activate the character menu by clicking on the Character icon.
You can then click the Play icon to the left of the speech block to render the audio; it will play automatically once it finishes processing.
After rendering a speech block, you can play it back at any time, either by clicking the play icons next to individual speech blocks,
or by clicking the Big Play icon on the bottom left of your Studios screen to play back the entire dialogue from top to bottom.
How to toggle between the character and style menu
Click the character bubble to activate the character selection menu.
To open the style selection menu, click inside the speech block you are working on.
How to apply emotions and styles to your dialogue
To apply style to an entire block of text, click once inside the speech block
to activate the style menu.
Select any style, and it will get applied to the entire speech block.
Soon, you will be able to apply style to only a portion of your speech block:
select the text that you want to style to activate the style menu for that bit of text.
Select any style, and it will get applied to only that portion of the speech block.
How to export your audio files when done
Click the EXPORT option on the top right side of the menu bar.
This will take you to the export page. Select all speech blocks that you want to export, and all languages that you wish to export to.
Click Download to begin processing and then downloading your finished files.
NOTE - It can sometimes take several minutes to download all your audio files.
After clicking Download, if you close the window or lose your internet connection, your files will be waiting for you to download from the Project Menu.
Why does it sometimes sound great and sometimes sound noisy?
The short answer is that our Voice AI model is constantly training
in the background, and the quality of voices will improve over several weeks.
The long answer is that when you apply 'styles' to text, the AI has a much harder time predicting how that speech should sound, and so it gets it wrong more often.
In order to create 1 second of speech at 22050 Hz, our AI would need to predict 22050 consecutive values correctly.
In the future, we will offer higher quality bitrate and sample rates.
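The arithmetic above is simple but worth seeing concretely. This is a minimal sketch (the function name is illustrative, not part of any Replica API) showing how the number of raw sample values the model must predict grows with the duration of the speech:

```python
SAMPLE_RATE_HZ = 22050  # the sample rate mentioned above

def samples_needed(seconds: float, rate: int = SAMPLE_RATE_HZ) -> int:
    """Number of consecutive audio sample values needed for `seconds` of speech."""
    return int(seconds * rate)

print(samples_needed(1))   # one second of speech -> 22050 values
print(samples_needed(10))  # a ten-second line -> 220500 values
```

At a higher sample rate, that count grows proportionally, which is part of why higher-quality output is more expensive to render.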
How long does it take to render speech?
On average, you can expect to wait between 10 and 60 seconds
for each speech block.
Render time varies depending on how many other people are using the product at the same time, how many speech blocks you are trying to render in one go, and other such factors.
We will launch a paid feature where rendering will be fast enough that all your audio is produced in real time.
Each time you modify the text, character, or style of any speech block, that speech block will need to be re-rendered.
Why does the emotion and style sometimes not work as expected?
There's a short answer and long answer.
For the long version, you need to understand a little about how emotion and style actually work. The question directly below contains this information.
For the short answer, suffice it to say that as our AI improves with time, you will experience this issue less often.
How does emotion and style work?
This is a slightly technical question, but for the curious, read on. We train our models in an 'unsupervised' fashion. This means that during training, we don't tell the AI whether a given sentence is happy, sad, angry, and so on. The AI stores this 'emotion' information in a high-dimensional space for use during 'inference'. As such, the AI learns to cluster emotional attributes from audio passages on its own, without any explicit conditions set by us.
When you are using Studios, you are using the AI in 'inference' mode. When you pick a 'style' from the style menu, what we are actually doing in the background is picking a pre-labeled 'happy', 'sad', or similar audio track from a large emotion dataset we have built up over time. We then ask the AI to estimate the emotion from that labeled audio track and apply it to the speech it is rendering for you.
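The lookup-and-condition flow described above can be sketched roughly as follows. Every name here (`emotion_dataset`, `estimate_emotion`, `render_speech`) is a hypothetical stand-in for illustration only, not Replica's actual API; the two stub functions stand in for the real models.

```python
emotion_dataset = {
    # style label -> a pre-labeled reference audio track (illustrative paths)
    "happy": "refs/happy_01.wav",
    "sad": "refs/sad_01.wav",
}

def estimate_emotion(reference_track: str) -> list:
    # Stand-in: the real model would embed the reference audio into a
    # high-dimensional emotion vector learned during unsupervised training.
    return [0.0] * 8

def render_speech(text: str, emotion_vector: list) -> str:
    # Stand-in: the real model renders audio conditioned on the emotion vector.
    return f"<audio for {text!r}, conditioned on {len(emotion_vector)}-dim emotion>"

def apply_style(text: str, style: str) -> str:
    """Pick a labeled reference track, estimate its emotion, and render with it."""
    reference_track = emotion_dataset[style]
    emotion_vector = estimate_emotion(reference_track)
    return render_speech(text, emotion_vector)

print(apply_style("Hello there", "happy"))
```

The key point is that the style menu does not hard-code emotions; it points the AI at reference audio and asks it to transfer what it hears.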
Sometimes, the AI cannot find a good match for a particular
emotion or style in our emotion dataset,
and in such cases the rendered speech may not sound right.
As a solution to this issue, we are working to add more emotion data, as well as provide a better UX for you to add a particular emotion to a sentence.
We will soon release a feature where you can record yourself performing a dialogue, and our AI will extract the emotion content from your own voice and apply it to the voice of your character.
This way, the character performs the dialogue by estimating what emotion or style you were trying to convey, but performs it in their own way.
When will Replica offer more voices?
We only offer a few select voices for the Free Beta version of Studios. We will add more voices to the Free Beta over time, until we launch Replica Studios V1.
When we launch Studios V1, you can upgrade to the paid plan to have access to 100s of voices.
As part of Studios V1, we will also offer a feature where you can upload audio data for any existing character voices that you own the rights to use, and we will create custom Replica voice characters for you.
These custom Replica voices will be secure and private, accessible only by you.
Contact us to find out more.
Having trouble rendering an output for your dialogue?
There are some known issues we are working to fix.
If your dialogue has commas, full stops, exclamation marks, or other kinds of punctuation within the first few words of a sentence, it can confuse our AI.
While we are working on fixes, we recommend removing early punctuation from your dialogue, and trying to re-render if your dialogue isn't working.
Having trouble pronouncing certain words?
Our AI learns phonetically. This means, among other things,
that our AI has trouble with heteronyms:
two words that have the same spelling but are pronounced differently.
For example, consider the sentence: 'The dove dove down.'
As humans, we read that sentence and understand how to pronounce the two 'dove's. The first dove represents the bird, and the second dove represents the past tense of the word 'dive'.
Our AI, however, may only learn one pronunciation for that word.
The remedy in this case is to spell your words phonetically. Try rendering: 'The dove dhove down.'
As with all other improvements, we are working to improve our pronunciation model, and are confident we'll be able to solve this in the coming months.
Until then, we recommend spelling words phonetically until you get the desired pronunciation.
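If you respell words often, it can help to keep your fixes in one place and apply them before rendering. This is a minimal sketch of that workaround; the `respellings` table and helper are hypothetical, maintained by you, not a Studios feature:

```python
respellings = {
    # phrase as written -> spelling that nudges the AI toward the
    # pronunciation you want (here, the past tense of 'dive')
    "dove down": "dhove down",
}

def respell(text: str) -> str:
    """Apply phonetic respellings to dialogue text before rendering."""
    for original, phonetic in respellings.items():
        text = text.replace(original, phonetic)
    return text

print(respell("The dove dove down."))  # The dove dhove down.
```

Keying the table on a short phrase rather than the single word avoids rewriting other, correctly pronounced uses of the same spelling.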
When will the 'Coming Soon' features be available?
Our approach with new features is to understand
which features are in most demand, and to prioritize our release roadmap accordingly.
We look forward to your feedback regarding which features you would like to use.
Many of the upcoming features will only be offered in the Paid package.
Can we vote on features?
Yes, you can!
The best way to vote on a particular feature is by submitting feedback from the feedback form, which you can see on the right-hand side of any Studios screen.