Preset SpeechSynthesisUtterance - lang and voice - and pass that TTS to audio stream

Posted on June 23, 2024

By Mak

Hello,

Workflow:

Text form -> submit -> preset the language (e.g. Italian) and voice (e.g. Microsoft Elsa - Italian (Italy)) for SpeechSynthesisUtterance -> use that synthesized speech as the MP3 source for an audio stream.

Is it possible?




Hi there,

Yes, this workflow should be possible.

What you could do is:

  1. Create a text form in HTML
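  2. Set up an event listener for the form submission
  3. After that, create a SpeechSynthesisUtterance with the preset language and voice
  4. Then convert the synthesized speech to an audio stream (this needs a server-side text-to-speech service, because the browser does not expose the audio produced by speechSynthesis)
  5. Finally, create an audio element to play the stream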
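For step 3, voice names differ between browsers and operating systems, so the preferred voice may not exist on every machine. One way to handle that is to match the exact voice name first and fall back to the language tag. This is a minimal sketch; `pickVoice` is a hypothetical helper, not part of the Web Speech API:

```javascript
// Hypothetical helper: choose a SpeechSynthesisVoice-like object from a list.
// Order of preference: exact voice name, then exact language tag, then any
// voice with the same primary language (e.g. 'it-CH' when 'it-IT' is wanted).
function pickVoice(voices, preferredName, lang) {
  const byName = voices.find((v) => v.name === preferredName);
  if (byName) return byName;

  // Normalise tags in case a platform reports 'it_IT' instead of 'it-IT'.
  const norm = (tag) => tag.toLowerCase().replace(/_/g, '-');
  const wanted = norm(lang);

  const exact = voices.find((v) => norm(v.lang) === wanted);
  if (exact) return exact;

  const primary = wanted.split('-')[0];
  return voices.find((v) => norm(v.lang).split('-')[0] === primary) || null;
}

// Browser usage (assumes the voice list has already loaded):
// const utterance = new SpeechSynthesisUtterance('Ciao, come stai?');
// const voice = pickVoice(speechSynthesis.getVoices(), 'Microsoft Elsa - Italian (Italy)', 'it-IT');
// utterance.lang = voice ? voice.lang : 'it-IT';
// if (voice) utterance.voice = voice;
// speechSynthesis.speak(utterance);
```

Returning `null` instead of an arbitrary voice lets the caller decide whether to fall back to the browser default or show a message.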

Feel free to share your current setup and I will be happy to help.

Also, if you are hitting any issues, feel free to share the errors here as well.

- Bobby

Yes, you can preset the language and voice for SpeechSynthesisUtterance in JavaScript. Turning that speech into an MP3 audio stream is the harder part: the Web Speech API, which provides the browser’s text-to-speech, does not expose the synthesized audio, so it cannot export it to a file (like MP3). You’d typically need a server-side solution for that, but I can show you how to set up the client-side TTS and play it back in the browser.

Here’s an example of how you can use SpeechSynthesisUtterance to set the language and voice, and then play the audio in the browser:

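```html
<!DOCTYPE html>
<html>
<head>
    <meta charset="utf-8">
    <title>TTS Example</title>
</head>
<body>
    <form id="tts-form">
        <textarea id="text-input" rows="4" cols="50" placeholder="Enter text to convert to speech"></textarea><br>
        <button type="submit">Convert to Speech</button>
    </form>
    <script>
        // Defaults from the question; adjust to the voices installed on your system.
        const PREFERRED_LANG = 'it-IT';
        const PREFERRED_VOICE = 'Microsoft Elsa - Italian (Italy)';

        // Voices load asynchronously in some browsers, so this can run more than once.
        function populateVoiceList() {
            if (typeof speechSynthesis === 'undefined') {
                return;
            }

            const voices = speechSynthesis.getVoices();

            // Reuse the dropdown if it already exists instead of appending a duplicate.
            let voiceSelect = document.getElementById('voice-select');
            if (!voiceSelect) {
                voiceSelect = document.createElement('select');
                voiceSelect.id = 'voice-select';
                document.body.appendChild(voiceSelect);
            }
            voiceSelect.innerHTML = '';

            voices.forEach((voice) => {
                const option = document.createElement('option');
                option.value = voice.name;
                option.textContent = `${voice.name} (${voice.lang})`;
                // Preselect the preferred Italian voice if it is available.
                option.selected = voice.name === PREFERRED_VOICE;
                voiceSelect.appendChild(option);
            });
        }

        populateVoiceList();
        if (typeof speechSynthesis !== 'undefined' && speechSynthesis.onvoiceschanged !== undefined) {
            speechSynthesis.onvoiceschanged = populateVoiceList;
        }

        document.getElementById('tts-form').addEventListener('submit', function (event) {
            event.preventDefault();

            const text = document.getElementById('text-input').value;
            const voiceSelect = document.getElementById('voice-select');
            // The dropdown may not exist yet if no voices have loaded.
            const selectedVoice = voiceSelect ? voiceSelect.value : PREFERRED_VOICE;

            const utterance = new SpeechSynthesisUtterance(text);
            utterance.lang = PREFERRED_LANG; // Italian language code

            const voice = speechSynthesis.getVoices().find((v) => v.name === selectedVoice);
            if (voice) {
                utterance.voice = voice;
                utterance.lang = voice.lang; // keep lang consistent with the chosen voice
            }

            speechSynthesis.cancel(); // stop speech still playing from a previous submit
            speechSynthesis.speak(utterance);
        });
    </script>
</body>
</html>
```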

Explanation:

  1. HTML Form: The form takes text input and submits it.
  2. JavaScript:
    • populateVoiceList(): This function loads available voices and populates a dropdown list.
    • Event Listener: When the form is submitted, it creates a SpeechSynthesisUtterance object with the specified text, language (it-IT for Italian), and selected voice.
    • Speech Synthesis: The speechSynthesis.speak() method is called to play the speech.
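Because `getVoices()` can return an empty list until the `voiceschanged` event fires (Chrome, for example, loads voices asynchronously), it can help to wrap the loading in a Promise before building the utterance. A sketch under that assumption; `waitForVoices` and its 3-second timeout are hypothetical choices, not part of the API:

```javascript
// Hypothetical helper: resolve with the voice list once it is available.
// `synth` is normally window.speechSynthesis; any object with getVoices(),
// addEventListener() and removeEventListener() works, which eases testing.
function waitForVoices(synth, timeoutMs = 3000) {
  return new Promise((resolve) => {
    const voices = synth.getVoices();
    if (voices.length > 0) {
      resolve(voices); // voices were already loaded
      return;
    }

    let timer;
    const finish = () => {
      clearTimeout(timer);
      synth.removeEventListener('voiceschanged', finish);
      resolve(synth.getVoices()); // may still be [] if the timeout fired first
    };

    synth.addEventListener('voiceschanged', finish);
    timer = setTimeout(finish, timeoutMs); // don't wait forever if the event never fires
  });
}

// Browser usage:
// waitForVoices(speechSynthesis).then((voices) => {
//   console.log(voices.map((v) => `${v.name} (${v.lang})`));
// });
```

The timeout matters because some environments never fire `voiceschanged` when no voices are installed; resolving with an empty list lets the page show a fallback instead of hanging.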

Note:

  • This example focuses on client-side playback. The Web Speech API does not give you access to the synthesized audio, so SpeechSynthesisUtterance output cannot be saved as an MP3 or fed into an audio stream; for that you’d need a server-side text-to-speech API that returns audio data (tools like FFmpeg can then convert or stream it if needed).
  • You can replace 'it-IT' with any other BCP 47 language tag (e.g. 'en-US' or 'fr-FR'), as long as the browser or operating system provides a voice for that language.
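As one illustration of the server-side route, here is a sketch that asks Azure AI Speech (Microsoft's cloud TTS, which offers an Italian neural voice named `it-IT-ElsaNeural`) for MP3 audio via its REST endpoint. Azure is only an example service; the environment variable names `AZURE_SPEECH_KEY` and `AZURE_SPEECH_REGION` are assumptions, and the voice name and output format should be checked against Microsoft's current documentation:

```javascript
// Server-side sketch for Node.js 18+ (global fetch). ASSUMPTIONS: Azure AI Speech
// as the TTS service, and hypothetical env vars AZURE_SPEECH_REGION / AZURE_SPEECH_KEY.

// Escape characters that would otherwise break the SSML document.
function escapeXml(text) {
  return text
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;')
    .replace(/'/g, '&apos;');
}

// SSML selects the language and the voice for the synthesized speech.
function buildSsml(text, lang, voiceName) {
  return (
    `<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="${lang}">` +
    `<voice name="${voiceName}">${escapeXml(text)}</voice>` +
    '</speak>'
  );
}

// Ask the service for MP3 audio and return it as a Buffer.
async function synthesizeMp3(text, { lang = 'it-IT', voice = 'it-IT-ElsaNeural' } = {}) {
  const region = process.env.AZURE_SPEECH_REGION;
  const url = `https://${region}.tts.speech.microsoft.com/cognitiveservices/v1`;
  const res = await fetch(url, {
    method: 'POST',
    headers: {
      'Ocp-Apim-Subscription-Key': process.env.AZURE_SPEECH_KEY,
      'Content-Type': 'application/ssml+xml',
      'X-Microsoft-OutputFormat': 'audio-24khz-48kbitrate-mono-mp3',
      'User-Agent': 'tts-example',
    },
    body: buildSsml(text, lang, voice),
  });
  if (!res.ok) {
    throw new Error(`TTS request failed: ${res.status} ${res.statusText}`);
  }
  return Buffer.from(await res.arrayBuffer());
}
```

A route in your Node server could send the returned buffer with `Content-Type: audio/mpeg`; on the page, setting an `<audio>` element's `src` to that route lets the browser fetch and play the MP3.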
