Tutorial

Introducing Dia, a TTS model from Nari Labs

Published on May 7, 2025

Introduction

One of the most exciting areas in AI right now is the advancement of voice models. In our ongoing exploration of cutting-edge text-to-speech (TTS) models, we previously highlighted the Conversational Speech Model from Sesame.

In this article, we will discuss Dia, a 1.6-billion-parameter open-source TTS model from Nari Labs. Currently, there is not much information available on its architecture other than that it is heavily inspired by SoundStorm, Parakeet, and Descript Audio Codec. We’ll leave it up to you to speculate about how this model was trained, and we may cover it in a follow-up article once more information is available. For now, we’ll focus on its implementation.

We’re very impressed with the model’s performance. Test it out yourself in this Hugging Face Space or follow the implementation instructions below.

Implementation

We’ll cover two ways of testing this model. The first is the Web Console, which launches a Gradio interface and suits quick, one-off checks of the model’s capabilities. The second is the Python library, which is better suited to developing more intricate applications.

Option 1: Web Console CLI

Step 1: Set up a GPU Droplet

Begin by setting up a DigitalOcean GPU Droplet: select AI/ML and choose the NVIDIA H100 option.

Step 2: Web Console

Once your GPU Droplet finishes loading, you’ll be able to open up the Web Console.

In the web console, copy and paste the following code snippet:

git clone https://github.com/nari-labs/dia.git
cd dia
python -m venv .venv
source .venv/bin/activate
pip install -e .
python app.py

The output will be a Gradio link that you can access within VS Code.

Step 3: Open VS Code

In VS Code, click on “Connect to…” in the Start menu.

Choose “Connect to Host…”.

Step 4: Connect to your GPU Droplet

Click “Add New SSH Host…” and enter the SSH command to connect to your Droplet. This command is usually in the format ssh root@[your_droplet_ip_address]. Press Enter to confirm, and a new VS Code window will open, connected to your Droplet.

You can find your Droplet’s IP address on the GPU Droplet page.

Step 5: Access the Gradio

In the new VS Code window connected to your Droplet, type >sim and select “Simple Browser: Show”.

Paste the Gradio URL from the Web Console, press Enter, and click the arrow in the top right.

This is the Gradio interface. Feel free to modify the input text to your liking.

Using Dia Effectively

To use Dia effectively, it’s essential to consider the length of your input text. Nari Labs recommends aiming for text that corresponds to 5-20 seconds of audio for the most natural-sounding results. If your input text is too short, equivalent to under 5 seconds of audio, the output may sound unnatural. On the other hand, inputs that would take over 20 seconds to speak will be compressed, resulting in unnaturally fast speech. By keeping your text within the moderate range, you can achieve more realistic and engaging audio outputs.
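The length guidance above can be checked before you generate anything. The sketch below is a rough estimate only: the 2.5 words per second pace is our assumption (about 150 words per minute), not a figure from Nari Labs, and the helper names are hypothetical.

```python
import re

# Assumption: an average conversational pace of about 2.5 words per second.
# Tune this for your own speakers and language.
WORDS_PER_SECOND = 2.5

# Speaker tags such as [S1] and parenthesized non-verbal tags such as (laughs)
# are not spoken words, so they are stripped before counting.
# Note: this also strips any ordinary parenthetical text.
TAG_PATTERN = re.compile(r"\[S\d+\]|\([^)]*\)")


def estimate_seconds(text, words_per_second=WORDS_PER_SECOND):
    """Roughly estimate how long the text takes to speak, in seconds."""
    spoken = TAG_PATTERN.sub(" ", text)
    return len(spoken.split()) / words_per_second


def length_advice(text, min_seconds=5.0, max_seconds=20.0):
    """Classify the text against the recommended 5-20 second range."""
    seconds = estimate_seconds(text)
    if seconds < min_seconds:
        return "too short"
    if seconds > max_seconds:
        return "too long"
    return "ok"
```

Calling length_advice on your input text before sending it to the model gives a quick signal about whether to add or trim lines. Real durations depend on the voice and content, so treat the result as a hint rather than a guarantee.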

When creating dialogue with Dia, using speaker tags correctly is crucial. Always begin your input text with the [S1] tag to indicate the first speaker. When switching between speakers, alternate between [S1] and [S2] tags, making sure to never use [S1] twice in sequence. This simple tagging system helps Dia understand the conversation flow and produce a more natural-sounding dialogue.
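The tagging rules above are easy to check mechanically. Here is a minimal sketch of such a check. The function name is hypothetical, and flagging a repeated [S2] as well as a repeated [S1] is our own reading of "alternate", not something the guidance spells out.

```python
import re

# Matches the two speaker tags described above.
SPEAKER_TAG = re.compile(r"\[S[12]\]")


def check_speaker_tags(script):
    """Return a list of problems found in the script's speaker tags."""
    problems = []
    if not script.lstrip().startswith("[S1]"):
        problems.append("script should begin with [S1]")
    tags = SPEAKER_TAG.findall(script)
    # Compare each tag with the one before it; position counts tags from 1.
    for position, (previous, current) in enumerate(zip(tags, tags[1:]), start=2):
        if previous == current:
            problems.append(
                f"tag {position} repeats {current}; alternate between [S1] and [S2]"
            )
    return problems
```

An empty list means the script follows the rules. For example, "[S1] Hi. [S2] Hello." passes, while "[S1] Hi. [S1] Hello." reports that the second tag repeats [S1].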

In addition to speaker tags, non-verbal elements can also enhance your audio outputs. However, it’s recommended to use non-verbal tags sparingly for the most natural results. Stick to the officially supported non-verbal sounds listed in the documentation, as overusing these tags or attempting to use unlisted non-verbals may introduce unwanted artifacts.
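A small check can flag unlisted or overused non-verbal tags before generation. The sketch below makes several assumptions: the allow-list is a placeholder containing only (laughs), which appears in Nari Labs’ example script, so you would fill in the rest from the official documentation. The limit of two tags per script is a hypothetical threshold for "sparingly", and the function name is ours.

```python
import re

# Placeholder allow-list. "(laughs)" appears in the example script;
# add the remaining supported tags from the official documentation.
SUPPORTED_NONVERBALS = {"(laughs)"}

# Hypothetical threshold for "sparingly"; adjust to taste.
MAX_NONVERBALS = 2

# Assumes non-verbal tags are lowercase words in parentheses.
NONVERBAL = re.compile(r"\([a-z ]+\)")


def check_nonverbals(script, supported=SUPPORTED_NONVERBALS, max_count=MAX_NONVERBALS):
    """Return warnings about unlisted or overused non-verbal tags."""
    found = NONVERBAL.findall(script)
    warnings = [
        f"unlisted non-verbal tag: {tag}" for tag in found if tag not in supported
    ]
    if len(found) > max_count:
        warnings.append(
            f"{len(found)} non-verbal tags found; consider using at most {max_count}"
        )
    return warnings
```

Because the pattern matches any lowercase parenthetical, ordinary asides written in parentheses will also be flagged as unlisted, which is a reasonable prompt to rephrase them.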

Option 2: Python Library

To work with Dia in a more programmatic way, we can implement its Python library. The code snippet below from voice_clone.py can be modified to your liking.

from dia.model import Dia


model = Dia.from_pretrained("nari-labs/Dia-1.6B", compute_dtype="float16")

# You should put the transcript of the voice you want to clone
# We will use the audio created by running simple.py as an example.
# Note that you will be REQUIRED TO RUN simple.py for the script to work as-is.
clone_from_text = "[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices. [S1] Wow. Amazing. (laughs) [S2] Try it now on Git hub or Hugging Face."
clone_from_audio = "simple.mp3"
# For your custom needs, replace above with below and add your audio file to this directory:
# clone_from_text = "[S1] ... [S2] ... [S1] ... corresponding to your_audio_name.mp3"
# clone_from_audio = "your_audio_name.mp3"

# Text to generate
text_to_generate = "[S1] Hello, how are you? [S2] I'm good, thank you. [S1] What's your name? [S2] My name is Dia. [S1] Nice to meet you. [S2] Nice to meet you too."

# It will only return the audio from the text_to_generate
output = model.generate(
    clone_from_text + text_to_generate, audio_prompt=clone_from_audio, use_torch_compile=True, verbose=True
)

model.save_audio("voice_clone.mp3", output)
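Since the script above depends on simple.mp3 already existing and on the transcript matching that audio, a quick preflight check can save a wasted model load. This is a minimal sketch with a hypothetical helper name. It uses only the standard library and does not call Dia itself.

```python
from pathlib import Path


def check_clone_inputs(clone_from_text, clone_from_audio):
    """Return a list of problems with the voice-cloning inputs."""
    problems = []
    if not clone_from_text.strip():
        problems.append("transcript is empty")
    elif not clone_from_text.lstrip().startswith("[S1]"):
        problems.append("transcript should begin with [S1]")
    # The audio prompt must exist on disk before generation,
    # for example because simple.py has already been run.
    if not Path(clone_from_audio).is_file():
        problems.append(f"audio file not found: {clone_from_audio}")
    return problems
```

Running check_clone_inputs(clone_from_text, clone_from_audio) before Dia.from_pretrained and stopping if the list is non-empty surfaces a missing audio file immediately. It cannot verify that the transcript actually matches the recording, so that part remains a manual check.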

Conclusion

Kudos to Nari Labs for pushing the frontier of text-to-speech models. What’s even more remarkable is that the effort is driven by just two passionate undergraduate students. You can really just do things.

We’re excited to hear about how you’re leveraging TTS models. Share your experiences with DigitalOcean GPU Droplets in the comments below: how are you harnessing their power for your TTS applications?


About the author(s)

Melani Maheswaran