The Next Generation of Digital Humans

Enterprise-ready V4 Expressive Avatars for humanlike real-time interactions and consistent, high-fidelity video at scale.

High-fidelity avatars for every use case

D-ID V4 Expressive Avatars bring an unprecedented level of realism and emotional range to enterprise communication.

They enhance both interactive conversations and scripted video content, ensuring messages feel natural, engaging, and aligned with your brand.

Two-way interactions

The avatar automatically delivers voice inflection and facial expressions that suit the moment and match the desired tone.

Scripted videos

Choose from multiple sentiments to match the context.

For both use cases, V4 avatars provide sharper lip sync and more accurate facial nuance for a natural, trustworthy delivery.

Trained on performances captured from professional actors and optimized for lower latency and stronger visual control, V4 adapts cleanly to your intended messaging, so every message stays on-brand, lifelike, and scalable.

What’s new in V4?

Selectable sentiments to match the moment.
More humanlike realism with richer facial nuance and expression.
Sharper lip sync for clearer, more believable delivery.
Lower latency for smoother real-time conversations in visual agents.
Improved listening and speaking states for more lifelike presence during real-time interactions.
Better visual control across framing and formats for consistent results across poses, dimensions, and channels.
Optional camera activation to give the agent more contextual awareness.
Media display mode enabling the agent to share videos, images or charts throughout the live conversation.

How it works

In real-time applications such as qualification calls, customer support, or training platforms, V4 Expressive Avatars are integrated directly into the visual agent experience.

A conversational AI or LLM orchestrates which sentiment to display at each turn based on context and intent. The avatar’s facial expression and vocal delivery update seamlessly in real time, keeping end-to-end latency under 0.5 seconds throughout the interaction.
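The per-turn sentiment choice can be pictured with a minimal sketch. It assumes the conversational layer has already classified each turn into an intent label; the intent names and the `choose_sentiment` helper are hypothetical, and a plain lookup table stands in for the LLM's decision. Only the five sentiment names come from this page.

```python
# Sentiment names as listed for V4 avatars on this page.
SENTIMENTS = ("Friendly", "Professional", "Empathetic", "Excited", "Frustrated")

# Hypothetical intent labels. In a real agent the LLM would make this
# decision from context; a lookup table stands in for it here.
INTENT_TO_SENTIMENT = {
    "greeting": "Friendly",
    "billing_question": "Professional",
    "complaint": "Empathetic",
    "purchase": "Excited",
}


def choose_sentiment(intent: str, default: str = "Friendly") -> str:
    """Return the sentiment to display for one conversational turn."""
    sentiment = INTENT_TO_SENTIMENT.get(intent, default)
    if sentiment not in SENTIMENTS:
        raise ValueError(f"Unsupported sentiment: {sentiment!r}")
    return sentiment


if __name__ == "__main__":
    for turn_intent in ("greeting", "complaint", "small_talk"):
        print(turn_intent, "->", choose_sentiment(turn_intent))
```

Unrecognized intents fall back to a default sentiment, so the avatar always has a defined state to render.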

Enable the camera to give your agent more contextual awareness. Upload media assets like videos or images that your agent can display. Set up webhooks to enable your agent to take actions in digital space.

Tech Specs

Conversational latency (end-to-end): < 500 ms
Support for multiple sentiments with EQ-based sentiment control
Excellent lip-sync accuracy: 5.7 LSE-D (lip-sync error distance; lower is better)
Accessible in D-ID Studio and via the D-ID API
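
One way to check an integration against the stated latency figure is sketched below. It assumes you can wrap a single conversational turn in a blocking callable; `measure_latency_ms` and `within_budget` are illustrative helpers, not part of any D-ID SDK. For a streamed avatar you would more likely time the interval up to the first rendered frame.

```python
import time
from typing import Callable, Iterable

LATENCY_BUDGET_MS = 500.0  # end-to-end figure quoted in the specs above


def measure_latency_ms(respond: Callable[[str], object], user_input: str) -> float:
    """Time one blocking call to `respond` and return the duration in ms."""
    start = time.perf_counter()
    respond(user_input)
    return (time.perf_counter() - start) * 1000.0


def within_budget(latencies_ms: Iterable[float],
                  budget_ms: float = LATENCY_BUDGET_MS) -> bool:
    """True only if every measured turn finished under the budget."""
    return all(latency < budget_ms for latency in latencies_ms)


if __name__ == "__main__":
    # Stand-in for a real agent turn; replace with your own client call.
    def fake_agent(_: str) -> str:
        return "ok"

    samples = [measure_latency_ms(fake_agent, "hello") for _ in range(5)]
    print("all turns within budget:", within_budget(samples))
```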

Learn more

Elevate your content across all workflows

Learning and training

D-ID’s V4 expressive avatars make training clearer, more engaging, and easier to scale. They turn complex content into digestible explanations with nuanced delivery, expressive guidance, and multilingual narration. From onboarding to compliance, teams learn faster and retain more when information is presented by a relatable, humanlike instructor.

Marketing

Avatar-led videos and interactive agents help brands stand out with content that feels personal, dynamic, and memorable. Whether introducing a product, explaining a service, or creating personalized campaigns at scale, D-ID’s V4 expressive avatars deliver high-impact storytelling that’s always on-brand and instantly adaptable across channels.

Customer experience

D-ID V4 Expressive avatars and visual agents transform digital touchpoints by offering humanlike interaction at every step. They provide fast, consistent answers with natural delivery, reduce support load, and create a friendlier experience for users—day or night, in any language. The result: higher satisfaction, smoother journeys, and more effective self-service.

Avatar models

V2 Avatars

Created from a single image with lightweight rendering
Enables quick generation with broad language support
Most efficient option for high-volume, simple communication needs
Compatible with interactive visual agents

V3 Instant Avatars

Created from a short user-recorded or uploaded video
Preserves the original background and movement while delivering perfectly lip-synced narration using a cloned voice, a synthetic voice, or a recording
Great for rapid, authentic video content at scale

V3 Pro Avatars

Created from a 3–5 minute uploaded video
Delivers highly realistic facial detail and natural motion
Includes a cloned voice for flexible narration
Geared for professional content, with optional green-screen recording enabling background control

New

V4 Avatars

Created from a series of short recordings capturing multiple emotional vocal and facial expressions
Produces emotionally aligned delivery with precise facial and vocal synchronization
Ideal for high-impact use cases where authentic human nuance is essential

How to use V4 avatars in D-ID Studio

1. Pick your avatar

Look for the sentiment icon that identifies Expressive V4 avatars, then create a video or a visual agent.

2. Choose a sentiment

Select from Friendly, Professional, Empathetic, Excited or Frustrated to match the tone of your message.

3. Select a voice

For the best expressive results, choose an ElevenLabs V3 voice if available.

4. Enter your script

Type in what you want your avatar to say, or upload an audio file to drive your expressive avatar video.

5. Customize your video

Add backgrounds, text, shapes or media layers to enrich your video.

6. Click "Generate Video"

Once you are happy with your video, click the "Generate Video" button to bring your creation to life.

How to use V4 avatars via D-ID API

Set the model to V4 in your API request.
Reference the Expressive V4 avatar you want to render.
Pass sentiment parameters to control expressive delivery.
Provide your input text, audio, or streamed input and generate output.
Test and tune sentiment and voice settings before deploying to production.
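
The steps above can be sketched as a request body. This is only an illustration of the shape such a request might take: the field names (`model`, `avatar_id`, `sentiment`, `script`) and the `build_v4_request` helper are assumptions, not the documented D-ID API schema, so check the official documentation for the real parameter names and endpoint before use.

```python
import json

# Sentiment names as listed for V4 avatars on this page.
ALLOWED_SENTIMENTS = {"Friendly", "Professional", "Empathetic", "Excited", "Frustrated"}


def build_v4_request(avatar_id: str, text: str, sentiment: str = "Friendly") -> dict:
    """Assemble a hypothetical request body for a V4 Expressive Avatar."""
    if sentiment not in ALLOWED_SENTIMENTS:
        raise ValueError(f"Unsupported sentiment: {sentiment!r}")
    if not text.strip():
        raise ValueError("Input text must not be empty")
    return {
        "model": "v4",            # assumed name for the model selector
        "avatar_id": avatar_id,   # assumed name for the avatar reference
        "sentiment": sentiment,   # assumed name for the sentiment parameter
        "script": {"type": "text", "input": text},  # assumed input shape
    }


if __name__ == "__main__":
    body = build_v4_request(
        "your-v4-avatar-id",
        "Thanks for your patience. Let's get this fixed.",
        sentiment="Empathetic",
    )
    print(json.dumps(body, indent=2))
```

Validating the sentiment and script locally makes it easier to test and tune settings before any request reaches production.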

Read our documentation

V4 Expressive Avatars FAQs

What are V4 Expressive Avatars?

V4 Expressive Avatars are D-ID’s most advanced digital humans, designed to deliver emotionally accurate, humanlike communication across both avatar videos and real-time visual agents.

What makes V4 different from previous avatar versions?

V4 introduces richer facial expression, selectable sentiments, sharper lip sync, and lower latency, resulting in more natural delivery for both scripted and live interactions.

How do I identify V4 Expressive avatars in the Studio?

Expressive V4 avatars are marked with a sentiment icon in the avatar selection screen.

Which voices work best with V4 Expressive avatars?

For best results, we recommend ElevenLabs V3 voices, which offer improved expressiveness and alignment with V4 facial animation. Cloned voices and uploaded audio are also supported.

Do I need to change my workflow to use V4?

No. In most cases, you simply select a V4 avatar and choose a sentiment. Existing scripts, audio inputs, and integrations continue to work as before.

Is V4 available through the API?

Yes. API customers can upgrade by selecting the V4 model and optionally passing sentiment parameters. No major infrastructure changes are required.

Who is V4 best suited for?

V4 is ideal for high-impact use cases where realism, emotional nuance, and trust matter, such as customer experience, training, marketing, and executive communications.