Best Practices for Building Datasets for D-ID Agents

Creating a successful D-ID Agent that answers questions with human-like precision depends on several best practices that make the agent effective, reliable, and capable of providing high-quality responses. Here’s a rundown of the key points to consider when building your dataset and agent:

D-ID Agents

  • D-ID Agents are designed to respond to questions with high accuracy. Think of the Agent as taking an open-book test: the questions come from our stakeholders, and when responding, the Agent can consult a well-curated knowledge base tailored to the audience’s needs.
  • The value of a D-ID Agent is closely tied to the quality of the data it uses, so building a high-quality dataset is a critical step in creating that value.

Importance of Quality Data

  • The foundation of a successful D-ID Agent is a comprehensive, high-quality dataset that covers the topics our stakeholders are interested in discussing with our Agent.
  • A dataset’s value is determined not by its size alone but by the relevance, quality, and organization of the information it contains.
  • A concise, well-thought-out dataset will outperform a larger, poorly organized dataset.

Best Practices for Dataset Construction

Source Diverse and Credible Data:

  • Ensure your data sources are credible and citable, and that they cover the full scope of topics your Agent needs to address.
  • Remove conflicting and duplicate information to keep responses accurate.
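Removing duplicates and spotting conflicts is easy to automate for exact or near-exact repeats. The following is a minimal, hypothetical Python sketch that assumes your Q&A pairs are held as (question, answer) tuples; `normalize` and `dedupe_and_flag` are illustrative names, not part of any D-ID API. It drops repeats that differ only in case or spacing and reports questions with more than one distinct answer so a person can decide which answer is correct.

```python
from collections import defaultdict


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return " ".join(text.lower().split())


def dedupe_and_flag(pairs):
    """Split (question, answer) pairs into unique pairs and conflicts.

    Repeats that differ only in case or spacing are dropped. A question that
    maps to more than one distinct answer is returned in `conflicts` for a
    person to resolve instead of being passed to the Agent.
    """
    first_question = {}          # normalized question -> first spelling seen
    answers = defaultdict(dict)  # normalized question -> {normalized answer: answer}
    for question, answer in pairs:
        q_key = normalize(question)
        first_question.setdefault(q_key, question.strip())
        answers[q_key].setdefault(normalize(answer), answer.strip())

    unique, conflicts = [], {}
    for q_key, variants in answers.items():
        question = first_question[q_key]
        if len(variants) == 1:
            unique.append((question, next(iter(variants.values()))))
        else:
            conflicts[question] = list(variants.values())
    return unique, conflicts
```

Matching on normalized text only catches near-verbatim repeats; paraphrased duplicates still need a manual review pass.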

Prioritize Data Quality:

  • Focus on clean, clear, well-formatted, and concise data.
  • Perform checks for spelling and grammatical accuracy to avoid confusing your Agent.
  • Aim for a balanced distribution of knowledge, taking ethical considerations into account and avoiding bias.
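Some of these checks can run automatically before anything reaches your Agent. This hypothetical `lint_pair` helper, written as a minimal Python sketch, flags empty fields, stray leading or trailing whitespace, accidental doubled words such as "the the", and overly long answers. The 150-word limit is an assumed threshold, not a D-ID requirement, and spelling or grammar checks would need a dedicated tool, so they are left out here.

```python
import re

# Heuristic: the same word twice in a row, e.g. "the the" (case-insensitive).
DOUBLED_WORD = re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE)
MAX_ANSWER_WORDS = 150  # assumed limit; tune it for your own content


def lint_pair(question: str, answer: str) -> list[str]:
    """Return a list of human-readable issues found in one Q&A pair."""
    issues = []
    if not question.strip() or not answer.strip():
        issues.append("empty question or answer")
    if question != question.strip() or answer != answer.strip():
        issues.append("leading/trailing whitespace")
    for field, text in (("question", question), ("answer", answer)):
        match = DOUBLED_WORD.search(text)
        if match:
            issues.append(f"doubled word in {field}: {match.group(0)!r}")
    if len(answer.split()) > MAX_ANSWER_WORDS:
        issues.append(f"answer longer than {MAX_ANSWER_WORDS} words")
    return issues
```

The doubled-word rule also fires on legitimate phrases like "that that", so treat its hits as prompts for review rather than automatic fixes.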

Consider Adopting an FAQ-Style Dataset for Your Agent:

  • Leverage high-quality data from your text files, PDFs, and presentations.
  • Start with a structured dataset of “Question and Answer” pairs, ideally sourced from your best FAQ documents, as these provide a solid foundation to build on.
  • The FAQ format leads to a well-structured dataset ideally suited to Agent use cases.
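As an illustration of the FAQ format, this sketch keeps each pair in a small record and renders the collection as plain text that you could save as a .txt file and upload alongside your other documents. The `FAQEntry` class, the `Q:`/`A:`/`Source:` layout, and the `render_faq` name are assumptions chosen for readability, not a format D-ID prescribes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FAQEntry:
    question: str
    answer: str
    source: str  # where the answer came from, so it stays referenceable


def render_faq(entries: list[FAQEntry]) -> str:
    """Render entries as plain text, one blank-line-separated block per pair."""
    blocks = [
        f"Q: {entry.question.strip()}\n"
        f"A: {entry.answer.strip()}\n"
        f"Source: {entry.source.strip()}"
        for entry in entries
    ]
    return "\n\n".join(blocks) + "\n"
```

Keeping a `source` field on every entry makes each answer traceable to credible, referenceable material.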

Organize Data Efficiently:

  • Categorize your FAQ data by subject to facilitate optimal retrieval.
  • This structured approach enables your Agent to provide direct and relevant answers.
  • Resist the temptation to pad the dataset with vague, lower-quality data just to increase its size.
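Grouping by subject can be scripted once each pair carries a category label. In this minimal sketch, entries are assumed to be (category, question, answer) tuples; the hypothetical `group_by_category` merges labels that differ only in case or spacing (for example "Billing" and "billing "), and `render_by_category` writes one section per topic under an illustrative `Topic:` header.

```python
from collections import defaultdict


def group_by_category(entries):
    """Group (category, question, answer) tuples by category.

    Labels that differ only in case or surrounding spaces are merged; the
    first spelling seen is used as the section title. Sections are sorted
    alphabetically.
    """
    labels = {}                  # casefolded category -> display label
    grouped = defaultdict(list)  # casefolded category -> [(question, answer)]
    for category, question, answer in entries:
        key = category.strip().casefold()
        labels.setdefault(key, category.strip())
        grouped[key].append((question, answer))
    return {labels[key]: grouped[key] for key in sorted(grouped)}


def render_by_category(entries) -> str:
    """Render one 'Topic:' section per category, followed by its Q&A pairs."""
    sections = []
    for category, pairs in group_by_category(entries).items():
        parts = [f"Topic: {category}"]
        parts.extend(f"Q: {q}\nA: {a}" for q, a in pairs)
        sections.append("\n\n".join(parts))
    return "\n\n".join(sections) + "\n"
```

Merging case variants keeps a single topic from being split across two sections because of inconsistent labeling.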

Focus on Specific Topics Initially:

  • Cover specific areas thoroughly before expanding the scope of the dataset.
  • Use feedback from your stakeholders to identify areas for improvement or expansion.
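One simple way to act on stakeholder feedback is to tally it by topic. This sketch assumes you collect hypothetical (topic, helpful) ratings, for example from a thumbs-up/down review of the Agent's answers that you run yourself, and ranks topics by their share of unhelpful answers. The `topics_needing_work` name and the `min_count` cutoff are illustrative assumptions.

```python
from collections import Counter


def topics_needing_work(feedback, min_count=3):
    """Rank topics by their share of unhelpful answers.

    feedback: iterable of (topic, helpful) pairs, where helpful is a bool.
    Topics with fewer than min_count ratings are skipped so a single bad
    rating does not dominate the ranking.
    """
    totals, unhelpful = Counter(), Counter()
    for topic, helpful in feedback:
        totals[topic] += 1
        if not helpful:
            unhelpful[topic] += 1
    rates = [
        (topic, unhelpful[topic] / count)
        for topic, count in totals.items()
        if count >= min_count
    ]
    # Highest unhelpful rate first; ties keep first-seen order (sort is stable).
    return sorted(rates, key=lambda item: item[1], reverse=True)
```

Topics at the top of the list are candidates for new or rewritten FAQ entries; topics that never appear may point to gaps worth expanding into.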

Continuous Improvement and Expansion:

  • Regularly update and refine your dataset, removing older data that may conflict with newer best practices and confuse the model.
  • Plan to continuously update and expand your Agent’s knowledge base so it drives more value and provides greater utility for your users.
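When you refresh the dataset, a newer answer should replace the older one rather than sit beside it. The sketch below assumes each entry is a dict with `question`, `answer`, and an `updated` date (a hypothetical schema), and the illustrative `merge_updates` keeps only the most recent entry per question so superseded answers are dropped instead of conflicting with current ones.

```python
from datetime import date


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return " ".join(text.lower().split())


def merge_updates(existing, updates):
    """Keep only the most recently updated entry for each question.

    Entries are dicts with 'question', 'answer', and 'updated' (a date).
    On equal dates the later entry wins, so `updates` override `existing`.
    """
    latest = {}
    for entry in [*existing, *updates]:
        key = normalize(entry["question"])
        current = latest.get(key)
        if current is None or entry["updated"] >= current["updated"]:
            latest[key] = entry
    return list(latest.values())


# Example: a revised answer supersedes the older one for the same question.
existing = [
    {"question": "What are your support hours?", "answer": "9am to 5pm.",
     "updated": date(2023, 1, 10)},
    {"question": "How do I reset my password?",
     "answer": "Use the Forgot Password link.", "updated": date(2023, 3, 1)},
]
updates = [
    {"question": "What are your support hours?", "answer": "24/7.",
     "updated": date(2024, 2, 1)},
]
merged = merge_updates(existing, updates)
```

Because ties go to the entry seen later, passing fresh material as `updates` lets it win even when the dates match.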


Building a D-ID Agent is a dynamic process that requires attention to the quality and organization of the data it relies on. By following these best practices, you can create an Agent that serves as a dependable resource for your audience, offering precise and valuable answers. Moving from a focused starting dataset to broader expertise is a process of continuous learning and adaptation that ultimately yields a comprehensive digital assistant.
