Choosing AI voices for your voicebot

AI voices are an important factor in making customers feel satisfied with your automated voice services. In this article, we take an in-depth look at choosing AI voices for a voicebot that your customers will be happy to interact with.

How your voicebot speaks and sounds is one of the first things your customers will notice when they call. Most customers will expect the voicebot to have a “human-like” voice, and a robotic-sounding voice can hurt call success: if the voice sounds like a robocall, customers are more likely to hang up.

Voicebot speech can be configured in a variety of ways, and each configuration leads to a slightly different customer experience. Below, we compare some common voicebot solutions and show how they handle both general and personalized conversations.

Voice synthesis for voice AI

Voice synthesis uses existing voice samples to generate a synthetic voice for an AI virtual assistant. From these samples, the voice synthesizer extracts pronunciation patterns, which lets it enunciate a wider range of words than the original recordings cover. This technology makes it easy to create a virtual assistant speaking voice that can handle a wide range of variables in conversation.

Personalizing customer conversations and voice synthesis

If you want your virtual agents to greet customers with a personalized message, e.g. to use the customer’s name stored in your CRM records, you need to work with variables that can generate dynamic content. The variable in this example would be the customer’s name.
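
As a rough illustration of working with such a variable, here is a minimal Python sketch that fills a greeting template with a customer's name. The record layout, the `first_name` field, and the greeting wording are assumptions made for this example, not part of any particular CRM or voicebot platform.

```python
from string import Template

# Hypothetical greeting prompt; "$first_name" is the variable (dynamic) part.
GREETING = Template("Hello $first_name, thank you for calling.")

# Generic prompt used when the CRM record has no usable name.
FALLBACK = "Hello, thank you for calling."


def build_greeting(record: dict) -> str:
    """Return a personalized greeting, or the generic one if no name is stored."""
    name = (record.get("first_name") or "").strip()
    if not name:
        return FALLBACK
    return GREETING.substitute(first_name=name)


# Hypothetical CRM record; the field name is an assumption for illustration.
print(build_greeting({"first_name": "Maria"}))
```

The fallback matters in practice: a record with a missing name should produce a normal generic greeting rather than an awkward gap in the sentence.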

For a voicebot to accurately say someone’s name, it needs either:

  • A large library of voice recordings of names recorded by the voice actor for the voicebot
  • A synthesized voice that can approximate the pronunciation of the variable content, based on existing recordings and some smart machine learning

Pre-recorded samples sound more natural, but they are relatively expensive to produce. In some cases, there will be so much variable, dynamic content that it would be difficult to record all of it affordably up front.

Synthesized voices offer much more vocabulary coverage at a much lower cost, but the synthesis can sometimes sound unnatural to human listeners. The technology is developing quickly, however, and the quality of synthetic voices is improving rapidly.

If personalized conversations are a planned part of your customer experience with your virtual agents, you will need to consider the most appropriate strategy for voice synthesis for your business.

Industry examples of synthetic voices for AI

Here are three examples of synthetic voices for AI-powered voicebots. The first is from Amazon (AWS), the second is from Google, and the last example is from a third-party vendor, Resemble.ai.

Amazon (AWS)
Google
Resemble.ai

The quality of synthetic voices has improved greatly in the last few years. Even so, two prosodic features still subtly mark the voices as non-human. The first is smooth connected speech (also called “chunking”): the ability to simulate the way human speakers join up and blend certain sounds within and across words based on the rhythm of a sentence. The second is natural-sounding intonation: the syllables on which the tone of the voice rises or falls to emphasize key information or to express certainty or uncertainty.

In each example, however, the quality of the voices is clearly high enough to be acceptable to callers. The Talkie.ai platform allows our clients to choose whichever synthesized voice they feel will work best for them and their customers and manage it from within our platform.

Strategies for creating voices for voice AI

Choosing the most effective way to add voice to your voice AI will depend on the needs of your business. There are three main strategies for providing voice for your AI agents:

1. Record human voice actors

Recording human voice actors creates very natural-sounding voices for the AI, but this approach is less flexible. Each variation in the voice AI conversation will require its own recording, adding time and cost to setting up the voicebot.

2. Use industry-standard voice synthesis solutions

Industry-standard solutions, such as voice synthesis from AWS, Google Text-to-Speech, or other dedicated third-party services, offer affordable and flexible ways to support synthetic voices. With most major services, both male and female voices are available in multiple languages and accents.

3. Use a combination of the two

Voice synthesis can be used in combination with voice recordings. The voicebot plays recordings for the static or fixed parts of the conversation and uses the synthetic voice to speak the variable/dynamic parts.
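
To make the split concrete, here is a minimal Python sketch of how a voicebot might decide, part by part, whether to play a recording or fall back to synthesis. The `RECORDINGS` lookup, the file names, and the `Segment` type are hypothetical and exist only for this illustration; a real platform would handle audio playback and synthesis calls itself.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Segment:
    text: str
    source: str           # "recording" or "synthesis"
    audio_file: str = ""  # set only when a recording exists


# Hypothetical map from fixed phrases to pre-recorded audio files.
RECORDINGS: Dict[str, str] = {
    "Hello": "hello.wav",
    "thank you for calling.": "thanks_for_calling.wav",
}


def plan_utterance(parts: List[str]) -> List[Segment]:
    """Use a recording for each known fixed phrase and synthesis for the rest."""
    plan = []
    for part in parts:
        audio = RECORDINGS.get(part)
        if audio is not None:
            plan.append(Segment(part, "recording", audio))
        else:
            plan.append(Segment(part, "synthesis"))
    return plan


# The customer's name is the only part without a recording.
plan = plan_utterance(["Hello", "Maria", "thank you for calling."])
for segment in plan:
    print(segment.source, "->", segment.text)
```

In this sketch only the customer's name goes to the synthesizer, so most of what the caller hears is still recorded speech.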

Advantages

  • Voice actor recordings

    Naturalistic voices for the voice AI

    The best option when conversion rate is the priority, since callers are less likely to hang up

  • Industry-standard voice synthesis solutions

    No need for recordings

    Male and female voices in multiple languages can be deployed easily

  • Combination

    A best-of-both-worlds approach that uses the same voice for both static and variable elements in the conversation

Disadvantages

  • Voice actor recordings

    Less flexible: each variation in the conversation requires its own recording

    Recording adds time and cost to setting up the voicebot

  • Industry-standard voice synthesis solutions

    Synthesized speech can sometimes sound unnatural to human listeners

  • Combination

    Recordings are still needed for the static parts of the conversation

Ideal for

  • Voice actor recordings

    Voicebot conversations where there is no variable/dynamic content or need for personalization

    When there is a budget for ongoing professional voice recordings

  • Industry-standard voice synthesis solutions

    Low-cost voicebot implementations where customers have a high tolerance for lower quality speech from the voicebot

  • Combination

    Voicebot conversations where there is some variable/dynamic content but customers also expect human-like high quality speech
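
The guidance above can be condensed into a small decision helper. This Python sketch is a deliberate simplification: the three yes/no inputs and the order of the checks are assumptions chosen for illustration, and a real decision would weigh cost and quality in more detail.

```python
def recommend_voice_strategy(
    has_dynamic_content: bool,
    expects_humanlike_speech: bool,
    has_recording_budget: bool,
) -> str:
    """Map the 'Ideal for' guidance to one of the three strategies."""
    # No variable content and a budget for recordings: record a voice actor.
    if not has_dynamic_content and has_recording_budget:
        return "voice actor recordings"
    # Some variable content, but callers still expect human-like speech.
    if has_dynamic_content and expects_humanlike_speech:
        return "combination"
    # Otherwise, the low-cost and flexible default.
    return "voice synthesis"


# Example: personalized greetings for callers who expect a natural voice.
print(recommend_voice_strategy(True, True, True))
```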

Examples of strategic AI voice implementations for voicebots

Sales example

A company is looking to use voicebots to automate an outbound sales campaign targeting existing customers in their CRM. Because the campaign’s success is partly determined by the ability to engage customers, the company decides to use a blend of recordings and voice synthesis. The synthetic voice will greet customers by their names using the customer records from the CRM to create personalized conversations between the voicebot and the customers.

Banking and Finance example

A banking service wants to offer an automated voicebot service that allows customers to check account status and balances. Since this service is highly transactional in nature and requires a large amount of variable/dynamic content to be vocalized by the voicebot, the business decides to use an entirely synthetic voice.

Retail example

A large retail store chain wants to use voicebots to improve its after-hours customer service. The voicebot will take customer calls after hours and provide information about the business and its services. Since there is little need for dynamic content in the conversations, the business hires a professional voice actor to record the voicebot’s dialogue, giving it a highly natural-sounding voice that will greet customers and represent the brand appropriately.

The best-of-both-worlds approach to AI voices

For many of our clients, Talkie.ai recommends a combined approach of voice recordings with elements of voice synthesis. This best-of-both-worlds approach offers quality voicebot speech with the affordable flexibility of voice synthesis to personalize voicebot conversations.

Using our voice synthesis solutions, clients can create a unique voice for their virtual agents based on their top human agents or selected professional voice actors. Read more about how you can shape your virtual agents into your brand ambassadors in this article.
