Talkie.ai Blog

Choosing AI voices for your voicebot

February 16, 2022

AI voices are an important factor in making customers feel satisfied with your automated voice services. In this article, we take an in-depth look at choosing AI voices for a voicebot that your customers will be happy to interact with.

How your voicebot speaks and sounds is one of the first things your customers will notice when they call. Most customers will expect the voicebot to have a “human-like” voice. A robotic-sounding voice can hurt call success: if the voice sounds like a robocall, customers are more likely to hang up.

Voicebot speech can be configured in a variety of ways, and each configuration leads to a slightly different customer experience. Below, we compare some common voicebot solutions and show how they handle both general and personalized conversations.

Voice synthesis for voice AI

Voice synthesis uses existing voice samples to generate a synthetic voice for an AI virtual assistant. From the samples, the voice synthesizer learns pronunciation patterns, so it can enunciate a wider range of words than the original recordings cover. This technology makes it easy to create a virtual assistant voice that can handle a wide range of variables in conversation.

Personalizing customer conversations and voice synthesis

If you want your virtual agents to greet customers with a personalized message, e.g. to use the customer’s name stored in your CRM records, you need to work with variables that can generate dynamic content. The variable in this example would be the customer’s name.
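As a minimal sketch of what "working with variables" can look like, the fixed wording of a greeting can be stored as a template and the customer's name filled in at call time. The template text, the `customer_name` field, and the `render_greeting` helper are assumptions made for illustration, not part of any particular CRM or voicebot platform.

```python
from string import Template

# The fixed wording is the template; $customer_name is the variable.
# Both the wording and the field name are hypothetical.
GREETING = Template("Hello $customer_name, thank you for being a customer.")


def render_greeting(record: dict) -> str:
    """Fill the greeting template with values from one customer record."""
    # safe_substitute leaves the placeholder in place when a field is missing,
    # so a gap in the customer data is easy to spot before the text is spoken.
    return GREETING.safe_substitute(record)


if __name__ == "__main__":
    print(render_greeting({"customer_name": "Maria"}))
```

The rendered text is what the voicebot then has to say out loud, which is where the choice between recordings and synthesis comes in.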

For a voicebot to accurately say someone’s name, it needs either:

A large library of voice recordings of names recorded by the voice actor for the voicebot
A synthesized voice that can approximate the pronunciation of the variable content, based on existing recordings and some smart machine learning

Pre-recorded samples sound more natural, but they are relatively expensive to produce. In some cases, there will be so much variable, dynamic content that it would be difficult to record all of it affordably up front.

Synthesized voices offer a lot more vocabulary coverage at a much lower cost, but they can sometimes sound unnatural to human listeners. The technology is developing quickly, however, and the quality of synthetic voices keeps improving.

If personalized conversations are a planned part of your customer experience with your virtual agents, you will need to consider the most appropriate strategy for voice synthesis for your business.

Industry examples of synthetic voices for AI

Here are three examples of synthetic voices for AI-powered voicebots. The first is from Amazon (AWS), the second is from Google, and the last example is from a third-party vendor, Resemble.ai.

[Audio sample: Amazon (AWS)]

[Audio sample: Google]

[Audio sample: Resemble.ai]

The quality of synthetic voices has improved greatly in the last few years. For now, two prosodic features still subtly mark the voices as non-human. One is the voice’s ability to produce smooth connected speech (also called “chunking”) – i.e. its ability to simulate the way human speakers join up and blend certain sounds within and across words based on the rhythm of a sentence. The other is producing natural-sounding patterns of intonation – i.e. the syllables on which the tone of the voice rises or falls to emphasize key information or to express certainty or uncertainty.

In each example, however, the quality of the voices is clearly high enough to be acceptable to callers. The Talkie.ai platform allows our clients to choose whichever synthesized voice they feel will work best for them and their customers and manage it from within our platform.

Strategies for creating voices for voice AI

Choosing the most effective way to add voice to your voice AI will depend on the needs of your business. There are three main strategies for providing voice for your AI agents:

1. Record human voice actors

Recording human voice actors creates very natural-sounding voices for the AI, but this approach is also less flexible. Each variation in the voice AI conversation requires its own recording, adding time and cost to setting up the voicebot.

2. Use industry-standard voice synthesis solutions

Industry-standard services, such as voice synthesis from AWS, Google Text-to-Speech, or other dedicated third-party solutions, offer affordable and flexible ways to support synthetic voices. With most major services, both male and female voices are available in multiple languages and accents.
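Many synthesis services, including Amazon Polly on AWS and Google Cloud Text-to-Speech, accept input written in SSML (Speech Synthesis Markup Language), a W3C standard for controlling pauses, speaking rate, and similar details. The sketch below only builds an SSML string; it does not call any vendor API, and the `build_ssml` helper and its parameters are hypothetical names chosen for this example.

```python
from xml.sax.saxutils import escape


def build_ssml(greeting: str, message: str, pause_ms: int = 300, rate: str = "medium") -> str:
    """Wrap two pieces of text in basic SSML with a pause between them.

    rate uses the standard SSML prosody labels, e.g. "slow", "medium", "fast".
    """
    return (
        "<speak>"
        # escape() protects characters such as & and < that would break the markup.
        f"{escape(greeting)}"
        f'<break time="{pause_ms}ms"/>'
        f'<prosody rate="{rate}">{escape(message)}</prosody>'
        "</speak>"
    )


if __name__ == "__main__":
    print(build_ssml("Hello from Smith & Sons.", "How can I help?"))
```

Which tags and attribute values are honored differs between vendors, so it is worth checking the documentation of the service you choose before relying on a particular tag.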

3. Use a combination of the two

Voice synthesis can be used in combination with voice recordings. The voicebot plays recordings for the static or fixed parts of the conversation and uses the synthetic voice to speak the variable/dynamic parts.
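One way to picture the combined approach is as a playback plan: each fixed phrase maps to a recorded audio file, and each variable is handed to the synthesizer. The sketch below is an illustration under stated assumptions rather than a description of how any specific platform works; the `RECORDINGS` library, the file names, and the `plan_prompt` helper are all hypothetical.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    source: str   # "recording" or "synthesis"
    content: str  # audio file name for recordings, text to speak for synthesis


# Hypothetical library of fixed phrases recorded by a voice actor.
RECORDINGS = {
    "Hello": "hello.wav",
    "your order is ready for pickup.": "order_ready.wav",
}


def plan_prompt(template: str, variables: dict) -> list[Segment]:
    """Split a template such as 'Hello {name}, ...' into playback segments."""
    segments = []
    # The capturing group makes re.split keep the {placeholders} in the result.
    for part in re.split(r"(\{\w+\})", template):
        text = part.strip(" ,")
        if not text:
            continue
        if part.startswith("{") and part.endswith("}"):
            # Variable content: synthesize the value stored for this placeholder.
            segments.append(Segment("synthesis", str(variables[part[1:-1]])))
        elif text in RECORDINGS:
            # Fixed content with a matching recording: play the file.
            segments.append(Segment("recording", RECORDINGS[text]))
        else:
            # Fixed content without a recording: fall back to synthesis.
            segments.append(Segment("synthesis", text))
    return segments


if __name__ == "__main__":
    for segment in plan_prompt("Hello {name}, your order is ready for pickup.", {"name": "Maria"}):
        print(segment)
```

The fallback branch reflects a design choice in this sketch: a missing recording degrades to synthesized speech instead of stopping the call.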

Advantages

Voice actor recordings

Naturalistic voices for the voice AI

The best option when conversion rate is the priority, because callers are less likely to hang up

Industry-standard voice synthesis solutions

No need for recordings

Male and female voices in multiple languages can be deployed easily

Combination

A best-of-both-worlds approach that uses the same voice for both static and variable elements in the conversation

Disadvantages

Voice actor recordings

Less flexible: each variation in the conversation requires its own recording

Recording adds time and cost to setting up the voicebot

Industry-standard voice synthesis solutions

The synthesized voice can sometimes sound unnatural to human listeners

Combination

Still requires voice recordings for the static parts of the conversation

Ideal for

Voice actor recordings

Voicebot conversations where there is no variable/dynamic content or need for personalization

When there is a budget for ongoing professional voice recordings

Industry-standard voice synthesis solutions

Low-cost voicebot implementations where customers have a high tolerance for lower quality speech from the voicebot

Combination

Voicebot conversations where there is some variable/dynamic content but customers also expect human-like high quality speech
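The guidance above can be condensed into a rough rule of thumb. The sketch below is a deliberate simplification for illustration, not a formal recommendation engine; the `recommend_strategy` function and its three yes/no inputs are assumptions introduced here.

```python
def recommend_strategy(
    has_dynamic_content: bool,
    expects_humanlike_voice: bool,
    has_recording_budget: bool,
) -> str:
    """Return a starting-point voice strategy based on three yes/no answers."""
    if not has_recording_budget:
        # Without a budget for recordings, synthesis is the only option left.
        return "voice synthesis"
    if not has_dynamic_content:
        # Nothing variable to say, so every line can be recorded up front.
        return "voice actor recordings"
    if expects_humanlike_voice:
        # Variable content plus high quality expectations: blend the two.
        return "combination"
    return "voice synthesis"


if __name__ == "__main__":
    print(recommend_strategy(True, True, True))
    print(recommend_strategy(True, False, True))
    print(recommend_strategy(False, True, True))
```

Real projects usually weigh more factors than these three, so treat the result as a first guess to discuss, not a final answer.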

Examples of strategic AI voice implementations for voicebots

Sales example

A company is looking to use voicebots to automate an outbound sales campaign targeting existing customers in their CRM. Because the campaign’s success is partly determined by the ability to engage customers, the company decides to use a blend of recordings and voice synthesis. The synthetic voice will greet customers by name, using the customer records from the CRM to create personalized conversations between the voicebot and the customers.

Banking and Finance example

A banking service wants to offer an automated voicebot service that allows customers to check account status and balances. Since this service is highly transactional in nature and requires a large amount of variable/dynamic content to be vocalized by the voicebot, the business decides to use an entirely synthetic voice.

Retail example

A large retail store chain wants to use voicebots to improve after-hours customer service for their business. The voicebot will take customer calls after hours and provide information about the business and its services. Since there is little need for dynamic content in the conversations, the business hires a professional voice actor to record the voicebot’s dialogue for a highly natural-sounding voice that will greet customers and represent their brand appropriately.

The best-of-both-worlds approach to AI voices

For many of our clients, Talkie.ai recommends a combined approach of voice recordings with elements of voice synthesis. This best-of-both-worlds approach offers quality voicebot speech with the affordable flexibility of voice synthesis to personalize voicebot conversations.

Using our voice synthesis solutions, clients can create a unique voice for their virtual agents based on their top human agents or selected professional voice actors. Read more about how you can shape your virtual agents into your brand ambassadors in this article.
