Alexa Skill Development – Best Practices for Custom Skill User Interface Development

The sections below provide a list of common best practices used when designing a voice experience for a custom skill. While you should plan to monitor and tune your voice interactions based on the ways that users interact with your skill, the tips in this document will improve the usability of your design out of the gate, before you have the benefit of user data.

The best practices in these sections apply particularly to custom skills that focus on voice interactions. With Echo Show, however, the skill developer must consider a combination of voice, screen, and touch interactions. The principles described here still apply, because voice remains of primary significance for custom skills, and the general principles carry over to screen interactions as well.

Note that the recommendations in this document apply to the custom interaction model you create for a custom skill. If you are using a pre-built model, you do not need to worry about building the interaction model yourself.

Getting Information from the User

– Make It Clear that the User Needs to Respond

Presenting the options alone does not sufficiently inform the user that they need to respond, so make sure you ask the user a question so they know that they are expected to say something.

User: Alexa, start Trivia Challenge.

Trivia Challenge: Here are your categories: 80’s Pop Songs, Potent Potables, or European History. Which one do you want?
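
As a rough illustration of the point above, the sketch below builds a response body that ends with a question and keeps the session open so the user can answer. The helper name build_question_response is hypothetical; the field names (outputSpeech, shouldEndSession) follow the custom-skill JSON response format as commonly documented, and should be checked against the current Alexa Skills Kit reference.

```python
import json


def build_question_response(speech_text):
    """Return a response body that speaks a prompt and waits for a reply.

    build_question_response is a hypothetical helper for illustration.
    """
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            # Keep the session open so the user can answer the question.
            "shouldEndSession": False,
        },
    }


prompt = (
    "Here are your categories: 80's Pop Songs, Potent Potables, "
    "or European History. Which one do you want?"
)
response = build_question_response(prompt)
print(json.dumps(response, indent=2))
```

Ending the prompt with the question, rather than with the list, puts the cue to respond last, right before Alexa starts listening.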


– Don’t Assume Users Know What to Do

When users experience your capability for the first time, they may give only minimal information by simply asking Alexa to open your skill without providing any further detail. When this happens, you need to tell them what options they have for interacting with your experience.

User: Alexa, open Car Fu.
Car Fu: You can ask to get a ride or request a fare estimate. Which will it be?
User: Get a ride.

Car Fu: Sending your request. A mobile alert will let you know when your car arrives. Thanks for using Car Fu.
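
One way to handle a bare "open" request is to treat the launch request differently from an intent request. The sketch below assumes the incoming event carries a request.type of "LaunchRequest" or "IntentRequest", as in the standard custom-skill request JSON; the intent name GetRideIntent and both helper functions are hypothetical.

```python
def speak(text, end_session):
    """Hypothetical helper that wraps text in a minimal response body."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def handle_request(event):
    """Route a request; the intent name below is an assumption."""
    request = event["request"]
    if request["type"] == "LaunchRequest":
        # The user only opened the skill, so list the options and ask.
        return speak(
            "You can ask to get a ride or request a fare estimate. "
            "Which will it be?",
            end_session=False,
        )
    if request["type"] == "IntentRequest" and request["intent"]["name"] == "GetRideIntent":
        return speak(
            "Sending your request. A mobile alert will let you know when "
            "your car arrives. Thanks for using Car Fu.",
            end_session=True,
        )
    # Anything else: restate the options rather than failing silently.
    return speak("You can get a ride or a fare estimate. Which will it be?", end_session=False)


launch = handle_request({"request": {"type": "LaunchRequest"}})
print(launch["response"]["outputSpeech"]["text"])
```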


– Clearly Present the Options

When prompting the user with a set of options, construct the language of your prompt so that it is clear to the user that it is an either/or question. Otherwise, the Alexa TTS service may render the prompt in a way that sounds like a yes/no question. It is important to listen to every prompt on the test device to verify that the TTS has elicited the expected responses.
User: Alexa, ask Food Taxi to order a burger for me.

Food Taxi: Which side would you like: french fries or a salad?

User: Salad


– Keep It Brief

Speech is linear and time-based and users cannot skim spoken content like they can visual content. Users need to make quick decisions in response to questions they are asked because time is ticking. Therefore, prompts should be short and concise, while still being clear.

User: Alexa, ask Astrology Daily to give me my horoscope
Astrology Daily: Horoscope for which sign?


– Avoid Overwhelming Users with Too Many Choices

When presenting users with a list of options to choose from, make sure choices are clearly stated, do not present more than three choices, and avoid repetitive wording.

User: Alexa, ask Dairy Shack to order me a milkshake.
Dairy Shack: Which flavor would you like: Chocolate, Vanilla, or Strawberry?
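
The guidance on either/or phrasing and on limiting choices can be sketched as a small prompt builder. This is a minimal illustration, not part of any Alexa library: build_choice_prompt and its max_options parameter are hypothetical names, and the limit of three simply mirrors the recommendation above.

```python
def build_choice_prompt(lead, options, max_options=3):
    """Build an either/or prompt from two or three options.

    Hypothetical helper: raises if the list is too short or too long,
    so that over-long menus are caught while writing prompts.
    """
    if not 2 <= len(options) <= max_options:
        raise ValueError(f"expected 2 to {max_options} options, got {len(options)}")
    if len(options) == 2:
        listed = " or ".join(options)
    else:
        # Put "or" before the last item so it sounds like a choice.
        listed = ", ".join(options[:-1]) + ", or " + options[-1]
    return f"{lead}: {listed}?"


print(build_choice_prompt("Which flavor would you like", ["Chocolate", "Vanilla", "Strawberry"]))
print(build_choice_prompt("Which side would you like", ["french fries", "a salad"]))
```

The text still needs to be listened to on a test device, since only the rendered speech shows whether it sounds like an either/or question.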


– Offer Help for Complex Skills

When your Alexa capability does a lot of things (more than three), you should not try to fit every option into a single prompt. Instead, present the most important options first, along with help. If the user asks for help, you should list out all your capabilities. Remember to ask the user a question after presenting the options.

User: Alexa, start Score Keeper
Score Keeper: Score Keeper. You can give a player points, ask for the score, or say help. What would you like?

User: Help

Score Keeper: Here are some things you can say: Add John, give John 5 points, tell me the score, start a new game, or reset all players. You can also say, stop, if you’re done. So, how can I help?


– Ask Only Necessary Questions

You should make smart assumptions when possible in order to avoid unnecessary questions. Asking non-essential questions adds friction to the user experience and makes your experience seem less thoughtful. Examples:

  • If your skill only does one thing, do not ask the user if they want to do that thing.
  • Make educated guesses when appropriate (without making it difficult to correct).
  • Tailor prompt examples based on the user (for example, user profile).

User: Alexa, start Joke Bank.
Joke Bank: What’s black, white, and red all over? An embarrassed skunk.


– Use Confirmation Selectively

Avoid dialogs that create too many confirmations, but confirm actions of high consequence, such as:

  • Actions that are publicly visible (for example, posting to social media)
  • Actions that affect another person (for example, sending a message)
  • Actions where money is involved (for example, when the user is buying something)

User: Alexa, ask Astrology Daily for my horoscope.

Astrology Daily: Horoscope for what sign?

User: Libra
Astrology Daily: Today’s outlook for Libra is …
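
The rule of thumb above can be expressed as a simple check that a skill runs before acting. The trait names and the needs_confirmation function are hypothetical and exist only to illustrate the three categories listed; a low-consequence request such as the horoscope example carries none of them and proceeds without a confirmation step.

```python
# Hypothetical trait labels mirroring the three categories in the list above.
HIGH_CONSEQUENCE_TRAITS = {"publicly_visible", "affects_others", "involves_money"}


def needs_confirmation(action_traits):
    """Return True when an action has at least one high-consequence trait."""
    return bool(HIGH_CONSEQUENCE_TRAITS & set(action_traits))


print(needs_confirmation({"involves_money"}))  # buying something
print(needs_confirmation(set()))               # reading a horoscope
```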


– Obtain One Piece of Information at a Time

Users may not always give all of the information required in one step. If information cannot be assumed, ask the user for the missing information step-by-step.

User: Alexa, ask Date Night to make a reservation at Haymarket tonight.

Date Night: Reservation at Haymarket. For what time tonight?

User: About 7:30.
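
A minimal sketch of step-by-step collection follows: given what the user has already supplied, ask only for the first missing piece. The slot names, the prompts for restaurant and day, and the next_prompt function are assumptions made for this illustration; they are not taken from a real interaction model.

```python
# Hypothetical slots for a reservation, in the order they should be asked.
REQUIRED_SLOTS = [
    ("restaurant", "Which restaurant would you like?"),
    ("date", "For what day?"),
    ("time", "For what time?"),
    ("party_size", "And for how many people?"),
]


def next_prompt(filled):
    """Return the prompt for the first missing slot, or None if complete."""
    for name, prompt in REQUIRED_SLOTS:
        if not filled.get(name):
            return prompt
    return None


# The user said "make a reservation at Haymarket tonight".
print(next_prompt({"restaurant": "Haymarket", "date": "tonight"}))
```

Asking for one slot per turn keeps each prompt short, and skipping slots that are already filled avoids asking a question the user has already answered.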


– Use the Amazon Alexa App to Enhance Discovery

Take advantage of your card in the Alexa App to educate users on the capabilities of your experience. When you provide example phrases in your card, include “full intent” examples. Think of these as the equivalent of the way you’d verbally describe how to use your skill to someone who’d never used it before. Avoid using examples where the user does not give an intent.

“Alexa, ask Score Keeper to start a new game”

“Alexa, ask Score Keeper to add John to the game”

“Alexa, ask Score Keeper to give 5 points to John”

“Alexa, ask Score Keeper for the score”

Presenting Information to the User

– Make Sure Users Know They Are in the Right Place

In short interactions, it is not necessary to explicitly tell users that they are entering or exiting your experience, but it is still important to let them know where they are to help confirm that they are in the right place (this also helps in cases where Alexa mistakenly routed them to your skill).

In speech-only interactions, users do not have the benefit of visuals to orient themselves. Using landmarks tells users that Alexa heard them correctly, orients them in the interaction, and helps to instill trust in your experience.

User: Alexa, ask Astrology Daily for today’s Pisces horoscope.
Astrology Daily: Here’s today’s Pisces horoscope: You’ve got a friend who can help you overcome today’s problems.


– Present Information in Consumable Pieces

Humans can only retain small pieces of information that they hear. As such, you should only present what is absolutely required in order to keep the interaction as short as possible. When presenting longer lists, group them into small chunks of three to five items and ask the user if they want to continue after you present each chunk. This also helps users feel like they are in control of the interaction.

User: Alexa, ask Savvy Consumer for best sellers in garden.
Savvy Consumer: The top seller in the garden department is Repel Lemon Eucalyptus Natural Insect Repellent, 4-Ounce Pump Spray.

Would you like to hear the rest?

User: Yes

Savvy Consumer: Number 2: TERRO Ant Killer Liquid Ant Baits

Number 3: Weber 12-Inch 3-Sided Grill Brush

Number 4: Black & Decker 30-Feet Line String Trimmer Replacement Spool

Would you like to hear more?

User: No
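
The chunking advice can be sketched in a few lines. The functions chunk_items and speak_chunk are hypothetical helpers, and the 3-to-5 bound comes from the guidance above. For simplicity, this sketch uses equal-sized chunks rather than leading with a single top item as the sample dialog does.

```python
def chunk_items(items, size=3):
    """Split a list into consecutive chunks of `size` items (3 to 5)."""
    if not 3 <= size <= 5:
        raise ValueError("chunk size should be between 3 and 5")
    return [items[i:i + size] for i in range(0, len(items), size)]


def speak_chunk(chunks, index):
    """Build the speech for one chunk, offering more only if more exists."""
    text = ". ".join(chunks[index]) + "."
    if index < len(chunks) - 1:
        text += " Would you like to hear more?"
    return text


best_sellers = ["Insect repellent", "Ant baits", "Grill brush", "Trimmer spool"]
chunks = chunk_items(best_sellers, size=3)
print(speak_chunk(chunks, 0))
print(speak_chunk(chunks, 1))
```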


– Write for the Ear, not the Eye

The prompts written for voice-forward experiences will be heard, not read, so it is important to write them for spoken conversation. Sentence fragments and sentences that end with a preposition are acceptable if they sound natural in spoken dialog. Also, remember that what looks good on paper may not sound good when spoken, so listen to your prompts on your test device to make sure the TTS sounds good.

User: Alexa, ask NHL Tracker to give me an update on the Sharks game.
NHL Tracker: The Sharks are trailing the Stars, 3 to 2, in the third period.


– Avoid Technical and Legal Jargon

Be honest with the user about what is happening, but don’t use technical jargon that the user won’t understand or that does not sound natural. Similarly, since legal messages often contain long, unnatural language meant for reading, not spoken dialog, they can disrupt the voice experience and can confuse users.

As much as possible, keep legal jargon out of an Alexa experience (though you should consult your own legal counsel about what may be required for your Alexa experience, and this is not legal advice). Legal disclaimers can be added to the Alexa app for users to read and take time to process, if needed.

User: Alexa, ask Flight Stats for the status of Alaska 328.
Flight Stats: Alaska flight 328, from Seattle to San Jose, is delayed due to mechanical repairs, and is now scheduled to depart at 6:25.


Using Text-to-Speech Effectively

You provide text that the Alexa service converts to speech. The following best practices can improve the quality of the text-to-speech conversion.

– Use SSML Tags to Modify Prosody

Use SSML tags to modify the prosody or add emphasis to your text. For instance, you can use the <prosody> tag to change the volume, pitch, and rate of speech.

Do not use text formatting such as all caps or repeated letters to change these elements.
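
As an illustration, the sketch below wraps text in a prosody tag instead of relying on capital letters. The with_prosody helper is hypothetical; the attribute names (rate, pitch, volume) and values such as "slow" and "loud" are the ones the SSML prosody tag is generally documented to accept, so check them against the current Alexa SSML reference before use.

```python
from xml.sax.saxutils import escape


def with_prosody(text, rate=None, pitch=None, volume=None):
    """Wrap text in a <prosody> tag, emitting only the attributes that are set."""
    attrs = {"rate": rate, "pitch": pitch, "volume": volume}
    attr_text = "".join(f' {name}="{value}"' for name, value in attrs.items() if value)
    # Escape &, < and > so the text cannot break the markup.
    return f"<prosody{attr_text}>{escape(text)}</prosody>"


ssml = "<speak>" + with_prosody("Final round!", rate="slow", volume="loud") + "</speak>"
print(ssml)

# SSML is sent with an outputSpeech of type "SSML" rather than "PlainText".
output_speech = {"type": "SSML", "ssml": ssml}
```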


– Clarify Specialized Abbreviations and Symbols

Alexa’s text-to-speech conversion handles most text such as abbreviations and special characters automatically. For example:

  • “Dr. Smith” is spoken as “Doctor Smith”
  • “amazon.com” is spoken as “amazon dot com”
  • “Lake Shore Dr.” is spoken as “Lake Shore Drive”

However, this conversion is not automatic for some specialized or uncommon abbreviations. Test the text-to-speech conversion and either write out text or use an SSML tag such as <say-as> or <sub>.
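
The two tags named above might be used as follows. The helper functions and the sample sentence are made up for this illustration; "spell-out" is one of the interpret-as values generally documented for say-as, and the alias attribute on sub supplies the words to speak in place of the written text.

```python
from xml.sax.saxutils import escape, quoteattr


def sub(alias, text):
    """Speak `alias` in place of the written `text`."""
    return f"<sub alias={quoteattr(alias)}>{escape(text)}</sub>"


def say_as(interpret_as, text):
    """Ask the synthesizer to interpret `text` in a particular way."""
    return f"<say-as interpret-as={quoteattr(interpret_as)}>{escape(text)}</say-as>"


ssml = (
    "<speak>My favorite element is " + sub("magnesium", "Mg")
    + ". The code is " + say_as("spell-out", "XJ9") + ".</speak>"
)
print(ssml)
```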


– Use SSML to Specify the Right Variants When Necessary

Many English words have different pronunciations depending on meaning or part of speech. For example, “read” is pronounced differently in these two sentences:

  • I read the book yesterday
  • I plan to read the book next week

When using words with these types of variations, you may need to specify the variant to use to ensure that the text-to-speech conversion pronounces the word correctly.
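
For the "read" example, one option is the SSML w tag with a role attribute. The role values used below (amazon:VB for a present-tense verb, amazon:VBD for the past form) are given as they are generally documented for Alexa SSML and should be verified against the current reference; word_variant is a hypothetical helper.

```python
from xml.sax.saxutils import escape, quoteattr


def word_variant(word, role):
    """Mark a word with the role that selects its pronunciation."""
    return f"<w role={quoteattr(role)}>{escape(word)}</w>"


ssml = (
    "<speak>I " + word_variant("read", "amazon:VBD") + " the book yesterday. "
    "I plan to " + word_variant("read", "amazon:VB") + " the book next week.</speak>"
)
print(ssml)
```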


– Write Text-to-Speech in the Target Language or Using Supported Phonemes

Use native words and sounds when writing your text-to-speech responses. The Alexa Skills Kit currently supports:

  • English (AU)
  • English (CA)
  • English (IN)
  • English (UK)
  • English (US)
  • French (CA)
  • French (FR)
  • German (DE)
  • Italian (IT)
  • Japanese (JP)
  • Portuguese (BR)
  • Spanish (ES)
  • Spanish (MX)
  • Spanish (US)

Use the <phoneme> SSML tag to customize the pronunciation using the supported phonemes or add phonemic/phonetic pronunciation for the text (for example, people may pronounce words like “pecan” differently).
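
The "pecan" case can be sketched with the phoneme tag and the IPA alphabet. The two transcriptions below are illustrative renderings of common pronunciations, and phoneme is a hypothetical helper name.

```python
from xml.sax.saxutils import escape, quoteattr


def phoneme(text, ipa):
    """Attach an IPA pronunciation to a word."""
    return f'<phoneme alphabet="ipa" ph={quoteattr(ipa)}>{escape(text)}</phoneme>'


ssml = (
    "<speak>You say " + phoneme("pecan", "pɪˈkɑːn")
    + ". I say " + phoneme("pecan", "ˈpi.kæn") + ".</speak>"
)
print(ssml)
```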


– Test the Text-to-Speech Results and Revise as Needed

Test how Alexa converts the text you provide to speech and make sure it sounds the way you intend. In addition to testing with an Alexa-enabled device (such as an Amazon Echo), you can use the Test page to listen to your skill’s responses and experiment with SSML.

If you encounter a problem with the text-to-speech conversion, try the following:

  • Modify the text. Slight changes in the text can affect the synthesis, which may fix the problem. This is most useful for intonation issues.
  • Mark up the text using the supported SSML tags.
  • Report the problem on the developer forums. This may be most useful for pronunciation issues.

Handling Dialogue Errors


– Use Re-Prompting to Provide Guidance

If the user says something that Alexa does not understand, or the user does not respond at all, the Alexa Skills Kit gives you the option of specifying a re-prompt (a prompt that is played after an error). Since only a single prompt is permitted for both low confidence and timeout dialogue errors, the prompt you write needs to give the user guidance on what kind of response you are expecting.

It is acceptable for re-prompts to be slightly more verbose than the first prompt users hear, but the content needs to clearly present what the user is expected to say.

User: Alexa, ask Tide Pooler for high tides.

Tide Pooler: High tides for what city?

User: Las Vegas

Tide Pooler: I can only provide tide information for coastal cities, like San Diego or Boston. Now, which city would you like?

User: Virginia Beach.
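
In the JSON response, the re-prompt sits alongside the main speech. The sketch below assumes the standard response layout with a reprompt object containing its own outputSpeech; build_response_with_reprompt is a hypothetical helper, and the re-prompt wording is an example written for this illustration.

```python
def build_response_with_reprompt(speech_text, reprompt_text):
    """Return a response that asks a question and supplies a re-prompt."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": speech_text},
            # Played if the user is silent or is not understood.
            "reprompt": {
                "outputSpeech": {"type": "PlainText", "text": reprompt_text}
            },
            "shouldEndSession": False,
        },
    }


response = build_response_with_reprompt(
    "High tides for what city?",
    "I can give tide information for coastal cities, like San Diego or "
    "Boston. Which city would you like?",
)
print(response["response"]["reprompt"]["outputSpeech"]["text"])
```

The re-prompt is longer than the first prompt on purpose: it gives examples of valid answers and still ends with the question.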


– Offer a Way Out if the User Gets Stuck

False accepts occur when Alexa misunderstands what the user said and continues with the wrong information. If Alexa presents your skill to the user by mistake, or the user is otherwise stuck in the interaction, you should offer a way out in the help prompt in case they are unaware that the “stop” command is available.

User: Alexa, start Store Finder

Alexa: [Misinterprets this request as “start Score Keeper”]

Score Keeper: Score Keeper. You can give a player points, ask for the score, or say help. What would you like?

User: Help

Score Keeper: Here are some things you can say:

(lists several options)

You can also say, stop, if you’re done.

So, how can I help?

User: Stop.


– Don’t Blame the User

Errors will happen. Expect that they will and handle them well. Do not place blame on the user when errors happen, but do not be too apologetic either (doing so makes your experience seem tentative and further reduces user confidence in your skill).

User: Alexa, ask Hotel Hub to find a hotel.

Hotel Hub: Hotel reservations for what city?

User: [background noise]

Hotel Hub: What city was that?


– Expect the Unexpected

Unlike a visual interface, where the user can only interact with the affordances presented on the screen, there is no way to limit what users can say in a speech interaction. Because of this, it is important to plan for reasonable things users might say that are not supported and then handle them intelligently.

User: Alexa, ask Date Night to make a reservation at Haymarket tomorrow.

Date Night: Reservation at Haymarket. For what time tomorrow?

User: At 10:30 pm

Date Night: Haymarket takes reservations between 5:00 and 9:30 pm. What time would you like your reservation?

User: 9:00

Date Night: And for how many people?
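
Handling an out-of-range answer, as in the dialog above, might look like the following sketch. The opening hours are taken from the sample dialog; the constant and function names are hypothetical.

```python
from datetime import time

# Reservation window from the sample dialog (5:00 pm to 9:30 pm).
FIRST_SLOT = time(17, 0)
LAST_SLOT = time(21, 30)


def reservation_prompt(requested):
    """Return the next prompt for a requested reservation time."""
    if FIRST_SLOT <= requested <= LAST_SLOT:
        return "And for how many people?"
    # Explain the constraint and ask again, without blaming the user.
    return (
        "Haymarket takes reservations between 5:00 and 9:30 pm. "
        "What time would you like your reservation?"
    )


print(reservation_prompt(time(22, 30)))
print(reservation_prompt(time(21, 0)))
```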

For more information, you can get in touch with us at +1 713 701 5957 or drop in an email at sales@pragmatic-voice.com
