
Anil Nair, Product Designer

Designing for Voice User Interface




Let's start with a question: are voice commands and voice interactions something new? No, they aren't! Automated voice calls and automated voice responses existed long before the current hype around voice interaction began.

Voice commands and voice interaction have been trending for quite a few years now, with big companies investing heavily in the technology. With the emergence of Artificial Intelligence and Machine Learning, it has found its way into chatbots and personal assistants. What has changed is the user's expectation and perception of voice interaction. Modern voice interaction has become more user-centric: it has shed the restriction of rigid commands and become more personalized and human. These interfaces revolve around skills humans already have.


What does the user expect from Voice Interfaces?

Speech has been a fundamental means of human communication since the Stone Age. Because of this, users given a voice interface tend to expect it to behave like a normal person, even though they know they are speaking to a device. So before we design interfaces for voice, we need to understand human psychology and the principles behind speech communication. The same message can be conveyed in different ways, or in different accents.

If the message conveyed and the device's understanding don't match, things may go wrong. This recalls a section from Jeff Patton's "User Story Mapping" where he mentions examples from "Cake Wrecks".


Imagine the same scenario with devices: what if the machine doesn't understand what the user actually meant?

So before you start designing the system, understand how you would like the conversation to go. Sit with someone and note down every possible way a conversation could unfold for a given use case.
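One way to capture those notes is as a simple branch map that you can walk exhaustively, so no reply goes undesigned. A minimal sketch, assuming a ride-booking use case; the node names, prompts, and branches here are illustrative, not any real assistant's model:

```python
# A minimal sketch of mapping out conversation branches before any
# implementation work. Node names and prompts are illustrative only.
conversation_map = {
    "start": {
        "prompt": "Where would you like to go?",
        "branches": {
            "gives_destination": "ask_ride_type",
            "asks_for_suggestions": "suggest_destinations",
            "silence": "reprompt",
        },
    },
    "ask_ride_type": {
        "prompt": "What kind of ride would you like?",
        "branches": {
            "gives_ride_type": "confirm_booking",
            "asks_about_prices": "list_prices",
        },
    },
}

def possible_paths(state, path=()):
    """Enumerate every branch so each one gets a designed response."""
    node = conversation_map.get(state)
    if node is None:  # leaf: no further branches designed yet
        yield path + (state,)
        return
    for reply, next_state in node["branches"].items():
        yield from possible_paths(next_state, path + (state, reply))

for p in possible_paths("start"):
    print(" -> ".join(p))
```

Walking the map like this makes gaps obvious: any leaf you didn't intend to be a dead end is a conversation path you still have to design.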


This is what I experienced while interacting with the assistant. I expected Google to know my choice of music, since I always listen to music on my phone and all my devices are synced.

As a user, I felt frustrated when this happened to me.

A designer has to consider all possible use cases, as each user may interact in any number of different ways.

For example, let's see two sets of users booking an Uber ride.


Case one:

User: Open Uber
Asst: Sure, where would you like to go?
User: Church Street.
Asst: Any specific location on Church Street?
User: Yeah! Near Church Street Socials
Asst: Ok! What kind of ride would you like to take?
User: An Uber Pool, maybe
Asst: Booking you an Uber Pool to Church Street Socials
Asst: Your ride has been booked and is 6 mins away
User: Thanks!
Asst: Is there something else I can help you with?
User: No, thanks!

Case two:

User: Book me an Uber Pool to Church Street Socials
Asst: Ok, booking your ride to Church Street Socials
Asst: Your ride has been booked and is 6 mins away
User: Thanks! Exit assistant.

These are just two instances, but there can also be cases where the credit available in the payment wallet falls below the required amount, or the ride might not be available at that point in time.
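Those failure branches deserve designed responses just as much as the happy path. A hedged sketch of how the two cases above might be handled; the `book_ride` function and its parameters are hypothetical stand-ins, not Uber's or any assistant's real interface:

```python
# Illustrative sketch of the two failure branches mentioned above:
# no ride available, and wallet balance below the estimated fare.
# All names and amounts here are made up for the example.

def book_ride(destination, ride_type, wallet_balance, fare_estimate, rides_available):
    """Return the assistant's spoken response for one booking attempt."""
    if not rides_available:
        # Dead end avoided: offer an alternative instead of just failing.
        return (f"Sorry, no {ride_type} is available right now. "
                f"Should I try another ride type?")
    if wallet_balance < fare_estimate:
        shortfall = fare_estimate - wallet_balance
        return (f"Your wallet is {shortfall} rupees short of the estimated fare. "
                f"Would you like to pay with another method?")
    return f"Booking your {ride_type} to {destination}."

print(book_ride("Church Street Socials", "Uber Pool",
                wallet_balance=50, fare_estimate=120, rides_available=True))
```

The design point is that each error response ends with a question, keeping the conversation open rather than forcing the user to start over.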


Guiding the design for VUI


VUI, as mentioned above, is all about creating a natural conversation between machines and humans or businesses. Although large corporations are investing heavily in the research and development of voice integration, voice interaction is still in an early, immature stage.


You need a persona for a VUI


Building a voice interface means building a natural human conversation, and for this you should treat the machine as a human. Identify human characteristics and how human communication takes place. Humanizing the voice and the character of the interaction helps connect with users: what they expect when speaking to a machine is to connect with another human.


Designing for accent & context


The wording used during a conversation may have different meanings in different contexts. The system needs to be educated about these contexts, and the dialog exchange should be driven by them.

Coming to accents, India alone has many different accents of English; now imagine the scale if you are building for the world. The application should be able to pick up these accents and also learn from each instance. It is a learning process, and it also helps personalize the application for the individual. This takes some time, but with the tool kits available today it is not impossible.
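Context-driven disambiguation can be sketched very simply: the same word resolved to different intents depending on the active context. The word, contexts, and intent names below are made up for illustration; real systems do this with trained NLU models rather than string matching:

```python
# Toy sketch: the same utterance resolved differently depending on the
# active dialog context. Contexts and intent names are illustrative.

def resolve(utterance, context):
    """Pick an intent for an ambiguous word using the dialog context."""
    if "book" in utterance:
        if context == "travel":
            return "reserve_ticket"   # "book" as a verb: make a booking
        if context == "reading":
            return "find_title"       # "book" as a noun: look up a book
    return "fallback"

print(resolve("book a cab", context="travel"))
```

However crude the matching, the structure is the point: context is an input to intent resolution, not an afterthought.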


Leveraging the user skills or being inclusive to design better


Every designer says not to start with the assumption that your user already knows the process. But with something like an NUI, especially a voice interface, there are certain skills your users are already comfortable with, and patterns in how they use those skills. Speech is one of them.


From childhood, a person speaks in his or her own way. Making the user adapt to an accent, or to the way the interface understands speech, is not easy and greatly increases user effort. With VUIs, the effort has to go into making the conversation personal and customized for each person.


User navigation


Voice interfaces are no different: users may find themselves lost during their journey, and providing them a way out is harder with a VUI. Help them with clear suggestions about the actions they can take; this also reduces the chances of users making errors.
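A common pattern for this is a fallback turn that suggests concrete next actions, escalating to an explicit exit after repeated misunderstandings. A sketch under those assumptions; the suggestion list and thresholds are invented for the example:

```python
# Sketch of a fallback turn that suggests next actions instead of
# leaving the user stuck. Suggestions and thresholds are illustrative.

def fallback_response(failed_attempts):
    """Spoken reply when the assistant did not understand the user."""
    suggestions = ["book a ride", "check ride status", "cancel booking"]
    if failed_attempts >= 2:
        # After repeated misunderstandings, offer an explicit way out.
        return "I'm having trouble. You can say 'help' or 'exit' anytime."
    return ("Sorry, I didn't catch that. You could try: "
            + ", ".join(suggestions) + ".")

print(fallback_response(0))
```

Counting failed attempts matters: repeating the same "I didn't catch that" forever is exactly the lost-user dead end the section warns about.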


Limit yourself with information


Another challenge with VUIs is giving users precisely the information they are looking for. Limiting the information helps bring down the users' cognitive load and makes the conversation more human.


Avoiding information overload also helps the user take his or her next step easily, without confusion.
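In practice this often means capping a spoken list at a few items and offering the rest on demand, rather than reading everything out. A sketch, with an arbitrary cutoff chosen for illustration:

```python
# Sketch: cap a spoken list at a few items and offer the rest on
# demand. The cutoff of three is arbitrary, not a researched limit.

def speak_results(results, limit=3):
    """Build a short spoken summary instead of reading the full list."""
    head = results[:limit]
    reply = "I found " + ", ".join(head)
    if len(results) > limit:
        remaining = len(results) - limit
        reply += f", and {remaining} more. Want to hear the rest?"
    return reply

print(speak_results(["Cafe A", "Cafe B", "Cafe C", "Cafe D", "Cafe E"]))
```

Ending with a question again keeps control with the user: they decide whether to go deeper instead of sitting through a monologue.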


A VUI doesn't always need to listen to you


The fear of being overheard always bothers users, and it needs to be addressed at the design level itself. Data protection is one of the biggest concerns among early adopters. Most assistants are programmed to start a conversation only when asked, but they constantly keep an ear out for that call. So while the application keeps listening for its trigger, it should not process or store anything else that is being said.
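The structure behind that guarantee is wake-word gating: everything before the trigger is discarded unprocessed, and only post-trigger speech reaches the command pipeline. A toy sketch of that gate; the string-match "detector" stands in for a real on-device acoustic model, and all names here are hypothetical:

```python
# Sketch of wake-word gating: pre-trigger audio is dropped without
# being processed or stored; only post-trigger speech is handled.
# The detector is a toy string match, not a real acoustic model.

WAKE_WORD = "hey assistant"

def process_command(text):
    """Stand-in for the real command pipeline."""
    return f"command: {text}"

def handle_audio(transcript_chunk, session):
    if not session["awake"]:
        if WAKE_WORD in transcript_chunk.lower():
            session["awake"] = True
            return "listening"
        return None  # drop the chunk: nothing is stored or processed
    return process_command(transcript_chunk)

session = {"awake": False}
print(handle_audio("just chatting with friends", session))  # dropped
print(handle_audio("hey assistant", session))               # wakes up
```

The privacy property lives in the `return None` branch: by design, there is no code path that stores or forwards speech heard before the wake word.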


Conclusion



The whole time I was writing this, I kept remembering the movie "Her". VUI has slowly been changing things around us and redefining the way we interact with machines. The interaction that started with simple IVR is now emerging as one of the trending areas, thanks to advances in AI and ML. What I believe is that no matter how strong the technology grows, the basic principles of connecting with the human mind will always remain the same.


