Voice, or designing interactions for voice experiences, is the next frontier of user experience (UX) design. What makes things a little tricky is that it’s essentially “invisible” and includes no visual elements, at least in the conventional sense. There are no animations or accents to help portray action. There are no colors or shapes involved; heck, pictures and video don’t have a role on voice platforms at all.
In the past, it was possible to sidestep the discussion of UX for voice because the technology was still coming into its own. Today, it’s considered mainstream thanks to the likes of Amazon Echo, Google Home and mobile variants such as Siri and Microsoft’s Cortana. There’s no getting around it, especially since voice is being implemented in everything from smart speakers and TVs to connected refrigerators.
It’s here, it’s poised to make a significant impact across many industries, and in UX it’s already changing the way we approach modern design and interactivity.
What Is a Voice User Interface (VUI)?
GUI, or Graphical User Interface, is the term used to denote the visual interfaces of old, if you will. A VUI, or Voice User Interface, involves vocal communication between the user and the platform, which is often reciprocal. Sometimes it only goes one way, meaning the user speaks to the platform and it carries out the requested commands or actions.
The core element is voice, which itself relies on natural language processing and contextual interactions. In other words, you can talk to a VUI — just as you would another person — and it will understand you and react to that engagement.
Building a VUI is no different than building a GUI, at least in regard to the general process. What sets great voice systems apart from poor ones is the ability to discern and understand context. A system that ingests speech inaccurately will result in an incredibly frustrating and cumbersome experience.
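To make that concrete, here is a minimal sketch of the idea behind intent recognition: a spoken utterance gets mapped to an intent plus any details (often called slots), and the system replies accordingly. The intent names, phrases and keyword matching below are purely illustrative assumptions; a real VUI would lean on a natural language processing service rather than this kind of naive matching.

```python
# A minimal, illustrative sketch of how a VUI maps a spoken utterance to an intent.
# The intent names and matching rules are hypothetical, for demonstration only.

def recognize_intent(utterance: str) -> dict:
    """Very naive keyword matching standing in for real natural language processing."""
    text = utterance.lower()
    if "weather" in text:
        return {"intent": "GetWeather", "slots": {}}
    if "play" in text:
        return {"intent": "PlayMedia", "slots": {"query": text.replace("play", "").strip()}}
    return {"intent": "Fallback", "slots": {}}

def respond(intent: dict) -> str:
    """Turn a recognized intent into a spoken reply."""
    if intent["intent"] == "GetWeather":
        return "Here's the forecast for today."
    if intent["intent"] == "PlayMedia":
        return f"Playing {intent['slots']['query']}."
    return "Sorry, I didn't catch that. Could you rephrase?"

print(respond(recognize_intent("Play some jazz")))  # -> "Playing some jazz."
```

Even in a toy example like this, notice how much rests on interpreting the request correctly; the fallback response exists precisely because misheard speech is the fastest way to frustrate a user.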
How Voice Applies to UX Design
It’s important to understand that voice and related interactions are not necessarily anything new in the world of UX. Why?
Levers and pedals, keyboards and mice, GUIs, touchscreens and gestures have all served as intermediaries between humans and technology. Each iteration of these controls allowed for faster, more efficient interactions. Touch, for instance, when used appropriately is incredibly precise, and when combined with gestures you can do some pretty amazing things, faster than ever before, even on a mobile device.
Voice is merely the next step in that progression of interaction and controls. Rather than tap or swipe through three or more menus yourself, you can command an AI or automated system to do it for you. It makes perfect sense, especially in today’s landscape, which is all about convenience and speed. People want to get things done effectively, and as fast as possible.
Designing Successful Voice Experiences
The core principles of UX design haven’t changed at all; it’s just the methods you use to get there that have. With visual design, you can use animations and movement to articulate actions. Tap on a button or space, for example, and the display shakes or reacts. That’s not the case with voice, because there are no visuals at all. Even with something such as a voice-enabled TV or speaker, the interactions are invisible.
To design properly, you must first understand context and accuracy. It goes beyond simple commands, because the context of voice, tone and even individual words can change what the user means. Someone saying a phrase such as “delete” could mean many things. What if they simply mean to hide a particular element as opposed to removing it outright? If you were to interpret the command as the latter, it could lead to problems later, especially in regard to data handling.
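One common way to handle that ambiguity is to ask a clarifying question before carrying out anything destructive. The sketch below is a hedged illustration of that pattern; the function name and the phrasing are hypothetical, and the point is simply that the system confirms intent rather than guessing.

```python
# An illustrative sketch of disambiguating a destructive command such as "delete".
# Rather than assuming the user wants permanent removal, the dialog asks first.
# All names and wording here are assumptions for demonstration.

def handle_delete(item: str, confirmed: bool = False) -> str:
    if not confirmed:
        # Ask a clarifying question instead of guessing the user's intent.
        return f"Do you want to permanently delete {item}, or just hide it from view?"
    return f"{item} has been permanently deleted."

print(handle_delete("the March report"))
# -> "Do you want to permanently delete the March report, or just hide it from view?"
```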
After context comes high-level design and logical call-flows. In traditional design, call-flows move from segment to segment within a general experience. One step might include the welcome, the next the login or account access, and then it leads to the main menu with an umbrella of sub-options from there. Speech is more contextual, however, which means your call-flows need to allow for jumps and shortcuts rather than a strictly linear path, hence the “high-level” approach.
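One way to sketch such a call-flow is as a simple state machine where each step lists the contextual requests it can respond to. The states and transitions below are hypothetical, loosely modeled on an automated banking assistant, and only meant to show how a user can jump ahead when the context allows it.

```python
# A rough sketch of a high-level call-flow as a state machine. Unlike a rigid
# GUI menu, each state lists several contextual transitions, so a user can jump
# to "check balance" straight from the welcome step if they ask for it.
# The states and transitions are hypothetical.

CALL_FLOW = {
    "welcome":        {"log in": "account_access", "check balance": "balance", "help": "main_menu"},
    "account_access": {"success": "main_menu", "failure": "welcome"},
    "main_menu":      {"check balance": "balance", "transfer money": "transfer", "goodbye": "end"},
    "balance":        {"transfer money": "transfer", "main menu": "main_menu", "goodbye": "end"},
    "transfer":       {"main menu": "main_menu", "goodbye": "end"},
}

def next_state(current: str, user_request: str) -> str:
    """Move to the next step if the request is valid in this context; otherwise stay put."""
    return CALL_FLOW.get(current, {}).get(user_request, current)

state = "welcome"
for request in ["check balance", "transfer money", "goodbye"]:
    state = next_state(state, request)
    print(state)  # balance -> transfer -> end
```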
Finally, you must know it will take lots and lots of prototyping and user research to get things right. This means calling in potential users, setting up a remote user testing study or running an Alpha/Beta period to gather important user insights. Did things work as planned? Were users having trouble calling upon a certain option or function? Did they understand when an action was completed? These are all things to consider, flesh out and check in real, live environments.
Understanding Your Audience Is Key
The key is understanding your audience, just as it is when developing a visual interface or user experience. If you know your users, what they want and what they’re trying to do, you can better create a platform that helps them achieve it efficiently. This also puts the context and dialogues you use into perspective. A general voice assistant such as Siri or Alexa is always going to have different tones, interactions and contextual commands than, say, an automated bank teller.
Regular user testing is one way to do this. Another is building an ongoing feedback loop into your design process. By integrating with analytics tools and monitoring how people actually use the system, you can better understand what’s working and what’s not. This allows you to adapt, improve and optimize the system to achieve higher levels of success.
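As a rough illustration of what that loop might capture, the sketch below logs each voice interaction and tallies which intents fail most often. The event fields and the aggregation are assumptions for demonstration, not the API of any particular analytics product.

```python
# A minimal sketch of the kind of event logging that feeds a voice feedback loop.
# The event fields and the simple aggregation below are illustrative assumptions.

from collections import Counter
from datetime import datetime, timezone

events = []

def log_event(intent: str, succeeded: bool, fallback_used: bool) -> None:
    """Record one voice interaction so failure patterns can be reviewed later."""
    events.append({
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "intent": intent,
        "succeeded": succeeded,
        "fallback_used": fallback_used,
    })

def failure_rates() -> Counter:
    """Count failed requests per intent to spot where users struggle most."""
    return Counter(e["intent"] for e in events if not e["succeeded"])

log_event("PlayMedia", succeeded=True, fallback_used=False)
log_event("GetWeather", succeeded=False, fallback_used=True)
print(failure_rates())  # Counter({'GetWeather': 1})
```

Whatever tooling you use, the takeaway is the same: collect enough signal about where conversations break down that the next design iteration is informed by real behavior, not guesswork.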
Nathan Sykes is the founder of Finding an Outlet. To read his latest articles, follow him on Twitter @nathansykestech.