It's time to personalise the voice
When a device talks, what should its voice sound like? Current voice assistants all have something in common: a one-voice-fits-all approach. Research has shown that the voice can have a major impact on how users perceive the technology. My research suggests that personalising the voice is the way forward.
We treat computers as real people.
A voice carries signals that listeners pick up naturally, such as the speaker’s personality, age and gender. The same applies when humans talk to computers: the “Media Equation” holds that humans treat computers as real people and can form real social relationships with them.
Is realism always better?
Research has shown that making technology more realistic is not always the best option, as it can lead the user into the uncanny valley. Humanness that is maximised in one dimension, but not in the others, is a big problem. For example, a synthetic voice that uses the word “I” receives a more negative response than a human voice saying the same thing.
A realistic and natural voice will improve the user’s experience, as long as the user knows they are talking to a computer.
The effects of humour, fear and emotion in Voice Technology
In voice interface design, humour has been underused because it can have major downsides. Female voices with a sense of humour can be perceived as sarcastic and aggressive. However, I have found that innocent humour has had a consistent positive effect on users.
For users to feel that voice user interfaces are realistic, emotion is crucial. A voice that expresses emotion makes the interface more believable, reduces users’ risk taking and is judged more positively.
Emotion is crucial for human communication.
Emotion is the most powerful state for predicting how a user will behave. It is so powerful that a large part of the brain involved in emotion also determines whether an image shows a human or not.
Primary emotions are the body’s first response and are easy to identify. They include sadness, happiness, anger and fear.
Secondary emotions are emotional reactions users have to other emotions. They include pride, frustration, shame and anxiety. These emotions are much more complex than primary emotions, so they deserve substantially more attention when designing Voice User Interfaces.
A genderless voice can reduce gender binary.
Research has shown that the majority of users prefer female, extroverted voices. However, users find male voices more credible when the voice is conveying information. Gender has become a concern for designers, as it can lead to social innuendo and stereotypes. Genderless voices could be the solution to this ongoing problem. Researchers at Project Q recently created a gender-neutral voice from the voices of more than twenty participants who identify as non-binary or transgender.
Users are attracted to voices with a similar personality.
People who interact with someone of a similar personality have a more positive experience than those who interact with someone very different. Humans judge others with a similar personality to be friendly, trustworthy and intelligent.
Friendliness can be inferred from a voice’s pitch, speed and frequency range, while dominance can be inferred from its loudness, its deepness and a limited frequency range.
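As a rough illustration of similarity attraction, a designer could describe the user and each candidate voice with the same set of traits and then pick the closest voice. The sketch below is hypothetical: the trait names, the 0 to 1 scales, the candidate voices and the `closest_voice` helper are all assumptions made for illustration, not values or methods taken from the research.

```python
from math import dist  # Euclidean distance, available from Python 3.8

# Hypothetical trait profiles on a 0-1 scale:
# (extroversion, dominance, friendliness).
# The numbers are illustrative assumptions, not measured data.
CANDIDATE_VOICES = {
    "calm": (0.2, 0.3, 0.7),
    "lively": (0.9, 0.5, 0.8),
    "assertive": (0.6, 0.9, 0.4),
}

def closest_voice(user_profile, voices=CANDIDATE_VOICES):
    """Return the name of the voice whose profile is nearest to the user's."""
    return min(voices, key=lambda name: dist(user_profile, voices[name]))

# An outgoing, friendly user is matched with the most similar voice.
print(closest_voice((0.8, 0.4, 0.9)))
```

A real system would first need a reliable way to estimate these traits, for example from the acoustic cues described above, which this sketch does not attempt.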
Solution: The User-Device-Context Model
The User-Device-Context Model can help designers when designing a voice.
A person’s characteristics, such as their gender and personality traits, need to be considered when designing a voice. Users can not only identify personality traits in a robot based on its verbal and non-verbal behaviour, but are also attracted to robots that have a personality complementary to their own.
The device itself can have a substantial influence on the user’s preference for the voice. Eye gaze, prosody, hand gestures and mouth movements are examples of important cues in communication.
Research suggests that users perceive devices with cheeks as feminine and childlike, while devices without a mouth are regarded as unfriendly.
Voice systems that sound human-like can mislead users into believing the device can achieve more than it can. Instead of aiming for a more naturalistic voice, designers should match the voice to the tasks the device is capable of doing.
Context includes linguistic, temporal and cultural factors. Preferences for features like gender and personality traits differ by language and by region. When companies fail to perform proper market research and tailor voice design to a cultural demographic, the resulting product can backfire in sales and customer satisfaction.
Language is integral to a context of use. The language that an individual speaks shapes the way they think. Many languages are grammatically gendered, meaning that the words for inanimate objects are masculine or feminine. For example, the word for “dishwasher” is masculine in Spanish, yet feminine in German.
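To make the three parts of the model concrete, here is a minimal sketch of how a designer might record them before choosing a voice. It is only one possible reading of the model: the class names, fields and the `design_brief` helper are hypothetical and are not taken from the dissertation.

```python
from dataclasses import dataclass, field

@dataclass
class User:
    gender: str
    personality: str  # e.g. "extrovert" or "introvert"

@dataclass
class Device:
    name: str
    tasks: list = field(default_factory=list)  # what the device can actually do

@dataclass
class Context:
    language: str
    region: str

def design_brief(user, device, context):
    """Gather the three parts of the model into one checklist for a voice designer."""
    return {
        "user": f"{user.personality} personality, gender: {user.gender}",
        "device": f"{device.name}, limited to: {', '.join(device.tasks)}",
        "context": f"{context.language} speakers in {context.region}",
    }

brief = design_brief(
    User(gender="non-binary", personality="extrovert"),
    Device(name="smart speaker", tasks=["timers", "weather"]),
    Context(language="Spanish", region="Mexico"),
)
for part, summary in brief.items():
    print(f"{part}: {summary}")
```

Keeping the device’s tasks in the brief is a reminder that the voice should not promise more than the device can deliver.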
This research was taken from my Master’s dissertation
If you would like to read the dissertation in full, do not hesitate to contact me!