‘Lived Experience’ workshop on Accessibility and Voice Experience Design by Tim Noonan

Intro

These notes augment my 2-hour ‘Lived Experience’ workshop on accessibility and voice experience design.

My UX viewpoints come from my three key perspectives/interest areas:

  • an accessibility consultant who lives a disability;
  • a voice experience designer who lives in a non-visual interface;
  • an authority on the human voice & vocal expression.

I’ll be making additions to these notes towards a conference presentation later in the year.

Below are some ideas which we touched upon lightly or which we didn’t have time to explore in the session. Please let me know if you would like any clarifications or expansions.

Notes ahead of the Workshop

I’ll explain my audio-only work-flow.

There are ideas in this that may help with voice app design, and with understanding accessibility issues better.

My ears and brain get fatigued; we need to find silence amongst the noise of the world.

Accessibility = usability. Accessible Inclusive Design – AID.

Standards, Guidelines and Experimentation.

How do you decide what to put in a standard? It assumes we already know what works best.

Distilling experience and understanding into guidelines has been a key role in my past work.

Case Studies

Long Form Audio and Information

iVote by Phone: a fully automated telephone voting system used in the NSW State Election.

Today’s News Now: an automated service for browsing and reading (hearing) newspapers over the phone.

Both are examples of Inclusive Service Design.

My Non-visible Workplace Configuration

I’ll bring my portable office and run through how I manage various tasks:

  • I’ll don the wearables and peripherals that I use during the day, and show how they interact with one another.

  • I’ll speak about the Apple ecosystem, what about it works for me, and why.

  • I’ll discuss the impact of touch-screens on blind people and demonstrate my RIVO 2.0, an alternative input and output device for iOS devices.

AirPods (two sets) and Apple Watch.

Also the Victor Reader Stream – a talking-book player.

Braille Sense Mini – a braille input/output note-taking device.

Markdown and its uses

These notes and webpage contents were prepared and edited entirely in Markdown.

  • Human readable even in raw form.
  • Separates content and presentation.
  • Headings and structural elements.
  • Editable on any device.
  • Renders to WordPress web pages, RTF and basic HTML. All of the scores of content pages on <timnoonan.com.au> are drafted and edited in Markdown.
  • Reviewing structured notes in Byword for preparation and presentations.
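
As a quick illustration of “human readable even in raw form”, here is a small fragment of the kind of raw Markdown these notes are written in – the structure (headings, list markers, emphasis) reads as plain text, with no visual styling required:

```markdown
# Markdown and its uses

These notes were prepared and edited entirely in *Markdown*.

- Human readable even in raw form
- Separates content and presentation
```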

Hearables and Augmented Reality

Bone Conduction headphones – Blue’s Titanium

Noise-cancelling and hear-through – Sennheiser Ambia Pro

Augmented Reality and staying connected with the outside world.

Near-field and far-field voice communication.

Occlusion – shutting the world out and making your own voice boomy and distracting.

App Accessibility vs web and issues of cross-platform support

I’ll speak briefly about the current state of play for mobile apps (iOS and Android), the web, ARIA, and the pitfalls of tools and pre-made JavaScript libraries.
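
As one hedged illustration of those pitfalls: pre-made JavaScript widgets often render a clickable <div> rather than a native <button>, so screen readers don’t announce a control and keyboard users can’t reach it. A minimal retrofit sketch in TypeScript (the .fancy-button selector is hypothetical, standing in for any such widget):

```typescript
// Sketch: retrofitting a library-generated clickable <div> so screen
// readers announce it as a button and keyboard users can operate it.
// ".fancy-button" is a hypothetical selector for such a widget.
function makeDivButtonAccessible(el: HTMLElement): void {
  el.setAttribute('role', 'button'); // announced as a button by screen readers
  el.tabIndex = 0;                   // reachable with the Tab key
  el.addEventListener('keydown', (e: KeyboardEvent) => {
    if (e.key === 'Enter' || e.key === ' ') {
      e.preventDefault(); // stop Space from scrolling the page
      el.click();         // reuse the widget's existing click handler
    }
  });
}

document.querySelectorAll<HTMLElement>('.fancy-button')
  .forEach(makeDivButtonAccessible);
```

The more robust fix is a native <button> element in the first place; ARIA retrofitting is the fallback when a library’s markup can’t be changed.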

Google’s Flutter development kit mentions accessibility across the iOS and Android platforms.

I’ll also touch on self-voicing applications vs apps that work with a screen reader.

The Apple TV YouTube and Amazon apps are examples of breaking Apple’s UI guidelines, leading to accessibility complications.

Other Concepts and Ideas

Observations on problematic real-life interactions between voice assistants and accessibility features

  • When VoiceOver is enabled, Siri shouts on the Watch, even if VoiceOver’s volume is set to quiet.
  • In some iOS releases, Siri and VoiceOver on the iPhone often collided while trying to access audio system resources, causing hangs.
  • Voice apps often hear screen-reader output and try to act on it.
  • Siri on iOS devices added typing input in place of speech recognition to accommodate people with speech issues, but it’s not possible to call Siri up via an external keyboard.
  • When you say ‘Hey Siri’ with a HomePod nearby, the devices negotiate which one will respond; each device has triggered and released in the meantime, and VoiceOver loses where you were up to in a long article, or in a long list of items.

Voice First

Voice First is the Amazon catch-cry, but everything beyond the low-hanging fruit is shunted off to a screen-based app. The accessibility and usability of the Amazon Alexa app or the Google Home app for installation and configuration is arduous, and in no way mirrors the voice simplicity of the device itself.

  • Complex set-up is a barrier to uptake, and contrasts with the near-effortless set-up of the HomePod.
  • When it arrived, I spent one minute setting up my HomePod. Have you tried setting up and tailoring an Echo product or a Google Home? It’s slow, tedious and somewhat complex/technical.
  • How will older people and people with mild cognitive issues ever do it?

Siri is multi-modal, multi-device and mainly offered on a touch-first platform.

  • Siri often assumes you will look at a screen – which doesn’t always work. If I ask my Apple Watch what time it is, it cheerfully responds with a quip about time such as “fruit flies like a banana, time flies like an arrow” – only 13 unnecessary syllables – but then doesn’t actually speak the time.
  • Misguided personality programming doesn’t properly prioritise the most relevant information.

  • Now that Siri has launched on the HomePod, which has no screen, Apple’s multi-modal, touch-first model starts to show its biases.

  • The take-away is that even if your assistant is multi-modal, there are many situations where users will be constrained to audio only, so this needs to be factored into designs (a brief sketch follows below).
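
One way to factor that in: always put the complete answer in the spoken channel, and treat any screen as a supplement. A minimal TypeScript sketch under that assumption (AssistantResponse and the hasScreen flag are hypothetical stand-ins for whatever display detection and response types your platform provides):

```typescript
// Audio-first response building: the spoken channel always carries the
// full answer, so the interaction still works with no screen in sight.
interface AssistantResponse {
  speech: string;       // what gets spoken aloud
  displayText?: string; // optional visual supplement
}

function buildTimeResponse(now: Date, hasScreen: boolean): AssistantResponse {
  const time = now.toLocaleTimeString([], { hour: 'numeric', minute: '2-digit' });
  const response: AssistantResponse = {
    speech: `It's ${time}.`, // the key information is spoken, not just shown
  };
  if (hasScreen) {
    response.displayText = time; // the screen supplements, never replaces
  }
  return response;
}
```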

Siri on the HomePod trying to decide which device should respond is a poor implementation: the HomePod wins, even when it can’t execute the task.

An accessibility side-effect is that the iPhone loses its place in a list of items, or midway through reading a long document.

Devices should be individually addressable, e.g. “HomePod, play some cool music.”

Features raised by blind users of voice output systems and services include the following (a short code sketch illustrating a couple of them follows the list):

  • Designs that make efficient use of the user’s time
  • Two or more verbosity levels
  • Put the key information near the front of spoken messages, but not in the very first syllables (listeners easily miss the opening moments)
  • Personalisation of settings including speed of voice
  • If the service plays long-form audio such as podcasts or audio books, allow that audio to be played by the user at various speeds
  • Remember last listening point when resuming play in a subsequent session
  • Sync my play position across other platforms such as in my smart phone podcasting app
  • Include instructions and help within the voice app; don’t send the user elsewhere – to an instruction booklet or to an app
  • Provide brief answers and allow for more detail to be requested
  • Allow content to be navigated and sections to be skipped through
  • Allow information, a link, phone number or email address to be repeated or expanded for clarity on request
  • And … for heaven’s sake, provide an undo command for all those times you mis-hear what I actually said! We’re tired of meaningless items turning up on our shopping lists!
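
To make a couple of these concrete – verbosity levels and “brief answers, more detail on request” – here is a minimal TypeScript sketch; every name in it is hypothetical, since each platform has its own response API:

```typescript
type Verbosity = 'brief' | 'detailed';

interface UserPrefs {
  verbosity: Verbosity; // two or more verbosity levels, user selectable
  speechRate: number;   // e.g. 1.0 = default voice speed, 1.5 = 50% faster
}

// Key information leads the message, and extra detail is only spoken
// when the user has asked for it (or set their verbosity accordingly).
function weatherMessage(prefs: UserPrefs, temp: number, outlook: string): string {
  const brief = `${temp} degrees and ${outlook}.`;
  return prefs.verbosity === 'brief'
    ? `${brief} Say "more detail" for the full forecast.`
    : `${brief} Winds easing this afternoon, with an overnight low of 9.`;
}
```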

Voice Personality, persona, humour, empathy and psychology

  • Human voices are intrinsically linked with issues of identity and personality. The word “Persona” comes from roots that literally translate as “through sound”.
  • Humans are wired to listen to more than the words spoken; we hear tone, pauses, volume and timbre too. Those cues come from us actually understanding meaning and how it can be expressed through voice and the spoken word.
  • We hear vocal (tonal) language from our mother’s voice for three months while in her womb, well before we clearly hear verbal words.
  • All of this means that automated speech can trigger conscious (or unconscious) cognitive dissonance when the words sound human but lack nuanced meaning, or when the vocal messaging appears to contradict the words being spoken.
  • As an example, how is a computer supposed to meaningfully say “I didn’t say he stole the money”? Which word or words should be emphasised depends entirely on the surrounding context and the story being told (see the SSML sketch after this list).
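
Standard SSML provides an <emphasis> element that lets a designer choose which word carries the stress. The TypeScript sketch below renders two readings of that sentence with genuinely different meanings (speak() is a hypothetical stand-in for a platform’s text-to-speech call):

```typescript
// The same seven words mean different things depending on which word is
// emphasised. <emphasis> is standard SSML; speak() is a hypothetical
// stand-in for a platform text-to-speech call.
const readings = {
  // Implies: someone else said it, not me.
  someoneElseSaidIt:
    '<speak><emphasis level="strong">I</emphasis> didn\'t say he stole the money.</speak>',
  // Implies: he did something with the money, just not stealing it.
  notStealing:
    '<speak>I didn\'t say he <emphasis level="strong">stole</emphasis> the money.</speak>',
};

declare function speak(ssml: string): Promise<void>;

void speak(readings.someoneElseSaidIt);
```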

Apple is presently leading the way in natural-sounding speech output at high audio quality. Over time this investment will strengthen the connection between a user and their assistant.

Google is working fast to create very human-sounding speech too.

Hearing robotic, stilted speech leads to a tendency to generate stilted speech in response – the ‘foreigner in a strange country’ syndrome.

We are likely to think that if a system speaks funny, it probably doesn’t ‘get me’.

Courteous engagements: do you say ‘please’ to your assistants?

Alexa and Google are looking at “the magic word”, but the book Flutter suggests that our speaking style and courteousness tell machines too much about us and our weaknesses.

Some inclusive Voice Experience Design considerations

Note that depending on the platform, these are usually issues beyond the control of a voice application designer, but longer-term these biases and factors need to be considered and addressed.

Is your choice of output voice aligned with your user base in gender, ethnicity, accent, age and personality? Do your users feel included, or separate?

Is your app’s purpose and audience suited to its voice? As a hypothetical, is a female voice (the default on all the leading assistants) going to work for a voice-oriented gay male dating app akin to Grindr?

  • Speech recognition has been found to be biased towards Anglo adults.
  • Voice models for computer-generated text-to-speech are biased towards US and UK white speakers.
  • African Americans have no easy way to use their screen readers with a voice that matches their linguistic community; apparently, there are no African American speakers in Nuance’s voice portfolio.

Is your speech recognition engine able to handle speech impediments, stutters, nervous speech, or shaky and broken speech? Mozilla’s open speech recognition corpus project (Common Voice) may be an opportunity to include folks with different speech profiles, speech impediments, etc.

Are your timeouts sufficient for people who speak slowly or take more time to formulate their requests/responses?
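
Most hosted platforms fix these windows, but where you control the turn-taking loop yourself, timeouts can be user preferences rather than constants. A minimal TypeScript sketch of that idea (listenOnce() is hypothetical; the durations are illustrative only):

```typescript
interface TimeoutPrefs {
  noInputMs: number;    // how long to wait for speech to begin
  endSilenceMs: number; // trailing silence before the utterance counts as finished
}

// Defaults tuned for typical speakers ...
const standard: TimeoutPrefs = { noInputMs: 8_000, endSilenceMs: 1_500 };

// ... and a relaxed profile for people who speak slowly or need longer
// to formulate a response. Making this a setting rather than a constant
// is the accessibility point.
const relaxed: TimeoutPrefs = { noInputMs: 20_000, endSilenceMs: 4_000 };

declare function listenOnce(prefs: TimeoutPrefs): Promise<string | null>;

async function ask(prompt: string, prefs: TimeoutPrefs): Promise<string | null> {
  console.log(prompt);      // stand-in for speaking the prompt aloud
  return listenOnce(prefs); // null means the no-input timeout elapsed
}
```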

Does your service understand and respond to colloquial, informal terms and phrases from your users? This also has a bearing on how comfortable and accepted your users feel.

Touch Controls or buttons on Smart Speakers

Though not immediately apparent to everyone, eye-hand coordination should not be a design requirement for voice assistants, as they principally work in the auditory (non-visual) domain.

Google Home Mini and HomePod both employ touch-sensitive controls for volume adjustment, pause/play etc.

  • In the dark, when you are not fully awake, when the device is above your line of sight, or when you are reaching past the device, touch controls can be wrongly triggered.
  • I am forever readjusting the volume of my Google Home Mini devices, or unintentionally resuming music playback on my HomePod when reaching for something else nearby.

The Amazon Echo, in contrast, has physical buttons with nominally tactually differentiated surfaces, so it can be operated by feel, in the dark. Though better design overall, physical buttons can be more problematic for people with physical disabilities to operate.

Voice apps and assistants currently perform somewhere around the level of a child or office junior

We are somewhere around v0.1 when it comes to voice assistants. This is why none of them reveal their version number. They are a service, not user software.

  • Don’t trust them with confidential information.
  • Don’t expect consistently good responses.
  • Expect an over-confident manner, in contrast to actual capabilities.
  • They may have their mind elsewhere when you call on them – network or Alexa outages.
  • Expect unexpected or insensitive responses when you are busy and on task – Siri quips and Alexa laughter.
  • Expect frustration and the need to rephrase your request several times – and sometimes still be unsure whether you were accurately understood and whether what you requested was actually done.
  • Expect to often have to shout and call them by name (wake word) to attract attention.

About Tim

Blind from birth, Tim Noonan is a voice experience designer, inclusive design consultant and an expert in voice & spoken communication.

Building on his formal background in cognitive psychology, linguistics and education, Tim has been designing and crafting advanced voice interfaces since the early 90s, and was one of the principal authors of the Australian and New Zealand standard on interactive voice response systems, AS/NZS 4263.

Tim is the principal author of several other standards relating to automated voice systems, including automated telephone-based voting, telephone-based speech recognition, and four industry standards on the accessibility of electronic banking channels and inclusive authentication solutions.

Tim has also been a pioneer in the accessibility field for more than three decades. He particularly loves working with emerging and future technologies to come up with innovative ways to make them more inclusive, effective and comfortable to use.

A career highlight for Tim was working as the lead Inclusive User Experience designer for iVote – a fully automated telephone-based and web-based voting system for the NSW Electoral Commission. iVote received Vision Australia’s Making A Difference Award and was recommended as the ‘Gold Standard’ for accessible voting.

For the last 25 years Tim has been leading the way in teaching, conceptualising and designing technologies that communicate with users through voice and sound – both for accessibility and for mainstream users.
