On ARIA and Experiential Design

(Or: why it’s not always about ARIA support levels.)

Sometimes I see markup that looks something like this:

<div role="log" aria-relevant="additions">...</div>

To break this down for people not familiar with these attributes:

  • The log role is intended to represent “A type of live region where new information is added in meaningful order and old information may disappear … Examples include chat logs, messaging history, game log, or an error log[sic].”
  • The aria-relevant property is intended to indicate “what notifications the user agent will trigger when the accessibility tree within a live region is modified.”

Those are the spec definitions, anyway. In short, the aim of this markup is to have screen readers automatically convey the content of nodes that are added as descendants of the div with role="log".

Screen reader and browser support for this technique is not consistent or reliable. The details of exactly how or why aren’t important to this post, but problems include: duplicate announcements, missing text, and timing issues.

These problems should be “fixed” (the definition of fixed versus broken isn’t always immediately obvious, hence the quote marks). However, they’re not what I’m here to talk about today. Instead, I want to touch on…

Accessibility Attitude

When many developers first encounter these attributes, they’re often working on an aspect of a product’s user interface that they view as tricky to make accessible. As such, an understandably common reaction is: “Great, I can let the browser/screen reader work it out for me!”

This is a problem, specifically because it offloads the difficulty onto technologies that don’t have any understanding of your design intent or the experience you’re trying to build. Just like your visual layout, this situation calls for experiential design that shows users a good time, not magical ARIA attributes that offer no or limited control.

A Practical Example

Consider what will happen if each message added to a chat log uses markup like this (disclaimer: not a recommendation for how to accessibly mark up chat messages):

<div>
	<div class="message-header">
		<span class="sender">Meika</span>
		<span class="timestamp">2:37 PM</span>
	</div>
	<p class="message-text">This is sooo infuriating!</p>
	<div class="message-attachments">
		<img src="..." alt="Screenshot of ...">
		<audio controls aria-label="demo of infuriating thing.mp3" ...></audio>
	</div>
</div>

When this simple textual message with image and audio file attachments gets added to the div with role="log", what should a screen reader output, and how should it deliver it?

  • Should there be a pause in speech between the sender and timestamp? How about the message header and message text, or message text and attachments, or individual attachments, or …?
  • Should an extra pause be added only when the message text doesn’t end with punctuation that would create one?
  • Should the full text of the alt attribute on the image be conveyed?
  • Should the accessible name of the <audio> element be delivered with no decoration? How about the labels of the inner player controls (that differ across browsers)?
  • How will users know what information belongs to the sender, timestamp, and message text bits, and that the attachments are attachments?
  • What about punctuation/separation in braille?
  • Will users always want the timestamp before the text?

These might be nuanced questions for a single developer to answer. But they are absolutely impossible for the browser and screen reader to work out, even if the spec were to be more opinionated and explicit on things like pauses, accessible name inclusion, shadow DOM traversal, etc.

In this specific example, for instance, I would find a pause appropriate between the sender and timestamp (something like “Meika, 2:37 PM”), so maybe the browser could insert that for me. If I later change the format to have the word “at” before the timestamp, the pause will be less appropriate because “Meika at 2:37 PM” flows fine without one.

The Intended User Experience

In these scenarios, I find it helpful to step back and try to answer the previous questions by writing out exactly what should be conveyed by default for this incoming message:

Meika: This is sooo infuriating! 2:37 PM with 2 attachments. image: Screenshot of …; audio: demo of infuriating thing.mp3

Opinions will differ on the exact formatting, verbosity, ordering of information, and more. But as written:

  • There are explicit pauses between the sender and message text, the attachment count and first attachment, the attachment type and name/description, and each attachment.
  • There is an implicit pause between the message text and timestamp because the message ends with an exclamation mark. In cases where the message text (after trimming) does not end with such punctuation, an explicit pause could be inserted there too.
  • The full text of the alt attribute on the image is conveyed, along with an indication of attachment type.
  • The accessible name of the <audio> element is delivered, likewise with an indication of the attachment type. The labels of the inner player controls are not.
  • The presence and count of attachments are explicitly called out, but the purposes of the sender, message text, and timestamp are left for the user to infer.
  • The punctuation in braille, assuming the exact same formatting is used, is kept light and unobtrusive, and the information is not needlessly spread across multiple lines. Braille-specific formatting may be a future feature.
  • The timestamp is placed after the text but before the attachments. This decision assumes that the sender and message text are the most important pieces of info for most users, followed by the timestamp which changes often, followed by the attachments which can be wordy and might be relatively rare. The ultimate goal is to allow users to control the ordering and inclusion of message parts.

The Implementation

We have our proposal, but it cannot be practically achieved via the single-message markup that was previously presented.

There are various hacks like off-screen punctuation marks and text, and aria-owns to change the ordering of elements. But these would only serve to make the markup more brittle, complex, and difficult to change.

Instead, the solution is to:

  • Remove the log role and aria-relevant property;
  • add a separate, hidden live region elsewhere on the page (e.g. at the very end of the DOM) with appropriate aria-live and aria-atomic values;
  • manually construct screen reader text for new incoming messages; and
  • use that live region to convey one message at a time.
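As a rough illustration, the hidden live region itself might look something like this (the id and class name are assumptions for the example, and whatever visually-hidden technique is used must keep the element exposed to assistive technology):

```html
<!-- Hypothetical: visually hidden, but present in the accessibility tree -->
<div id="sr-announcer" class="visually-hidden"
	aria-live="polite" aria-atomic="true"></div>
```

Script would then set this element’s text content to the composed announcement for each incoming message, one at a time. Here aria-live="polite" is a reasonable default for chat, as it waits for the screen reader to finish speaking; "assertive" would interrupt the user.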

I won’t walk through the rest of that process here; it’s relatively mechanical and doesn’t directly relate to the concept of experiential design I’m trying to centre. The key takeaway is that while role="log" and/or aria-relevant may not be well supported, they turn out to be a bad fit for this use case anyway, so the support level doesn’t matter here.

Zooming Out

This is not a blog post about accessible chat interfaces. If it were, I would next move on to tackle aspects like the markup of the chat log and each individual message, focus movements, attachment surfacing, notification sounds, and the like.

Nor is it a post about live regions and role="log" specifically.

Rather, the previous exercise is intended to illustrate the gap between oversimplified examples in documentation and the real world. Users in the latter need more and deserve better than the former.

Even when a spec document directly mentions example design patterns, it does not mean that the information is up-to-date, broadly applicable, realistic, or can be applied in isolation. It may simply mean that the spec writers were reaching for illustrative use cases at the time, and had a far less complex approach in mind.

As another example, consider how many articles about live regions mention stock tickers. In the majority of cases, indiscriminately applying aria-live to a single stock ticker (let alone more than one on a page) would be both useless and noisy.

In Closing

ARIA is a tool for information exposure, not for behaviour management. The next time you find yourself lamenting missing or inconsistent support for one or more ARIA attributes, ask: how complex is my use case, and how much of the heavy lifting am I expecting these attributes to take on?

In doing so, you might find that the pendulum needs to swing much more towards the aspects that you do control.

You know your product. Be proud of the experience you’re giving your users with accessibility needs.

