Beyond AAA: Why Web Accessibility Standards Still Leave People Behind — NeuroSync Technologies
Research & Position Paper

Beyond AAA: Why Web Accessibility Standards Still Leave People Behind

WCAG AAA is the highest accessibility standard in the world. It assumes users can do at least one of the following: see, read, type, or speak clearly. What about those who can do none of these?

Kirk Harper · NeuroSync Technologies · February 2026 · 8 min read

This paper was not planned. It emerged from a single question asked during the development of a diagnostic assessment tool: can every person access this?

The answer was no. And when we investigated why, we discovered that the highest accessibility standard in the world — WCAG 2.2 Level AAA — would not have caught the problem.

The Standard Everyone Aspires To

The Web Content Accessibility Guidelines, maintained by the World Wide Web Consortium, are the global benchmark for digital accessibility. They operate across three conformance levels: A (minimum), AA (recommended), and AAA (highest). Most accessibility legislation and regulation, including rules implementing the Americans with Disabilities Act, the European Accessibility Act, and Section 508, references Level AA as the compliance standard.

Level AAA goes further. It demands 7:1 contrast ratios, sign language interpretation for video, extended audio descriptions, and content that does not require reading ability above lower secondary education level. The W3C itself acknowledges that Level AAA is not achievable for all content.

Most organisations do not attempt it. Those that do are considered exemplary.

We are arguing that it is not enough.

The Assumption at the Heart of Every Standard

WCAG is built on an implicit assumption: the user possesses at least one reliable input modality and at least one reliable output modality. They can see, or they can hear. They can type, or they can speak. They can read, or they can listen. The guidelines ensure that for each barrier, an alternative pathway exists.

This is good design. It has transformed digital accessibility for millions of people. But it contains a structural gap.

What happens when the user cannot see the screen, cannot read text, cannot type, cannot use a mouse, and cannot speak clearly?

WCAG has no answer to this question. Not at Level A. Not at Level AA. Not at Level AAA. The guidelines assume that at least one conventional input and output channel is available. For users where this assumption fails — those with compound disabilities, severe motor impairments, non-verbal communication needs, or combinations of visual, cognitive, and physical limitations — the highest accessibility standard in the world still leaves them outside the door.

16% of the global population lives with a significant disability.
AAA is the highest WCAG conformance level, and it still assumes conventional input.
Zero: the number of websites we found that support non-verbal sound as an input method.

How We Found the Gap

NeuroSync Technologies builds tools for organisational transformation. Our flagship diagnostic, the iXform, is a 24-question structural assessment that evaluates whether an organisation is safe to undergo change. It is a commercial product. It generates revenue. It feeds a consulting pipeline.

The company’s founding philosophy is accessibility and the removal of friction. Every product we build asks the same question: what barrier is stopping someone from doing the thing they need to do? Then it removes that barrier.

When we applied this question to our own website, the answer was uncomfortable.

Our site was entirely text-dependent. Every page required the ability to read English. The assessment required the ability to type. Navigation required the ability to see. The contact form required the ability to write. We had built a company on the principle of removing barriers — and our own front door was full of them.

The first fix was obvious

We integrated an audio layer across every page. A neural text-to-speech system using a natural British English voice reads the content of every page by default. Not as an option buried in a settings menu. Not as a widget in the corner. As the default experience. The audio player is the first element a visitor encounters on every page, because the philosophy is not that audio is available — it is that audio is expected.

The second fix was harder

We redesigned our diagnostic assessment to be completable entirely by voice. The system speaks each question aloud, presents the response options verbally, and accepts spoken answers. No reading required. No typing required. No mouse required.
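Mapping spoken answers to response options can be sketched as a small pure function fed by the browser's speech recognition results. The word lists and the `recordAnswer` callback below are illustrative assumptions, not the production vocabulary:

```javascript
// Map a speech-recognition transcript to a response option (1-3).
// The alias lists are illustrative; a real system would localise
// and tune them per question.
const OPTION_WORDS = {
  1: ["one", "1", "first"],
  2: ["two", "2", "second"],
  3: ["three", "3", "third"],
};

function matchOption(transcript) {
  const words = transcript.toLowerCase().trim().split(/\s+/);
  for (const [option, aliases] of Object.entries(OPTION_WORDS)) {
    if (words.some((w) => aliases.includes(w))) return Number(option);
  }
  return null; // no match: caller can fall back to sound counting
}

// Browser wiring (sketch): feed recognition results into matchOption.
// const rec = new webkitSpeechRecognition();
// rec.onresult = (e) => {
//   const option = matchOption(e.results[0][0].transcript);
//   if (option !== null) recordAnswer(option); // hypothetical handler
// };
```

Keeping the matching logic separate from the recognition wiring means the same function serves keyboard and touch input, which simply bypass the transcript step.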

The third fix changed everything

Voice recognition fails for people with speech impairments, heavy accents, non-standard speech patterns, or non-verbal communication. The Web Speech API — the browser-native speech recognition system — requires clear, recognisable words. If you cannot produce those words, voice recognition is not an accessibility solution. It is another barrier wearing different clothes.

So we went further. Our system accepts any repeatable sound as input.

One sound for option one. Two sounds for option two. Three sounds for option three. A grunt, a click, a clap, a tap. The system does not need to understand language. It counts audio events.
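The counting step reduces to a small pure function: given the timestamps of detected sound events, keep those falling inside the response window opened by the first sound and map the count to an option. The window length and option cap below are illustrative assumptions:

```javascript
// Count discrete sound events inside a response window and map the
// count to an option number. Timestamps are in milliseconds.
// WINDOW_MS and MAX_OPTIONS are illustrative; a real system would
// tune the window per user.
const WINDOW_MS = 2000;
const MAX_OPTIONS = 3;

function optionFromSounds(timestamps) {
  if (timestamps.length === 0) return null;
  const start = timestamps[0];
  // Only events within the window opened by the first sound count.
  const count = timestamps.filter((t) => t - start <= WINDOW_MS).length;
  return count >= 1 && count <= MAX_OPTIONS ? count : null;
}
```

A stray sound arriving well after the window simply does not count, so a cough two seconds later cannot turn "option one" into "option two".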

When the input is unclear, the system degrades gracefully. It presents each option sequentially: “Is your answer one?” Any sound confirms. Silence declines. The interaction has been reduced to its absolute minimum: a single noise means yes.

This means a person who cannot see, cannot read, cannot write, cannot type, cannot use a mouse, and cannot speak clearly can still complete a 24-question diagnostic assessment using nothing but a repeatable sound.

No existing accessibility standard requires this. No existing accessibility standard even describes it.

What Exists Today

We reviewed the accessibility landscape extensively: the tools, the standards, the research. Here is what we found:

Capability | WCAG AAA | Current Tools | VANTAGE Platform | Website Deployment
Screen reader compatibility | Required | Widely available | Implemented | Implemented
Keyboard navigation | Required | Widely available | Implemented | Implemented
Audio content by default | Not required | Rare (opt-in only) | Core feature | Live on every page
Voice-guided form completion | Not addressed | Does not exist | Core feature | In progress
Non-verbal sound as input | Not addressed | Academic research only | Core feature | In progress
Graceful binary fallback | Not addressed | AAC devices only | Core feature | In progress
Simultaneous multi-modal input | Not addressed | Does not exist | 5 modalities (voice, sign, touch, blink, eye tracking) | In progress

The academic literature contains relevant work. A 2022 study explored fifteen non-verbal mouth sounds as input for interactive applications, demonstrating that sound event detection can function independently of speech recognition. Research into voice assistants and impaired users has established that conventional voice interfaces fail for people with cognitive, linguistic, or motor impairments due to requirements for clear articulation, specific vocabulary, and precise timing.

Tools like Voiceitt have made progress in recognising non-standard speech patterns, but these are standalone products — not features integrated into commercial websites. The gap between the research and the implementation is vast.

The Architecture

NeuroSync’s flagship product, VANTAGE, is a voice-first AI execution engine that accepts five simultaneous input modalities: voice, sign language recognition (ASL/BSL), touch, blink detection, and eye tracking. It was designed from the ground up as adaptive technology that benefits everyone while solving accessibility challenges that current standards do not address.

What we describe below is the process of applying VANTAGE’s principles to our own website and diagnostic assessment — proving that these capabilities are not theoretical but deployable on any web platform using existing browser APIs.

VANTAGE in Action

In this video, a man who cannot read or write uses VANTAGE to send a card to his daughter. He has no email account. No digital literacy. No way to interact with a conventional screen. Using only his voice, he composes, personalises, and sends a message that would be impossible through any existing digital interface.

This is not a simulation. This is the technology working. The question it raises is simple: if we can do this, why is every form on the internet still text-only?

Audio output

Pre-generated neural text-to-speech audio using a natural British English voice. Every page has an embedded audio player as the first content element. Assessment questions and response options are spoken aloud in sequence. Audio is the default. Text is the fallback.

Input detection

Four simultaneous input channels are always active, with no mode selection required. Voice input via the Web Speech API maps spoken words and numbers to response options. Keyboard input maps key presses (1, 2, 3) directly. Touch and click input accepts taps on visible response options. Sound detection via the Web Audio API monitors the microphone for any audio event above a volume threshold, counting discrete sounds within a timing window.
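The sound-detection channel can be sketched as RMS volume over each audio frame, a trigger threshold, and a refractory gap so one noise does not register as several events. The threshold, gap, and polling interval below are illustrative assumptions, not tuned values:

```javascript
// Detect discrete sound events from raw audio samples: an "event"
// fires when RMS volume crosses a threshold, and cannot fire again
// until a refractory gap has passed. Values are illustrative.
const RMS_THRESHOLD = 0.1;
const REFRACTORY_MS = 250;

function rms(samples) {
  let sum = 0;
  for (const s of samples) sum += s * s;
  return Math.sqrt(sum / samples.length);
}

function makeSoundDetector(onEvent) {
  let lastEvent = -Infinity;
  // Call once per analyser frame with the samples and a timestamp.
  return function processFrame(samples, nowMs) {
    if (rms(samples) >= RMS_THRESHOLD && nowMs - lastEvent >= REFRACTORY_MS) {
      lastEvent = nowMs;
      onEvent(nowMs);
    }
  };
}

// Browser wiring (sketch): feed AnalyserNode frames into the detector.
// const ctx = new AudioContext();
// const analyser = ctx.createAnalyser();
// ctx.createMediaStreamSource(micStream).connect(analyser);
// const buf = new Float32Array(analyser.fftSize);
// const detect = makeSoundDetector(handleSoundEvent); // hypothetical handler
// setInterval(() => {
//   analyser.getFloatTimeDomainData(buf);
//   detect(buf, performance.now());
// }, 50);
```

Because the detector only counts audio events, it is indifferent to whether the sound is a word, a clap, or a click: exactly the property the architecture depends on.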

Graceful degradation

When no clear input is received, the system falls back to sequential binary confirmation. Each option is presented individually. Any sound confirms. Silence declines. The minimum viable interaction is one noise.
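The sequential fallback is a very small loop: present each option in turn, treat any sound during the listening window as yes, and treat silence as no. In this sketch, `speak` and `heardSoundFor` stand in for the real text-to-speech and sound-detection layers:

```javascript
// Sequential binary confirmation: present options one at a time.
// A sound during the listening window selects the current option;
// silence advances to the next. Returns null if every option is
// declined. speak() and heardSoundFor() are stand-ins for the real
// TTS and microphone layers.
function confirmSequentially(options, heardSoundFor, speak) {
  for (const option of options) {
    speak(`Is your answer ${option}?`);
    if (heardSoundFor(option)) return option; // any sound means yes
    // silence: move on to the next option
  }
  return null;
}
```

The loop never asks the user to produce a specific sound, only to produce any sound at the right moment, which is what reduces the interaction to a single noise.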

No separate “accessible version”

There is no toggle. There is no accessibility menu. There is no alternative version of the site for disabled users. Every visitor encounters the same experience. The audio plays. The inputs listen. The system adapts. This is not an accommodation. It is the architecture.

Why This Matters Beyond Our Website

Approximately 16% of the global population — over 1.3 billion people — lives with a significant disability. Among these, compound disabilities are more common than isolated ones. A person with cerebral palsy may have motor, speech, and visual impairments simultaneously. A stroke survivor may lose fine motor control, clear speech, and reading ability in the same event. An elderly person may experience declining vision, hearing, dexterity, and cognitive function concurrently.

The web accessibility industry has made enormous progress in addressing individual barriers. Screen readers serve the blind. Captions serve the deaf. Keyboard navigation serves those who cannot use a mouse. Voice recognition serves those who cannot type.

But compound barriers remain largely unaddressed. The user who needs all of these accommodations simultaneously — and additionally cannot produce clear speech — finds that each individual solution assumes the availability of another channel that is also impaired.

This is not a niche edge case. It describes millions of people. And the current standards do not serve them.

A Proposal

We are not suggesting that every website needs non-verbal sound detection. We are suggesting that the accessibility standards need to evolve to address compound barriers and non-conventional input methods. Specifically:

Audio-first should be a default, not an afterthought. Content that can be read aloud should be read aloud by default, not hidden behind a button that requires sight to find.

Form completion should not require literacy. Any form that collects information from the public should offer a voice-guided pathway that does not depend on the ability to read or write.

Input methods should degrade gracefully. When speech recognition fails, the system should fall back to simpler interactions — not abandon the user. The binary confirmation pattern (present option, accept sound, advance) is implementable with existing browser APIs at negligible cost.

Multi-modal input should be simultaneous, not sequential. Users should not have to declare their disability in order to receive the appropriate interface. All input channels should be active by default.

None of this requires new technology. The Web Speech API, the Web Audio API, and neural text-to-speech services are mature, widely supported, and affordable. The barrier is not technical. It is philosophical.

The question is not whether we can include everyone. It is whether we have decided to.

What We Are Publishing

We intend to document our full technical implementation and make the architecture available for others to adopt. The voice-guided assessment framework, the multi-modal input system, and the graceful degradation pattern are not proprietary advantages we intend to protect. They are accessibility solutions that should exist everywhere.

If you build websites, build forms, or build digital services that collect information from people, we would welcome the conversation about how to implement these patterns in your context.

The current standards got us here. They are good. They are not finished.

Kirk Harper is the founder of NeuroSync Technologies. The iXform Structural Diagnostic, including its voice-first accessibility layer, is available at neurosync-technologies.ltd. Correspondence: [email protected]

Experience It

The iXform is live. The audio player is on this page. The voice-guided assessment is being deployed. This is not theory — it is practice.

