Compliance reference
WCAG video captions: every success criterion that applies, mapped
WCAG 2.1 doesn't have one "video captions" rule. Guideline 1.2 (Time-based Media) contains nine success criteria, and the set that applies depends on whether your video is live, has audio, has significant visual content, or is a media alternative for text. Here is the map.
TL;DR
For a typical prerecorded training video at WCAG 2.1 Level AA, you need SC 1.2.2 Captions (Prerecorded) and SC 1.2.5 Audio Description (Prerecorded). SC 1.2.1 applies only to audio-only or video-only content. SC 1.2.3 (Level A) is satisfied automatically once you meet 1.2.5. SC 1.2.4 covers live captioning only. The Level AAA SCs (1.2.6 through 1.2.9) rarely appear in procurement; skip them unless your legal team explicitly asks.
Full WCAG 2.1 timed-media SC map
| SC | Name | Level | Applies to training video? |
|---|---|---|---|
| 1.2.1 | Audio-only and Video-only (Prerecorded) | A | Only if the asset is audio-only (podcast) or video-only (silent demo). Rare for training. |
| 1.2.2 | Captions (Prerecorded) | A | Yes — this is the main one. Details on our SC 1.2.2 page. |
| 1.2.3 | Audio Description or Media Alternative (Prerecorded) | A | Yes — but 1.2.5 at AA supersedes it. Ship 1.2.5 and you satisfy 1.2.3. |
| 1.2.4 | Captions (Live) | AA | Only for live content — webinars, town halls, live streams. |
| 1.2.5 | Audio Description (Prerecorded) | AA | Yes — required at AA when visual content carries information not in the audio. |
| 1.2.6 | Sign Language (Prerecorded) | AAA | Skip unless explicitly required. |
| 1.2.7 | Extended Audio Description (Prerecorded) | AAA | Skip unless explicitly required. |
| 1.2.8 | Media Alternative (Prerecorded) | AAA | Skip unless explicitly required. |
| 1.2.9 | Audio-only (Live) | AAA | Skip unless explicitly required. |
What "AA" actually demands on a training-video library
The regulated frameworks (ADA Title II, Section 508, the European Accessibility Act) all reference Level AA conformance. Conformance is cumulative, so your training library must meet every Level A and Level AA criterion. For video, that reduces to:
- Captions on every video with audio (SC 1.2.2, a Level A criterion that AA conformance includes).
- Audio description on every video where the visuals carry meaning not in the dialogue, such as slide content, on-screen text, demos, or silent visual steps (SC 1.2.5, Level AA).
- Live captions on any live-streamed content (SC 1.2.4, Level AA).
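That's it. SC 1.2.1 applies only to audio-only or video-only assets, SC 1.2.3 is satisfied by meeting 1.2.5, and the remaining criteria are AAA, which puts them out of scope for the regulatory frameworks currently enforced against training-video programs.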
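The rules above can be sketched as a small lookup that tags each asset in a library inventory with the Level A and AA criteria it has to meet. This is a minimal illustration, not an official conformance tool: the `Asset` fields and the `required_scs_at_aa` function are hypothetical names, and whether the visuals carry unspoken information is a human judgment that the code simply takes as input.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Asset:
    """Hypothetical description of one media asset in a training library."""
    live: bool = False
    has_audio: bool = True
    has_video: bool = True
    # Set by a human reviewer: slides, on-screen text, or silent demo steps
    # that are never spoken aloud.
    visuals_carry_unspoken_info: bool = False


def required_scs_at_aa(asset: Asset) -> list[str]:
    """Return the WCAG 2.1 Level A + AA timed-media SCs that apply."""
    if asset.live:
        # At A/AA, live content only triggers SC 1.2.4 (live video with audio).
        return ["1.2.4"] if asset.has_audio and asset.has_video else []
    if asset.has_audio != asset.has_video:
        # Prerecorded audio-only (podcast) or video-only (silent demo).
        return ["1.2.1"]
    if not asset.has_audio:
        return []  # neither audio nor video: not timed media
    scs = ["1.2.2"]  # captions for any prerecorded video with audio
    if asset.visuals_carry_unspoken_info:
        # Meeting 1.2.5 (audio description) also satisfies 1.2.3.
        scs += ["1.2.3", "1.2.5"]
    return scs
```

For example, `required_scs_at_aa(Asset(visuals_carry_unspoken_info=True))` returns `["1.2.2", "1.2.3", "1.2.5"]`, which matches the typical prerecorded training video described in the TL;DR.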
What auditors check on a sampled video
A caption audit is not a full-library re-transcription. It is sampling. From experience with public-university and enterprise audits, sampling checks converge on five things:
- Caption track exists and is selectable in the video player.
- Spot-check accuracy on 2–3 sampled 60-second segments: technical terms and proper nouns get compared word by word. Any mangled term in a sampled segment is typically a finding.
- Speaker labels present on off-camera dialogue.
- Non-speech sounds bracketed where they carry meaning (alarms, laughter, music cues).
- Audio description track (or equivalent) for videos with significant visual-only information.
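A rough pre-audit pass over your own caption files can surface some of these checks before an auditor does. The sketch below assumes WebVTT input and pulls the cues that overlap one 60-second window, then counts how many carry a speaker label or a bracketed non-speech sound. The regular expressions are heuristics chosen for illustration (a WebVTT voice tag or a `Name:` prefix for speakers, square brackets for sounds), and the function names are hypothetical. It does not judge accuracy; a person still has to read the sampled text against the audio.

```python
import re

# Matches a WebVTT cue timing line; the hours field is optional in WebVTT.
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+"
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)
# Heuristics (assumptions): a voice tag or a "Name:" prefix marks a speaker;
# square brackets mark a non-speech sound.
SPEAKER = re.compile(r"<v\s+[^>]+>|^[A-Z][A-Za-z .'-]*:", re.MULTILINE)
NON_SPEECH = re.compile(r"\[[^\]]+\]")


def _seconds(h, m, s, ms):
    return int(h or 0) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000


def parse_cues(vtt):
    """Return (start, end, text) tuples for each cue in a WebVTT string."""
    cues = []
    for block in re.split(r"\n\s*\n", vtt.replace("\r\n", "\n").strip()):
        lines = block.split("\n")
        for i, line in enumerate(lines):
            match = TIMING.search(line)
            if match:
                g = match.groups()
                text = "\n".join(lines[i + 1:])
                cues.append((_seconds(*g[:4]), _seconds(*g[4:]), text))
                break  # blocks without a timing line (header, notes) are skipped
    return cues


def sample_report(cues, start, length=60.0):
    """Summarize the cues overlapping [start, start + length) for a reviewer."""
    window = [c for c in cues if c[0] < start + length and c[1] > start]
    return {
        "cues": len(window),
        "with_speaker_label": sum(bool(SPEAKER.search(t)) for _, _, t in window),
        "with_non_speech": sum(bool(NON_SPEECH.search(t)) for _, _, t in window),
        "text": "\n".join(t for _, _, t in window),
    }
```

A window with dialogue from several people but zero speaker labels, or an alarm in the audio with no bracketed cue, is worth a closer manual look.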
How GlossCap helps
GlossCap is scoped to SC 1.2.2 captions — the one we can ship at ≈99% accuracy out of the box by biasing the speech model on your company glossary. The export formats are standard (SRT, WebVTT, TTML), so the caption track drops into any LMS: TalentLMS, Docebo, Absorb, Kaltura, Panopto, a self-hosted player. Audio description for SC 1.2.5 is a separate production workflow we don't cover — most customers pair us with human describers or handle it in-house. For a full walk-through of what passes 2.1 AA, see our 2.1 AA reference.
Related questions
Do we need sign-language interpretation on training video?
Only if you target WCAG AAA conformance or if a specific jurisdiction mandates it. ADA Title II and the EAA reference AA, not AAA, so sign-language interpretation is usually not required by the regulations. Individual universities or public agencies may impose stricter local standards.
Can captions count as audio description if the captions also describe what's happening on screen?
No. Captions render the audio; audio description renders visual content for viewers who can't see the screen. They are different access modes for different users and the spec keeps them separate. A caption track that editorializes about visuals is not a substitute for an audio description track.
What's the easiest way to see which SCs we currently fail?
Run an automated accessibility scan (axe, WAVE, Lighthouse), then do a manual spot-check on three videos: does each have a caption track, does the caption track contain your product names spelled correctly, and does it mark speaker changes. Those three checks catch the overwhelming majority of 1.2.2 findings on training libraries.
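The product-name check in that spot-check can be partly automated. The sketch below is one possible approach, not a GlossCap feature: it uses Python's standard `difflib` to flag caption words that look like a glossary term but are not spelled exactly like it. The function name, the 0.8 similarity cutoff, and the single-word-term limitation are all assumptions made for illustration; multi-word product names would need a different matching strategy.

```python
import difflib
import re


def glossary_near_misses(caption_text, glossary, cutoff=0.8):
    """Return (caption_word, glossary_term) pairs that look like misspellings.

    Only single-word glossary terms are handled in this sketch.
    """
    terms = {term.lower(): term for term in glossary}
    findings = []
    for word in re.findall(r"[A-Za-z][A-Za-z0-9'-]*", caption_text):
        lowered = word.lower()
        if lowered in terms:
            # Same letters; flag only if the casing differs from the glossary.
            if word != terms[lowered]:
                findings.append((word, terms[lowered]))
            continue
        # Close but not identical: likely a mangled product name.
        close = difflib.get_close_matches(lowered, list(terms), n=1, cutoff=cutoff)
        if close:
            findings.append((word, terms[close[0]]))
    return findings
```

Run against the text of a caption track with a glossary such as `["Kaltura", "Docebo"]`, a cue reading "Open Caltura and then Docebo." would yield `[("Caltura", "Kaltura")]`. Expect some false positives on ordinary words that resemble a term, so treat the output as a review list rather than a verdict.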