Compliance reference

WCAG video captions: every success criterion that applies, mapped

WCAG 2.1 doesn't have one "video captions" rule. Guideline 1.2 (Time-based Media) contains nine timed-media success criteria, and the subset that applies depends on whether your video is live, has audio, has significant visual content, or is a media alternative for text. Here is the map (SC 1.2.9, Audio-only (Live), is omitted because it never applies to video).

TL;DR

For a typical prerecorded training video at WCAG 2.1 Level AA, you need SC 1.2.2 Captions (Prerecorded) and SC 1.2.5 Audio Description (Prerecorded). SC 1.2.1 applies only to audio-only or video-only content. SC 1.2.3 is redundant with 1.2.5 at Level AA. SC 1.2.4 is live-captioning-only. Three Level-AAA SCs (1.2.6, 1.2.7, 1.2.8) rarely appear in procurement; skip them unless your legal team explicitly asks.

Full WCAG 2.1 timed-media SC map

| SC | Name | Level | Applies to training video? |
| --- | --- | --- | --- |
| 1.2.1 | Audio-only and Video-only (Prerecorded) | A | Only if the asset is audio-only (podcast) or video-only (silent demo). Rare for training. |
| 1.2.2 | Captions (Prerecorded) | A | Yes — this is the main one. Details on our SC 1.2.2 page. |
| 1.2.3 | Audio Description or Media Alternative (Prerecorded) | A | Yes — but 1.2.5 at AA supersedes it. Ship 1.2.5 and you satisfy 1.2.3. |
| 1.2.4 | Captions (Live) | AA | Only for live content — webinars, town halls, live streams. |
| 1.2.5 | Audio Description (Prerecorded) | AA | Yes — required at AA when visual content carries information not in the audio. |
| 1.2.6 | Sign Language (Prerecorded) | AAA | Skip unless explicitly required. |
| 1.2.7 | Extended Audio Description (Prerecorded) | AAA | Skip unless explicitly required. |
| 1.2.8 | Media Alternative (Prerecorded) | AAA | Skip unless explicitly required. |

What "AA" actually demands on a training-video library

The regulated frameworks — ADA Title II, Section 508, the European Accessibility Act — all reference Level AA conformance. That means your training library needs the cumulative set of Level A + Level AA criteria. For video, that shortens to:

  1. SC 1.2.2 Captions (Prerecorded) on every video with speech.
  2. SC 1.2.5 Audio Description (Prerecorded) on videos with significant visual-only information.
  3. SC 1.2.4 Captions (Live) only if you also run live webinars or town halls.

That's it. The rest are AAA and out of scope for the regulatory frameworks currently enforced against training-video programs.

What auditors check on a sampled video

A caption audit is not a full-library re-transcription. It is sampling. From experience with public-university and enterprise audits, sampling checks converge on five things:

  1. Caption track exists and is selectable in the video player.
  2. Spot-check accuracy on 2–3 sampled 60-second segments — technical terms and proper nouns get compared word-by-word. A mangled term in a sampled segment is typically a finding.
  3. Speaker labels present on off-camera dialogue.
  4. Non-speech sounds bracketed where they carry meaning (alarms, laughter, music cues).
  5. Audio description track (or equivalent) for videos with significant visual-only information.
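Checks 3 and 4 are easiest to see in a caption file itself. Here is an illustrative WebVTT fragment (speaker name and sound description are invented for the example) showing a speaker label on off-camera dialogue and a bracketed non-speech cue:

```
WEBVTT

00:01:12.000 --> 00:01:15.500
TRAINER: Click Export, then choose the WebVTT format.

00:01:15.500 --> 00:01:17.000
[alarm sounds]
```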

How GlossCap helps

GlossCap is scoped to SC 1.2.2 captions — the one we can ship at ≈99% accuracy out of the box by biasing the speech model on your company glossary. The export formats are standard (SRT, WebVTT, TTML), so the caption track drops into any LMS: TalentLMS, Docebo, Absorb, Kaltura, Panopto, a self-hosted player. Audio description for SC 1.2.5 is a separate production workflow we don't cover — most customers pair us with human describers or handle it in-house. For a full walk-through of what passes 2.1 AA, see our 2.1 AA reference.
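SRT and WebVTT are close enough that a caption track converts between them mechanically: WebVTT adds a `WEBVTT` header, uses `.` instead of `,` before milliseconds, and makes the numeric cue indices optional. A minimal sketch of that conversion (not GlossCap's actual exporter, just the idea):

```python
import re

def srt_to_vtt(srt_text: str) -> str:
    """Convert an SRT caption file to WebVTT.

    Handles the three syntactic differences: the WEBVTT header,
    '.' instead of ',' as the millisecond separator, and dropping
    the bare numeric cue indices (optional in WebVTT).
    """
    # Swap the millisecond separator inside timestamps only.
    vtt = re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{3})", r"\1.\2", srt_text)
    # Remove a bare cue-index line when a timestamp line follows it.
    vtt = re.sub(r"(?m)^\d+\n(?=\d{2}:\d{2}:\d{2})", "", vtt)
    return "WEBVTT\n\n" + vtt.strip() + "\n"
```

Going the other direction (or to TTML) is similarly mechanical, which is why a standards-based caption track drops into any LMS player.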


Related questions

Do we need sign-language interpretation on training video?

Only if you target WCAG AAA conformance or if a specific jurisdiction mandates it. ADA Title II and the EAA reference AA, not AAA, so sign-language interpretation is usually not required by the regulations. Individual universities or public agencies may impose stricter local standards.

Can captions count as audio description if the captions also describe what's happening on screen?

No. Captions render the audio; audio description renders visual content for viewers who can't see the screen. They are different access modes for different users and the spec keeps them separate. A caption track that editorializes about visuals is not a substitute for an audio description track.

What's the easiest way to see which SCs we currently fail?

Run an automated accessibility scan (axe, WAVE, Lighthouse), then do a manual spot-check on three videos: does each have a caption track, does the caption track contain your product names spelled correctly, and does it mark speaker changes. Those three checks catch the overwhelming majority of 1.2.2 findings on training libraries.
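The three manual checks above can be partly scripted. A sketch, assuming WebVTT caption files and a speaker-label convention of `NAME:` at the start of a cue — `GLOSSARY` is a hypothetical stand-in for your own product-name list:

```python
import re

# Hypothetical glossary: replace with your own product and brand names.
GLOSSARY = {"GlossCap", "Kaltura", "Panopto"}

def spot_check(vtt_text: str) -> dict:
    """Flag two common SC 1.2.2 findings in a WebVTT caption track:
    misspelled/missing glossary terms and absent speaker labels."""
    cue_lines = [
        line for line in vtt_text.splitlines()
        if line and "-->" not in line and line != "WEBVTT"
    ]
    text = " ".join(cue_lines)
    return {
        # Glossary terms absent (or misspelled, so an exact match fails).
        "missing_terms": sorted(t for t in GLOSSARY if t not in text),
        # At least one "NAME:" speaker label at the start of a cue line.
        "has_speaker_labels": any(
            re.match(r"^[A-Z][\w .'-]*:\s", line) for line in cue_lines
        ),
    }
```

A script like this narrows the manual pass to the segments that actually need human review; it does not replace the word-by-word accuracy check auditors perform.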

Further reading