Compliance reference

WCAG video captions: every success criterion that applies, mapped

WCAG 2.1 doesn't have one "video captions" rule. Guideline 1.2 (Time-based Media) contains nine timed-media success criteria, and the subset that applies depends on whether your video is live, has audio, has significant visual content, or is a media alternative for text. Here is the map (SC 1.2.9, Audio-only (Live), is omitted because it never applies to video).

TL;DR

For a typical prerecorded training video at WCAG 2.1 Level AA, you need SC 1.2.2 Captions (Prerecorded) and SC 1.2.5 Audio Description (Prerecorded). SC 1.2.1 applies only to audio-only or video-only content. SC 1.2.3 is redundant with 1.2.5 at Level AA. SC 1.2.4 is live-captioning-only. Three Level-AAA SCs (1.2.6, 1.2.7, 1.2.8) rarely appear in procurement; skip them unless your legal team explicitly asks.

Full WCAG 2.1 timed-media SC map

| SC | Name | Level | Applies to training video? |
| --- | --- | --- | --- |
| 1.2.1 | Audio-only and Video-only (Prerecorded) | A | Only if the asset is audio-only (podcast) or video-only (silent demo). Rare for training. |
| 1.2.2 | Captions (Prerecorded) | A | Yes — this is the main one. Details on our SC 1.2.2 page. |
| 1.2.3 | Audio Description or Media Alternative (Prerecorded) | A | Yes — but 1.2.5 at AA supersedes it. Ship 1.2.5 and you satisfy 1.2.3. |
| 1.2.4 | Captions (Live) | AA | Only for live content — webinars, town halls, live streams. |
| 1.2.5 | Audio Description (Prerecorded) | AA | Yes — required at AA when visual content carries information not in the audio. |
| 1.2.6 | Sign Language (Prerecorded) | AAA | Skip unless explicitly required. |
| 1.2.7 | Extended Audio Description (Prerecorded) | AAA | Skip unless explicitly required. |
| 1.2.8 | Media Alternative (Prerecorded) | AAA | Skip unless explicitly required. |

What "AA" actually demands on a training-video library

The regulated frameworks — ADA Title II, Section 508, the European Accessibility Act — all reference Level AA conformance. That means your training library needs the cumulative set of Level A + Level AA criteria. For video, that shortens to:

  1. SC 1.2.2 Captions (Prerecorded) on every video with speech.
  2. SC 1.2.5 Audio Description (Prerecorded) on videos with significant visual-only information.
  3. SC 1.2.4 Captions (Live) only if you also run live webinars or town halls.

That's it. The rest are AAA and out of scope for the regulatory frameworks currently enforced against training-video programs.

What auditors check on a sampled video

A caption audit is not a full-library re-transcription. It is sampling. From experience with public-university and enterprise audits, sampling checks converge on five things:

  1. Caption track exists and is selectable in the video player.
  2. Spot-check accuracy on 2–3 sampled 60-second segments — technical terms and proper nouns get compared word-by-word. A mangled term in a sampled segment is typically a finding.
  3. Speaker labels present on off-camera dialogue.
  4. Non-speech sounds bracketed where they carry meaning (alarms, laughter, music cues).
  5. Audio description track (or equivalent) for videos with significant visual-only information.
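Checks 3 and 4 are easiest to see in a caption file itself. Here is an illustrative WebVTT fragment (speaker name and sound description are invented for the example) showing a speaker label on off-camera dialogue and a bracketed non-speech cue:

```
WEBVTT

00:01:12.000 --> 00:01:15.500
TRAINER: Click Export, then choose the WebVTT format.

00:01:15.500 --> 00:01:17.000
[alarm sounds]
```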

How GlossCap helps

GlossCap is scoped to SC 1.2.2 captions — the one we can ship at ≈99% accuracy out of the box by biasing the speech model on your company glossary. The export formats are standard (SRT, WebVTT, TTML), so the caption track drops into any LMS: TalentLMS, Docebo, Absorb, Kaltura, Panopto, a self-hosted player. Audio description for SC 1.2.5 is a separate production workflow we don't cover — most customers pair us with human describers or handle it in-house. For a full walk-through of what passes 2.1 AA, see our 2.1 AA reference.
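SRT and WebVTT are close enough that a caption track converts between them mechanically: WebVTT adds a `WEBVTT` header, uses `.` instead of `,` before milliseconds, and makes the numeric cue indices optional. A minimal sketch of that conversion (not GlossCap's actual exporter, just the idea):

```python
import re

def srt_to_vtt(srt_text: str) -> str:
    """Convert an SRT caption file to WebVTT.

    Handles the three syntactic differences: the WEBVTT header,
    '.' instead of ',' as the millisecond separator, and dropping
    the bare numeric cue indices (optional in WebVTT).
    """
    # Swap the millisecond separator inside timestamps only.
    vtt = re.sub(r"(\d{2}:\d{2}:\d{2}),(\d{3})", r"\1.\2", srt_text)
    # Remove a bare cue-index line when a timestamp line follows it.
    vtt = re.sub(r"(?m)^\d+\n(?=\d{2}:\d{2}:\d{2})", "", vtt)
    return "WEBVTT\n\n" + vtt.strip() + "\n"
```

Going the other direction (or to TTML) is similarly mechanical, which is why a standards-based caption track drops into any LMS player.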


Related questions

Do we need sign-language interpretation on training video?

Only if you target WCAG AAA conformance or if a specific jurisdiction mandates it. ADA Title II and the EAA reference AA, not AAA, so sign-language interpretation is usually not required by the regulations. Individual universities or public agencies may impose stricter local standards.

Can captions count as audio description if the captions also describe what's happening on screen?

No. Captions render the audio; audio description renders visual content for viewers who can't see the screen. They are different access modes for different users and the spec keeps them separate. A caption track that editorializes about visuals is not a substitute for an audio description track.

What's the easiest way to see which SCs we currently fail?

Run an automated accessibility scan (axe, WAVE, Lighthouse), then do a manual spot-check on three videos: does each have a caption track, does the caption track contain your product names spelled correctly, and does it mark speaker changes. Those three checks catch the overwhelming majority of 1.2.2 findings on training libraries.
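The three manual checks above can be partly scripted. A sketch, assuming WebVTT caption files and a speaker-label convention of `NAME:` at the start of a cue — `GLOSSARY` is a hypothetical stand-in for your own product-name list:

```python
import re

# Hypothetical glossary: replace with your own product and brand names.
GLOSSARY = {"GlossCap", "Kaltura", "Panopto"}

def spot_check(vtt_text: str) -> dict:
    """Flag two common SC 1.2.2 findings in a WebVTT caption track:
    misspelled/missing glossary terms and absent speaker labels."""
    cue_lines = [
        line for line in vtt_text.splitlines()
        if line and "-->" not in line and line != "WEBVTT"
    ]
    text = " ".join(cue_lines)
    return {
        # Glossary terms absent (or misspelled, so an exact match fails).
        "missing_terms": sorted(t for t in GLOSSARY if t not in text),
        # At least one "NAME:" speaker label at the start of a cue line.
        "has_speaker_labels": any(
            re.match(r"^[A-Z][\w .'-]*:\s", line) for line in cue_lines
        ),
    }
```

A script like this narrows the manual pass to the segments that actually need human review; it does not replace the word-by-word accuracy check auditors perform.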

Further reading