Compliance reference
WCAG captions vs transcripts: when you need which
The first time a training-ops lead reads through WCAG, the two terms often blur together. They shouldn't. Captions and transcripts serve different users and satisfy different success criteria, and only in one specific case does a transcript substitute for captions. Here is the operator's mapping.
TL;DR
Captions are the time-synchronized text you turn on inside a video player. They serve Deaf and hard-of-hearing viewers and satisfy SC 1.2.2. Transcripts are the full text on the page (or linked PDF) alongside the media. They serve readers who prefer text, low-bandwidth users, and search engines. For video with audio, WCAG 2.1 AA requires captions — a transcript alone is not enough. For audio-only content (podcasts, voice recordings), a transcript alone is enough, under SC 1.2.1.
Definitions that actually matter in an audit
The spec language is worth pinning down:
- Captions — synchronized text equivalent of the audio track, designed to be displayed during playback. They include dialogue, speaker identification, and meaningful non-speech sounds. WCAG formal definition.
- Transcript — a static text version of the audio (and optionally the visual) content, presented as text alongside or instead of the media. Not required to be time-coded.
- Descriptive transcript — a transcript that also describes significant visual content (on-screen text, demos, silent visual steps). This is the form that can satisfy SC 1.2.3 at Level A as a media alternative.
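To make the caption definition concrete, here is a minimal Python sketch that renders a caption track as WebVTT, with timing, speaker identification, and a bracketed non-speech sound. The `Cue` class and the function names are hypothetical, chosen for illustration; they are not part of WCAG or of any particular tool.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    start: float       # seconds from the start of the media
    end: float         # seconds from the start of the media
    text: str          # dialogue, or a bracketed non-speech sound
    speaker: str = ""  # left empty for non-speech cues


def vtt_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp (hh:mm:ss.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def escape_cue_text(text: str) -> str:
    """Escape the characters that are special inside WebVTT cue text."""
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def to_webvtt(cues: list[Cue]) -> str:
    """Render cues as a WebVTT document, using voice tags for speakers."""
    blocks = ["WEBVTT"]
    for cue in cues:
        timing = f"{vtt_timestamp(cue.start)} --> {vtt_timestamp(cue.end)}"
        body = escape_cue_text(cue.text)
        if cue.speaker:
            body = f"<v {cue.speaker}>{body}"
        blocks.append(f"{timing}\n{body}")
    return "\n\n".join(blocks) + "\n"


cues = [
    Cue(1.0, 3.5, "Open the admin console.", speaker="Trainer"),
    Cue(3.5, 5.0, "[keyboard clicks]"),
]
print(to_webvtt(cues))
```

The timing lines are what make this a caption track rather than a transcript: strip them and the same text becomes a static transcript.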
The SC mapping
| Content type | Captions required? | Transcript required? | Governing SC |
|---|---|---|---|
| Prerecorded video with audio (typical training) | Yes (SC 1.2.2, Level A) | No (but recommended) | 1.2.2 (A); audio description under 1.2.5 (AA) |
| Audio-only (podcast, voice memo) | N/A | Yes (text alternative) | 1.2.1 (A) |
| Video-only (silent demo, animated GIF with no audio) | N/A | Yes (text alternative or audio track) | 1.2.1 (A) |
| Live-streamed video with audio | Yes (live captions) | No | 1.2.4 (AA) |
| Media alternative for text (narrated read-along) | Exempt if labeled as such | Already text, nothing additional | Exception to 1.2.1/1.2.2 |
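For teams that track media in a spreadsheet or CMS, the mapping above can be encoded as a lookup. The following is a sketch only: the enum members, the `Requirement` fields, and `requirement_for` are hypothetical names, and the values simply transcribe the table rather than add any new interpretation of WCAG.

```python
from dataclasses import dataclass
from enum import Enum, auto


class ContentType(Enum):
    PRERECORDED_VIDEO_WITH_AUDIO = auto()
    AUDIO_ONLY = auto()
    VIDEO_ONLY = auto()
    LIVE_VIDEO_WITH_AUDIO = auto()
    MEDIA_ALTERNATIVE_FOR_TEXT = auto()


@dataclass(frozen=True)
class Requirement:
    captions_required: bool
    text_alternative: str  # what the table says about transcripts
    governing_sc: str


# Values copied from the SC mapping table; nothing here goes beyond it.
REQUIREMENTS = {
    ContentType.PRERECORDED_VIDEO_WITH_AUDIO: Requirement(
        True, "not required, but recommended", "1.2.2"),
    ContentType.AUDIO_ONLY: Requirement(
        False, "required: text alternative", "1.2.1"),
    ContentType.VIDEO_ONLY: Requirement(
        False, "required: text alternative or audio track", "1.2.1"),
    ContentType.LIVE_VIDEO_WITH_AUDIO: Requirement(
        True, "not required", "1.2.4"),
    ContentType.MEDIA_ALTERNATIVE_FOR_TEXT: Requirement(
        False, "already text, if labeled as such", "exception to 1.2.1/1.2.2"),
}


def requirement_for(content_type: ContentType) -> Requirement:
    """Look up what the mapping table asks for, given a content type."""
    return REQUIREMENTS[content_type]


req = requirement_for(ContentType.PRERECORDED_VIDEO_WITH_AUDIO)
print(req.captions_required, req.governing_sc)
```

Keeping the table as data rather than branching logic makes it easy to review against the spec when the mapping is audited.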
When a transcript can substitute
Two specific cases:
- Audio-only content. A podcast with a full text transcript on the same page satisfies SC 1.2.1 at Level A. No captions are required because there is no video track to caption.
- Video-only content. A silent screen-recording with no audio track, accompanied by a text description or an audio description, satisfies SC 1.2.1.
For everything else — i.e., most training video — captions are the load-bearing requirement. A transcript is valuable (SEO, readability, LLM-crawler legibility) but it does not remove the caption obligation.
The common misread: "we have a transcript, so we don't need captions"
This mistake usually comes from reading SC 1.2.1 in isolation. SC 1.2.1 is scoped to prerecorded audio-only and video-only content, so it does not apply to synchronized media at all. Synchronized media (video with audio) falls under SC 1.2.2, which requires captions specifically. A page-level transcript next to your training video is great UX, but it does not satisfy 1.2.2.
Some accessibility teams adopt "captions plus on-page transcript" as house style. That pattern is defensible: captions meet 1.2.2, and the transcript improves SEO and the experience for users who don't rely on captions. Shipping both is more work; shipping only the transcript fails the audit.
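The misread is easy to catch mechanically if you keep an inventory of your media. Below is a minimal sketch of such a check for prerecorded content; `MediaItem`, its fields, and `audit` are hypothetical names, and the check deliberately ignores the audio-track option that SC 1.2.1 allows for video-only content.

```python
from dataclasses import dataclass


@dataclass
class MediaItem:
    title: str
    has_audio: bool
    has_video: bool
    has_captions: bool = False
    has_transcript: bool = False


def audit(item: MediaItem) -> list[str]:
    """Return findings for one prerecorded item; an empty list means none found."""
    findings = []
    if item.has_audio and item.has_video:
        # Synchronized media: SC 1.2.2 asks for captions,
        # and a transcript does not substitute for them.
        if not item.has_captions:
            note = ""
            if item.has_transcript:
                note = " (transcript present, but it does not substitute)"
            findings.append(f"SC 1.2.2: captions missing{note}")
    elif item.has_audio or item.has_video:
        # Audio-only or video-only: SC 1.2.1 asks for a text alternative.
        # Assumption: the audio-track route for video-only is not modeled.
        if not item.has_transcript:
            findings.append("SC 1.2.1: text alternative missing")
    return findings


inventory = [
    MediaItem("Onboarding video", True, True, has_transcript=True),
    MediaItem("Weekly podcast", True, False, has_transcript=True),
]
for item in inventory:
    print(item.title, audit(item))
```

In this example the onboarding video is flagged even though it has a transcript, while the podcast passes with a transcript alone.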
How GlossCap helps
WCAG does not mandate a particular caption file format; it specifies what the captions must contain. GlossCap exports SRT and WebVTT caption files that carry that content (time-synchronized text, speaker labels, and marked non-speech sounds), and it can export a plain-text transcript alongside for the SEO/UX pattern. The transcript reuses the same glossary-corrected text as the caption track, so your engineering, medical, or product terms are consistent across both. See our SC 1.2.2 walkthrough for the caption-track content requirements, or WCAG 2.1 AA captions for the full compliance baseline.
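If you already have an SRT file and want a transcript with identical wording, one approach is to strip the cue numbers and timing lines. The sketch below illustrates that idea in Python; it is not GlossCap's implementation, `srt_to_transcript` is a hypothetical helper, and it assumes a well-formed SRT file with blank lines between cues.

```python
import re

# Matches an SRT timing line such as "00:00:01,000 --> 00:00:03,500".
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2}[,.]\d{3}\s*-->\s*\d{2}:\d{2}:\d{2}[,.]\d{3}")


def srt_to_transcript(srt_text: str) -> str:
    """Drop cue numbers and timings from SRT text, keeping one line per cue."""
    normalized = srt_text.replace("\r\n", "\n").strip()
    paragraphs = []
    for block in re.split(r"\n\s*\n", normalized):
        lines = [line.strip() for line in block.splitlines() if line.strip()]
        # Remove the leading cue number, then the timing line, if present.
        if lines and lines[0].isdigit():
            lines = lines[1:]
        if lines and TIMING.match(lines[0]):
            lines = lines[1:]
        if lines:
            paragraphs.append(" ".join(lines))
    return "\n".join(paragraphs)


sample = (
    "1\n00:00:01,000 --> 00:00:03,500\nTrainer: Open the admin console.\n\n"
    "2\n00:00:03,500 --> 00:00:05,000\n[keyboard clicks]\n"
)
print(srt_to_transcript(sample))
```

Deriving the transcript from the caption file, rather than maintaining it separately, is one way to keep terminology identical across both.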
Related questions
If we publish a transcript, does it help with SEO and LLM citability?
Yes. Both search-engine crawlers and LLM training/retrieval pipelines parse plain text easily, and a transcript gives them the video's content verbatim, with no editorial gloss. If your training video contains information you want discoverable (product docs, how-to demos, explainer content), a transcript on the page is strictly additive. It does not replace captions for compliance.
What format should the transcript be?
Plain HTML text on the same page as the video is the most accessible form — screen readers handle it natively, search engines index it, and it works offline. A linked PDF is permitted but less accessible than on-page HTML. A transcript embedded as a video overlay is not a transcript for WCAG purposes.
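As a rough illustration of the on-page HTML form, the Python sketch below turns transcript paragraphs into a plain HTML section placed next to the player. The function name, the `transcript` id, and the heading level are assumptions for the example, not requirements from WCAG.

```python
from html import escape


def transcript_to_html(paragraphs: list[str], heading: str = "Transcript") -> str:
    """Render transcript paragraphs as a plain HTML section."""
    # Escape the text so characters like "<" in the transcript stay literal.
    body = "\n".join(f"  <p>{escape(p)}</p>" for p in paragraphs)
    return (
        '<section id="transcript">\n'
        f"  <h2>{escape(heading)}</h2>\n"
        f"{body}\n"
        "</section>"
    )


print(transcript_to_html([
    "Trainer: Open the admin console.",
    "[keyboard clicks]",
]))
```

Because the output is ordinary headings and paragraphs, screen readers and crawlers can read it without any player-specific support.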
Is a captioned video plus a transcript "more compliant" than just captions?
In compliance terms, captions-only satisfies SC 1.2.2 — adding a transcript doesn't bump you to a higher conformance level. The practical value of a transcript is improved search discoverability and a better experience for users who prefer text, not a higher compliance score. It is a nice-to-have, not a need-to-have at AA.