Program Management · Published 2026-06-02
Building a caption compliance program from scratch: from policy to audit trail in 90 days
Most L&D teams cross the threshold from "we need captions" to "we have captions on some things" and then stall. The back catalogue stays partially captioned. New video ships without captions until someone complains. The instructional designer who absorbed the caption-correction work quietly leaves, and the institutional knowledge leaves with her. Six months after the compliance conversation began, the team is further behind than it was at the start, because new production is outpacing the retrofit and no one owns the gap.
This post is not about how to caption a video. It is about how to build a program (a policy, a workflow, a vendor relationship, a quality-control gate, and an audit-documentation package) that makes caption compliance self-sustaining rather than perpetually reactive. The 90-day plan below is structured around five phases: inventory and gap analysis, policy and governance, vendor and tooling selection, production workflow and QC, and back-catalogue retrofit. Each phase has concrete deliverables, a realistic timeline, and the specific failure modes that derail teams at that phase.
If you are a VP of Learning, Director of L&D, or training-operations manager who has been handed "get us compliant" as a deliverable, this is the operational playbook for translating that mandate into a working program. This post assumes you have already worked out which compliance-law obligations apply to your organisation; if you have not, the US compliance matrix post covers all five US frameworks with a decision tree. And if you are at the stage of choosing a captioning vendor, the RFP playbook post walks through the full 14-question scoring sheet with six anonymised vendor responses.
TL;DR
Caption compliance is a program problem, not a project problem. A single sprint produces a point-in-time result; a program produces a durable system. The 90-day plan below gives you the minimum viable program architecture:
- Days 1–14 (inventory): Catalogue every video asset, map compliance obligations to content tiers, quantify the gap in coverage and accuracy.
- Days 15–28 (policy): Draft and ratify a captioning policy with a WCAG 2.1 AA technical standard, defined ownership, and an escalation path for non-compliance.
- Days 29–45 (vendor and tooling): Run the vendor selection or confirm your existing tooling against the program requirements; establish the glossary architecture for your training vocabulary.
- Days 46–70 (production workflow and QC): Build the caption-before-publish gate into your video production process; define the QC rubric and rejection criteria; document who reviews and what threshold triggers rejection.
- Days 60–90 (back-catalogue retrofit): Triage the existing catalogue by compliance urgency; execute the Tier-1 retrofit (live compliance obligations); document the Tier-2 plan and timeline for the audit record.
After day 90, the program runs on a monthly QC sweep, a quarterly back-catalogue cadence, and an annual policy review. The audit-documentation package — WCAG 2.1 AA SC 1.2.2 conformance records, training-record logs under applicable law, and the escalation-path evidence — is the output that satisfies a legal or compliance audit, not the captions themselves. Below is the full playbook.
Why this is a program problem, not a project problem
The distinction matters operationally. A project has a start date, an end date, and a deliverable — "caption the 200 videos in the compliance training library by March 31." A program has an operating model — a policy that defines what standard every video must meet, a workflow that ensures new video enters the system already captioned to that standard, a vendor relationship that scales with production volume, a QC gate that catches failures before they reach the LMS, and an audit trail that documents the program's operation over time. The project is one run through the back catalogue. The program is what prevents the back catalogue from recurring.
The difference becomes visible when you run the numbers. A typical 50–500-employee SaaS or mid-market organisation produces between 20 and 150 new hours of training video per year: onboarding updates, product-release recordings, compliance refreshers, manager-track courses, sales-enablement clips. At even the low end of that range, 20 new hours per year means the back-catalogue gap is not a one-time problem but a recurring accumulation. If your retrofit is running at 15 hours per year and production is running at 20 hours per year, you are falling further behind at 5 hours per year even while actively working on it. The half-FTE cost post quantifies what that drift costs in practitioner time before it surfaces in a compliance audit or an accommodation request.
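The drift is easy to check with a few lines of arithmetic. A minimal sketch, assuming new production ships uncaptioned (the pre-program state) and using a hypothetical starting gap of 100 hours:

```python
def backlog_trajectory(initial_gap_hours: float,
                       production_hours_per_year: float,
                       retrofit_hours_per_year: float,
                       years: int) -> list:
    """Project uncaptioned hours year by year, starting from the current gap.

    Assumes new production ships uncaptioned (so it adds to the gap) and the
    retrofit removes hours from it; the gap never drops below zero.
    """
    gap = float(initial_gap_hours)
    trajectory = [gap]
    for _ in range(years):
        gap = max(0.0, gap + production_hours_per_year - retrofit_hours_per_year)
        trajectory.append(gap)
    return trajectory


# Figures from the text: 20 h/yr of new production, 15 h/yr of retrofit.
print(backlog_trajectory(100, 20, 15, 3))  # → [100.0, 105.0, 110.0, 115.0]
```

Closing the gap needs either a retrofit rate above the production rate or, more durably, a production gate that stops new hours from entering the gap at all.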
The second reason the program framing matters is the audit-trail problem. A compliance audit — whether triggered by an OCR complaint, a HIPAA documentation review, a Joint Commission triennial survey, or a customer security questionnaire — does not ask "how many videos have captions." It asks questions that a project cannot answer: When was the captioning policy established? Who owns caption QC? What is the documented standard (WCAG 2.1 AA SC 1.2.2)? What evidence exists that new production follows the standard? What is the documented plan for remediating the back catalogue? How are accommodation requests for caption access handled? None of these questions can be answered by pointing at a completed sprint.
The third reason is the accuracy problem. WCAG 2.1 SC 1.2.2 does not state a numeric accuracy figure, but the benchmark widely applied to compliant captions, and the floor the Phase 2 policy adopts, is 99%+ accuracy: a threshold that YouTube auto-captions reliably miss on any content with proper-noun density above about 3% of words. Training video has proper-noun density in the range of 8–25% depending on the vertical: compliance training with acronym-heavy regulatory vocabulary, safety training with OSHA citation codes, technical onboarding with product names and SDK symbols. A project that runs your back catalogue through a generic auto-caption tool and calls it done may reach 80–90% word accuracy but does not produce WCAG-conformant captions, and that is arguably worse than no captions at all, because it creates a false coverage signal in whatever tracking sheet you are maintaining. A program includes a QC gate that catches that failure before it gets on the record.
The three traps that keep programs from forming
Before laying out the 90-day plan, it is worth naming the three organisational traps that typically prevent the transition from project to program:
Trap 1: No single owner. L&D believes captioning is IT's problem because it touches the LMS. IT believes it is L&D's problem because it is content governance. Compliance believes it belongs to whichever team is currently failing the audit. In the absence of a named owner, caption compliance lives in the escalation gap between functions and is effectively owned by no one. The captioning policy (Phase 2) is what assigns ownership explicitly; it is a governance document, not just a technical spec.
Trap 2: Measuring coverage, not compliance. The modal metric for caption compliance in L&D organisations is "percentage of videos with captions." This metric is nearly useless as a compliance signal because it conflates a YouTube auto-caption track on a drug-name-dense medical training video (legally non-compliant) with a WCAG 2.1 AA-grade glossary-corrected caption file on an engineering onboarding module (legally compliant). The program needs two metrics: coverage (what share of the active catalogue has any caption file) and compliance (what share has a WCAG 2.1 AA-grade caption file with documented accuracy above 99%). Only the second metric answers the audit question.
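The two metrics can be computed side by side from the same inventory. A minimal sketch, assuming each asset row records whether any caption file exists and the QC-documented accuracy, if one has been measured (the `Asset` fields and example titles are hypothetical):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Asset:
    title: str
    has_caption_file: bool                # any caption track at all, auto-captions included
    documented_accuracy: Optional[float]  # QC-measured accuracy; None if never measured


def coverage_and_compliance(assets, accuracy_floor: float = 0.99):
    """Return (coverage, compliance) as fractions of the active catalogue."""
    if not assets:
        return 0.0, 0.0
    covered = sum(1 for a in assets if a.has_caption_file)
    compliant = sum(
        1
        for a in assets
        if a.has_caption_file
        and a.documented_accuracy is not None
        and a.documented_accuracy >= accuracy_floor
    )
    return covered / len(assets), compliant / len(assets)


catalogue = [
    Asset("Medication safety (auto-captions)", True, None),
    Asset("Engineering onboarding (glossary-corrected)", True, 0.995),
    Asset("Q3 sales kickoff", False, None),
]
coverage, compliance = coverage_and_compliance(catalogue)
print(f"coverage {coverage:.0%}, compliance {compliance:.0%}")  # → coverage 67%, compliance 33%
```

An auto-captioned file with no documented accuracy counts toward coverage but never toward compliance, which is exactly the distinction the audit question turns on.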
Trap 3: No escalation path for new production. The most common failure mode in L&D organisations that have started a captioning program is that the program applies to the back catalogue but not to new production. A subject-matter-expert records a product-release walkthrough, uploads it to the LMS as a new course module, and no one triggers the caption workflow because there is no gate. Six months later, the new content is the uncaptioned back catalogue. The production workflow gate (Phase 4) closes this loop.
Phase 1: Inventory and gap analysis (Days 1–14)
The gap analysis is the foundation the rest of the program stands on. You cannot write a credible captioning policy without knowing what you are governing, you cannot size the vendor relationship without knowing the volume, and you cannot produce a back-catalogue retrofit plan without a triage ranking. Two weeks is enough time to produce a complete inventory if you approach it systematically. Here is the framework.
Step 1: Map your video surfaces
Video for an organisation of the type this post targets — 50–500 employees, technology, healthcare, engineering, or university — typically lives in five to eight distinct locations. Common surfaces:
- The primary LMS — TalentLMS, Docebo, Absorb, Cornerstone OnDemand, Workday Learning, Canvas, or whichever platform your formal training programme runs on.
- Video hosting attached to the LMS — Kaltura, Panopto, or an embedded Vimeo/Wistia account used as the video layer under the LMS course shell.
- Async video tools — Loom, Vidyard, or similar tools used for onboarding and product-release communications that are subsequently linked into LMS modules or shared via HR platforms.
- The authoring tool output library — Articulate, Camtasia, iSpring SCORM packages that contain video tracks, hosted either inside the LMS or on a shared drive pending upload.
- Public-facing video — a customer academy on Skilljar, a YouTube channel with product tutorials, or a Wistia-hosted customer-onboarding library — anything publicly accessible triggers ADA Title III obligations that are distinct from employee-training Title I obligations.
- The intranet / document management layer — SharePoint video pages, Microsoft Stream recordings, Teams meeting recordings archived on SharePoint — often not counted in an LMS inventory but carrying ADA Title I obligations for any mandatory training content.
For each surface, you need three numbers: total asset count, total hour count, and caption-file count. Do not estimate. Pull the actual numbers from the platform APIs or admin dashboards. Estimates typically undercount the back catalogue by 20–40%, because the long tail of one-off recordings that predate the LMS migration, the Loom clips from engineering leads, and the Teams recordings from last year's annual kickoff are invisible to anyone who has not gone looking for them.
Step 2: Map compliance obligations to content categories
Not all video carries the same legal obligation, and the triage ranking you produce in the next step will be wrong if you have not mapped the obligations first. The full mapping is in the US compliance matrix post, but the operative cuts for an L&D inventory are:
- Mandatory employee training — any video that an employee is required to complete as a condition of employment or role. This content carries ADA Title I obligations (15+ employees) and California FEHA obligations (5+ employees in California). If any assigned employee has a hearing disability, any video in the mandatory training catalogue is subject to an accommodation request under 29 CFR § 1630.9 — and an accommodation request without an existing caption file means an emergency sprint that typically takes 2–4 weeks and costs 3–8× the baseline captioning rate due to the urgency premium.
- Healthcare and clinical training: any training video used by clinical staff, whether produced by your organisation or distributed through a healthcare LMS like HealthStream. These carry HIPAA § 164.530(b) workforce-training documentation requirements, Joint Commission HR.01.05.03 requirements, and, for organisations whose health programs receive federal financial assistance, patient-care-adjacent obligations under ACA Section 1557.
- Safety training — any video linked to an OSHA-governed safety obligation (29 CFR 1910 or 1926 standards, MSHA 30 CFR Part 48 for mining, etc.) requires documentation that training was effective under the standard. A caption-less safety training video cannot be documented as effective for a hearing-impaired employee without a reasonable-accommodation record.
- Federal-contractor and publicly funded training — any training content developed under a federal contract or grant carries Section 508 obligations. If your organisation receives federal financial assistance (Title IV Pell Grants for a university, Medicaid reimbursement for a healthcare provider, SBIR for a tech startup), Section 504 obligations may apply to your training content.
- Customer-facing and publicly accessible training — a customer academy on a SaaS vendor's website, a certification portal, a product-tutorial library accessible without an account login. These trigger ADA Title III public-accommodation obligations.
Step 3: Build the triage four-tier model
With the compliance mapping done, you can assign every video asset in your inventory to one of four tiers:
| Tier | Definition | Timeline obligation | Action |
|---|---|---|---|
| Tier 1 | Live compliance obligation with defined enforcement mechanism (ADA Title II WCAG 2.1 AA as of April 2026, active accommodation request, federal contractor obligation, Joint Commission upcoming survey) | Immediate — already past or enforcement-proximate | Caption during back-catalogue retrofit Phase 5; document in audit record |
| Tier 2 | Near-term compliance obligation (upcoming EAA enforcement for EU-accessible content, ACA Section 1557 deadline cascade, Section 508 contract renewal approaching) | Within 12 months | Caption before the applicable deadline; include in retrofit plan with documented timeline |
| Tier 3 | Operational risk with no hard deadline (mandatory employee training not yet subject to an accommodation request; customer-academy content in a state without active Title III enforcement) | Best-effort within 24 months | Caption on rolling cadence as capacity permits; document the plan |
| Tier 4 | Archived content with no active audience (recordings from prior years, deprecated product versions, off-market courses with zero completions in the last 12 months) | No hard deadline; archive-only access | Caption if cost is minimal; otherwise document as "archive-only — WCAG compliance remediation deferred pending content lifecycle review" |
The Tier 4 designation matters for the audit record. A compliance audit looking at the gap analysis report will flag every uncaptioned video unless you have documented a disposition for each one. "Archive-only content deferred pending lifecycle review" is a credible audit answer for a 2019 all-hands recording that has had three completions in the last three years. "No captions" with no documented rationale is not.
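The tier rules can be applied row by row to the inventory. A minimal sketch, assuming hypothetical inventory columns for each input; real assignments still need human judgment on edge cases, and the example rows are illustrative only:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VideoAsset:
    title: str
    live_obligation: bool              # e.g. active accommodation request, Title II in force
    months_to_deadline: Optional[int]  # nearest known compliance deadline; None if none known
    mandatory_or_public: bool          # mandatory employee training or publicly accessible
    completions_last_12_months: int


def assign_tier(asset: VideoAsset) -> int:
    """Apply the four-tier rules in priority order (a hypothetical encoding of the table)."""
    if asset.live_obligation:
        return 1
    if asset.months_to_deadline is not None and asset.months_to_deadline <= 12:
        return 2
    if asset.completions_last_12_months == 0 and not asset.mandatory_or_public:
        return 4  # archive-only: record the deferral rationale in the audit record
    return 3


inventory = [
    VideoAsset("Harassment prevention 2026", True, None, True, 340),
    VideoAsset("EU customer academy intro", False, 8, True, 120),
    VideoAsset("Manager onboarding", False, None, True, 45),
    VideoAsset("2019 all-hands recording", False, None, False, 0),
]
for asset in inventory:
    print(assign_tier(asset), asset.title)
```

Checking the rules in priority order means an asset lands in the most urgent tier it qualifies for, which is the ordering the audit record needs to show.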
Phase 1 deliverables
At the end of the two-week inventory, you should have:
- A video asset inventory spreadsheet with one row per asset: platform, title, URL/asset ID, runtime in minutes, current caption status (none / partial / full / WCAG-grade), compliance tier (1–4), and compliance framework triggering the obligation.
- A gap summary: total hours by tier, current WCAG-grade hours by tier, gap hours by tier.
- A first-pass volume estimate for vendor sizing: total Tier-1 hours to caption within 90 days, total Tier-2 hours to caption within 12 months, steady-state new-production hours per year.
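The gap summary is a grouped sum over the inventory spreadsheet. A minimal sketch, assuming the sheet is exported as rows with hypothetical `tier`, `runtime_min`, and `caption_status` columns:

```python
from collections import defaultdict


def gap_summary(rows):
    """Total, WCAG-grade, and gap hours per tier from inventory rows.

    Each row is a dict with hypothetical keys: 'tier' (1-4), 'runtime_min',
    and 'caption_status' ('none', 'partial', 'full', or 'wcag-grade').
    """
    total = defaultdict(float)
    graded = defaultdict(float)
    for row in rows:
        hours = row["runtime_min"] / 60
        total[row["tier"]] += hours
        if row["caption_status"] == "wcag-grade":
            graded[row["tier"]] += hours
    return {
        tier: {
            "total_h": round(total[tier], 2),
            "wcag_grade_h": round(graded[tier], 2),
            "gap_h": round(total[tier] - graded[tier], 2),
        }
        for tier in sorted(total)
    }


rows = [
    {"tier": 1, "runtime_min": 30, "caption_status": "wcag-grade"},
    {"tier": 1, "runtime_min": 90, "caption_status": "none"},
    {"tier": 4, "runtime_min": 60, "caption_status": "partial"},
]
print(gap_summary(rows))
```

Only the "wcag-grade" status reduces the gap; partial and full-but-unverified captions still count as gap hours, in line with the coverage-versus-compliance distinction.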
This deliverable is not just an internal planning tool. It is the foundation of the audit-response pack. Keep it versioned and dated. The first version of the gap analysis is evidence that the organisation took a systematic approach to understanding its compliance exposure. Subsequent versions are evidence of remediation progress.
Phase 2: Policy and governance (Days 15–28)
The captioning policy is the document that transforms an ad-hoc captioning effort into a program. Without a ratified policy, every decision about whether to caption a given video is made individually and inconsistently. With a policy, the decision rule is clear, the standard is defined, the ownership is assigned, and non-compliance triggers a documented escalation path rather than a shrug. Getting a policy drafted, reviewed, and ratified in two weeks is aggressive but achievable if you scope it correctly — and scoping it correctly means writing a policy that is short enough to actually be read and followed, not a 40-page accessibility manual that no one will consult in the workflow moment.
What a minimum viable captioning policy contains
A minimum viable captioning policy has six components:
1. Scope. Which content does the policy govern? Recommended scope: all training video produced or procured by the L&D function, whether hosted internally or externally, whether employee-facing or customer-facing. Include async video tools (Loom, Vidyard) explicitly — they are the most commonly forgotten surface. Exclude conference recordings or informal team-meeting recordings unless your organisation has a specific obligation to caption them (if you do, include them).
2. Technical standard. WCAG 2.1 Success Criterion 1.2.2, Captions (Prerecorded), is the standard; it is a Level A criterion, so any Level AA conformance claim must satisfy it. Write it in the policy by name and number. The policy should specify the minimum accuracy floor (99%+), the maximum synchrony deviation (±2 seconds from the spoken word), and the requirement for captions to be provided as a sidecar file in at least SRT or VTT format rather than burned-in open captions (burned-in captions cannot be restyled for users with visual impairment who need high-contrast or enlarged text). The policy should also specify that captions must be accurate for the organisation's specific vocabulary, including product names, acronyms, regulatory terms, and proper nouns, not just phonetically plausible substitutes.
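Writing the clause as data makes it checkable for each delivered file. A minimal sketch under stated assumptions: the field names are hypothetical, measured accuracy and timing offset come from the QC review, and the file extension is only a proxy for "sidecar, not burned-in":

```python
from dataclasses import dataclass
from pathlib import PurePath


@dataclass(frozen=True)
class CaptionStandard:
    """The policy's technical-standard clause as data (field names are hypothetical)."""
    criterion: str = "WCAG 2.1 SC 1.2.2 Captions (Prerecorded)"
    min_accuracy: float = 0.99          # policy floor: 99%+
    max_sync_offset_s: float = 2.0      # +/- 2 s from the spoken word
    sidecar_suffixes: frozenset = frozenset({".srt", ".vtt"})

    def violations(self, caption_path: str, measured_accuracy: float,
                   max_observed_offset_s: float) -> list:
        """Return the clauses a delivered caption file fails; an empty list means it passes."""
        problems = []
        if PurePath(caption_path).suffix.lower() not in self.sidecar_suffixes:
            problems.append("not an SRT/VTT sidecar file")
        if measured_accuracy < self.min_accuracy:
            problems.append(f"accuracy {measured_accuracy:.1%} is below the {self.min_accuracy:.0%} floor")
        if max_observed_offset_s > self.max_sync_offset_s:
            problems.append(f"caption timing off by {max_observed_offset_s:.1f} s")
        return problems


policy = CaptionStandard()
print(policy.violations("onboarding-module-3.vtt", 0.995, 1.2))  # → []
print(policy.violations("onboarding-module-3.mp4", 0.97, 3.0))
```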
3. Ownership. One named role is the Caption Program Owner. That role is responsible for vendor management, QC oversight, escalation triage, and the audit-trail maintenance. Typically this is a senior instructional designer or a training-operations manager — not a general IT function. If there is no existing role to assign, the policy should name the function (L&D) and commit to a role assignment by a specific date. Distributed ownership ("the producing team is responsible") works as a contributing model but not as the accountability model — there must be one function that can be asked "is X video captioned to standard?" and can answer definitively.
4. Production workflow requirement. No new training video shall be published to the LMS or any training surface without a WCAG 2.1 AA-compliant caption file. This is the gate. The policy should specify who is responsible for attaching the caption file (the instructional designer on the producing course, the training-operations coordinator, or the vendor), what platform the caption file is submitted through, and what the expected turnaround time is from video finalisation to caption delivery. For most organisations with a good vendor relationship, a 24–48-hour turnaround from final video to approved caption file is achievable. Build that turnaround into the production schedule rather than treating it as an afterthought.
5. Back-catalogue commitment. The policy should include a commitment to remediate the back catalogue on the triage timeline established in Phase 1. Write it as: "As of [effective date], [Organisation] has identified [X hours] of Tier-1 training video without WCAG 2.1 AA-compliant captions. [Organisation] commits to captioning all Tier-1 video by [date], all Tier-2 video by [date], and all Tier-3 video on a rolling cadence to be completed by [date]." This is the statement that answers the audit question "what is your remediation plan?" — and it converts the Phase 1 gap analysis into a binding commitment that the audit trail will then track to completion.
6. Escalation path. What happens when a video is published without a caption file? The policy should specify: (a) the producing team is notified within one business day; (b) the video is removed from the published catalogue or access-restricted within three business days pending captioning; (c) the Caption Program Owner logs the incident in the compliance incident register; (d) if the video cannot be removed from the catalogue (e.g., it is embedded in a required compliance training module due in the current week), an interim accommodation measure is documented (provide the transcript to any enrolled hearing-impaired employees while the caption file is produced). The escalation path matters for two reasons: it creates the deterrent that makes the production gate sticky, and it documents the organisation's response to incidents — which is what the audit trail needs to show good-faith compliance management.
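Steps (a) and (b) are business-day deadlines, which are easy to miscount across a weekend. A minimal sketch that computes them from the detection date; it skips weekends only, so public holidays would need to be added for real use:

```python
from datetime import date, timedelta


def add_business_days(start: date, days: int) -> date:
    """Step forward `days` weekdays from `start`.

    Skips Saturdays and Sundays only; public holidays are ignored in this sketch.
    """
    current = start
    remaining = days
    while remaining > 0:
        current += timedelta(days=1)
        if current.weekday() < 5:  # Monday=0 ... Friday=4
            remaining -= 1
    return current


def escalation_deadlines(detected_on: date) -> dict:
    """Deadlines for steps (a) and (b) of the escalation path."""
    return {
        "notify_producing_team_by": add_business_days(detected_on, 1),
        "remove_or_restrict_by": add_business_days(detected_on, 3),
    }


# An uncaptioned video found on a Friday: notice is due Monday, restriction Wednesday.
print(escalation_deadlines(date(2024, 1, 5)))
```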
Who needs to sign the policy
A captioning policy with signatures from only the L&D function carries limited weight in an external audit, because an auditor's question is typically "did the organisation as an institution commit to this standard, or did one team adopt a voluntary practice?" The minimum signature level that gives the policy institutional weight is the VP or Director of HR or People Operations (who owns the ADA Title I accommodation obligation), the VP of L&D or Chief Learning Officer, and a legal sign-off from General Counsel or outside counsel (a one-paragraph email saying "this policy meets the documented standard" suffices; you do not need a lengthy legal memo). If your organisation has a Chief Accessibility Officer, their signature is the most direct line of accountability. If you are a public university subject to ADA Title II, the ADA/504 Coordinator's signature is important, and with the April 2026 Title II WCAG 2.1 AA compliance date now in effect, that coordinator is likely already involved in the compliance conversation: pull them into the policy ratification.
Phase 2 deliverables
- Ratified captioning policy document, version 1.0, with effective date and named signatories.
- Named Caption Program Owner with a written role-and-responsibility document (even if it is a paragraph attached to an existing job description).
- A compliance incident register template — a simple spreadsheet with fields for incident date, video title/URL, producing team, date remediated, and outcome. You will use this immediately and it will become a critical audit-trail document.
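The register needs nothing more than a spreadsheet, but if the Caption Program Owner prefers to generate it from a script or an export, here is a minimal sketch with the fields listed above (the class and function names are hypothetical):

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from datetime import date
from typing import Optional


@dataclass
class Incident:
    incident_date: date
    video_title_or_url: str
    producing_team: str
    date_remediated: Optional[date]   # None while the incident is open
    outcome: str


def write_register(incidents, stream) -> None:
    """Write the register as CSV with ISO dates so it sorts and diffs cleanly."""
    writer = csv.DictWriter(stream, fieldnames=[f.name for f in fields(Incident)])
    writer.writeheader()
    for incident in incidents:
        row = asdict(incident)
        row["incident_date"] = incident.incident_date.isoformat()
        row["date_remediated"] = (
            incident.date_remediated.isoformat() if incident.date_remediated else ""
        )
        writer.writerow(row)


buffer = io.StringIO()
write_register(
    [Incident(date(2026, 5, 4), "Q2 product walkthrough", "Product", None, "open")],
    buffer,
)
print(buffer.getvalue())
```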
Phase 3: Vendor and tooling selection (Days 29–45)
With the gap quantified and the policy ratified, you know what volume the vendor relationship needs to support, what accuracy standard it needs to meet, and what LMS integration pattern it needs to fit. If you are starting without a vendor, this is where you run the RFP or at minimum a structured vendor evaluation. The RFP playbook post covers the full 14-question evaluation — this section focuses on the program-specific requirements that the RFP playbook does not foreground: the glossary architecture question and the LMS integration depth question.
The glossary question
The single most important technical capability question in a captioning vendor evaluation for training video rarely appears on standard procurement checklists: does the vendor support per-customer glossaries that influence the model's decoding, or only post-hoc find-and-replace? These are fundamentally different architectures with fundamentally different accuracy outcomes.
Post-hoc find-and-replace means the vendor auto-captions using a generic model and then runs a text substitution pass, replacing each known mistranscription with the correct term. This works for high-frequency proper nouns that have stable phonetic patterns (replacing "Kantura" with "Kaltura," for example). It fails for proper nouns whose mistranscriptions are ordinary words (a drug name that the model transcribed as a common English word), for multi-word proper nouns where the word break lands in different places than the substitution rule expects, and for contextually ambiguous terms where the same phonetic string is the correct form in one context and an error in another. A sales-enablement video that mentions both "revenue recognition" (an accounting term) and "Revenue Cloud" (a Salesforce product name) will confuse a find-and-replace glossary that has no context to distinguish them.
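A minimal sketch of what a post-hoc substitution pass does, using the Kaltura example (the rule table and function are hypothetical, not any vendor's implementation). It fixes the stable mistranscription and misses the same error when the word break moves:

```python
import re


def apply_substitutions(text: str, rules: dict) -> str:
    """Post-hoc glossary pass: swap each known mistranscription for the correct term.

    Matching is whole-word and case-sensitive. The pass sees only the caption text,
    never the audio or the surrounding meaning, which is why it cannot recover from
    a word break in an unexpected place or disambiguate context-dependent terms.
    """
    for wrong, right in rules.items():
        pattern = rf"\b{re.escape(wrong)}\b"
        # A function replacement avoids backslash-escape processing of the replacement text.
        text = re.sub(pattern, lambda _match: right, text)
    return text


rules = {"Kantura": "Kaltura"}
print(apply_substitutions("We host every course on Kantura.", rules))
# The same error with a different word break slips straight through:
print(apply_substitutions("Upload the file to Kan tura first.", rules))
```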
Glossary-biased decoding means the correct term is injected into the model's hypothesis space during the transcription step — the model is guided toward the correct term at the moment of transcription, not corrected after the fact. This produces materially higher accuracy on the dense proper-noun surfaces that are characteristic of training content. The Whisper glossary engineering post walks the technical mechanism. From a program standpoint, the practical implication is: if your training content includes drug names, SDK function names, product-tier names, or regulatory acronyms, evaluate vendors specifically on their glossary architecture, not just their reported WER on generic benchmarks.
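One openly documented way to approximate glossary biasing with open-source tooling is the `initial_prompt` argument to `transcribe` in OpenAI's `whisper` package, which conditions decoding on text containing the correct spellings. This is prompt conditioning, not a constrained or biased decoder, and it is not necessarily how any vendor implements glossaries. The sketch below only assembles the prompt string; the character budget is an assumption standing in for Whisper's limit on prompt length:

```python
def build_glossary_prompt(terms, max_chars: int = 600) -> str:
    """Build a deduplicated 'Glossary: A, B, C.' prompt in priority order.

    max_chars is a hypothetical budget standing in for Whisper's limit on prompt
    length; terms are added in the order given until the next one would not fit.
    """
    kept = []
    seen = set()
    length = len("Glossary: .")
    for raw in terms:
        term = raw.strip()
        if not term or term in seen:
            continue
        extra = len(term) + (2 if kept else 0)  # ", " separator before all but the first term
        if length + extra > max_chars:
            break
        kept.append(term)
        seen.add(term)
        length += extra
    return "Glossary: " + ", ".join(kept) + "."


prompt = build_glossary_prompt(["Kaltura", "HIPAA", "Kaltura", "Docebo"])
print(prompt)  # → Glossary: Kaltura, HIPAA, Docebo.
```

With the package installed, the string would be passed as `whisper.load_model("small").transcribe("module.mp4", initial_prompt=prompt)`. Ordering the list by stakes matters, because anything past the budget is dropped.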
Building your company glossary for the vendor is a one-time 2–4 hour effort that pays compound dividends over the life of the program. The glossary is a list of proper nouns, acronyms, and domain-specific terms with their correct spellings. Sources: your product documentation, your compliance training content itself (scan the existing slides and transcripts for recurring proper nouns), your onboarding materials, your legal entity names, and the names of LMS platforms and tools your employees are trained on. The program should establish a glossary update process: typically a quarterly review, plus an ad-hoc update whenever a new product release or compliance training topic introduces new terms.
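Scanning existing material for recurring proper nouns can be partly automated. A rough sketch, assuming plain-text exports of slides or transcripts; the capitalisation heuristic is an assumption and will produce false positives, so treat its output as a starting list for human review:

```python
import re
from collections import Counter


def glossary_candidates(texts, min_count: int = 2):
    """Suggest glossary terms from existing slides, scripts, or transcripts.

    Heuristic only: counts all-caps acronyms anywhere, plus capitalised words
    that do not open a sentence. A human still reviews the list and confirms
    the correct spelling of each term.
    """
    counts = Counter()
    for text in texts:
        for sentence in re.split(r"(?<=[.!?])\s+", text):
            words = re.findall(r"[A-Za-z][A-Za-z0-9-]*", sentence)
            for i, word in enumerate(words):
                if re.fullmatch(r"[A-Z]{2,}[0-9]*", word):
                    counts[word] += 1  # acronym such as HIPAA or SCORM
                elif i > 0 and len(word) > 1 and word[0].isupper():
                    counts[word] += 1  # mid-sentence capital: likely a proper noun
    return [term for term, n in counts.most_common() if n >= min_count]


slides = [
    "Upload the SCORM package to Docebo. Then check the HIPAA rules in Docebo.",
    "HIPAA training is due this quarter. Ask the Docebo admin for access.",
]
print(glossary_candidates(slides))  # → ['Docebo', 'HIPAA']
```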
LMS integration depth
The second program-critical capability is LMS integration depth. Captioning workflow that requires a human to download a file from the vendor portal and manually upload it to each LMS course has a real operational cost per asset — typically 5–10 minutes per video including the download, format check, and upload steps. At 200 videos per year, that is 17–33 hours of administrative labour annually for a task with zero learning or compliance value. A vendor with native API integration to your LMS eliminates that cost. The LMS ingestion engineering post covers the API patterns for the major platforms in detail. For program planning purposes, the questions to ask the vendor are:
- Do you have native integration with [your specific LMS]?
- Does the integration support bidirectional sync (the vendor receives the video URL from the LMS, transcribes, and pushes the caption file back to the LMS as an attached caption track)?
- Does the integration include a QC review step, or does it auto-approve the caption file without human review?
- How are bulk back-catalogue uploads handled — individual asset-by-asset or batch submission?
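The admin-labour estimate in the integration discussion above is simple to reproduce, which makes it easy to rerun with your own volume and per-video handling time (the 5- and 10-minute bounds come from the text):

```python
def manual_upload_hours(videos_per_year: int, minutes_per_video: float) -> float:
    """Annual admin hours spent downloading, format-checking, and uploading caption files by hand."""
    return videos_per_year * minutes_per_video / 60


low = manual_upload_hours(200, 5)    # 1,000 minutes
high = manual_upload_hours(200, 10)  # 2,000 minutes
print(f"{low:.0f}-{high:.0f} hours per year")  # → 17-33 hours per year
```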
The QC review step question matters specifically for the compliance program. An integration that auto-approves caption files without human review reduces labour but also removes the QC gate, which is what separates having a caption file attached from having documented evidence of 99%+ accuracy. The program design needs to preserve a human QC step somewhere in the workflow, even if the upload to the LMS is automated. See Phase 4 below.
The make-vs-buy decision for small organisations
For organisations with fewer than 50 employees, or for organisations with very low training-video production volumes (fewer than 20 hours per year of new production), the full vendor-RFP model may be more overhead than the volume justifies. The minimum viable configuration for a small organisation is: a captioning service with a web upload interface, a per-minute pricing model, an accuracy guarantee with a documented revision process, and at minimum the ability to provide SRT/VTT output files for manual LMS upload. The glossary architecture question still applies: a service that supports per-customer vocabulary injection will produce materially better output on your training content than one that does not, regardless of organisation size.
A separate vendor-selection constraint applies to organisations with healthcare training on platforms like HealthStream or Relias: the captioning vendor needs to meet the same security and privacy standards as the LMS, including the ability to execute a Business Associate Agreement under HIPAA if any patient-identifying information could appear in the training content. The HIPAA training captions post covers the BAA trigger question in detail. The short version is that training content typically does not include PHI, so a BAA may not be triggered, but the determination needs to be documented and cannot be assumed.
Phase 3 deliverables
- Vendor selection decision with documented evaluation criteria and rationale (the RFP scoring sheet or a simplified equivalent).
- Vendor agreement or subscription active, including any required BAA, DPA, or security review documentation.
- Company glossary — version 1.0, loaded into the vendor platform, with a documented update cadence.
- LMS integration configured and tested on a sample set of videos (recommend 5–10 videos representing the primary content categories in the inventory).
Phase 4: Production workflow and quality control (Days 46–70)
The production workflow phase is where the compliance program becomes operational rather than preparatory. The deliverable is not a document — it is a changed behaviour in how video is produced and published. Two workflows need to be established: the new-production workflow (for all video created from this point forward) and the QC workflow (for reviewing caption output before it enters the LMS).
The new-production workflow
The caption-before-publish gate needs to be embedded in the existing video production process at the point where a video is approved for LMS upload. In most L&D teams, that means the instructional designer or course coordinator who owns the LMS publishing step is the person who triggers the captioning workflow. The practical steps:
- Finalise video. The producing instructional designer or SME records and edits the video to final form. Caption files cannot be reliably produced from draft versions where the audio changes during the edit process — submit for captioning only after the video is fully edited and approved.
- Submit to captioning vendor. Upload the final video to the vendor platform or, if the LMS integration is configured for bidirectional sync, mark the course module as "pending captions" in the LMS admin interface, which triggers the vendor pull. Include the applicable content category tag (compliance, safety, clinical, engineering, sales-enablement) so the vendor applies the correct glossary profile.
- Receive and QC the caption file. The vendor returns the SRT or VTT file. The QC reviewer (see below) opens the file, spot-checks against the video, and either approves or flags for revision. Target turnaround: vendor delivers within 24–48 hours; QC review completed within 24 hours of receipt; total gate: 48–72 hours from video finalisation to captioned LMS publication.
- Publish to LMS. The QC-approved caption file is attached to the LMS course module. The LMS admin marks the video as "captions verified" in the asset tracking sheet.
- Log in the compliance record. The Caption Program Owner's asset tracking sheet is updated with the publication date, the WCAG compliance grade (pass/fail on the QC review), and the content category. This is the operational log that feeds the audit-trail report.
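The five steps form a simple state machine in which publication is reachable only from a QC-approved state. A hypothetical model of that gate, which a tracking sheet or LMS automation could mirror; the stage names are illustrative, not any platform's statuses:

```python
from enum import Enum


class Stage(Enum):
    FINALISED = "finalised"
    SUBMITTED = "submitted"
    RECEIVED = "received"
    QC_APPROVED = "qc_approved"
    QC_REJECTED = "qc_rejected"
    PUBLISHED = "published"
    LOGGED = "logged"


# Allowed transitions: PUBLISHED can only follow QC_APPROVED, which is the gate.
ALLOWED = {
    Stage.FINALISED: {Stage.SUBMITTED},
    Stage.SUBMITTED: {Stage.RECEIVED},
    Stage.RECEIVED: {Stage.QC_APPROVED, Stage.QC_REJECTED},
    Stage.QC_REJECTED: {Stage.SUBMITTED},  # back to the vendor for revision
    Stage.QC_APPROVED: {Stage.PUBLISHED},
    Stage.PUBLISHED: {Stage.LOGGED},
    Stage.LOGGED: set(),
}


def advance(current: Stage, target: Stage) -> Stage:
    """Move to the next stage, refusing any transition that bypasses the gate."""
    if target not in ALLOWED[current]:
        raise ValueError(f"gate violation: {current.value} -> {target.value}")
    return target


stage = Stage.FINALISED
for step in (Stage.SUBMITTED, Stage.RECEIVED, Stage.QC_APPROVED, Stage.PUBLISHED, Stage.LOGGED):
    stage = advance(stage, step)
print(stage)
```

Rejected files loop back to submission rather than forward to publication, so the only path to the LMS runs through a passing QC review.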
The QC workflow
The QC step is the single most misunderstood component of a caption compliance program. Organisations that skip it believe they are saving time. They are deferring a larger cost to the accommodation request or audit event. The QC workflow for training video has a specific structure because training content has specific accuracy requirements that generic QC methods miss:
Spot-check protocol, not full review. A full word-by-word review of the caption file for a 30-minute course video takes at least 30 minutes of QC time per video, which is unsustainable at any meaningful production volume. A spot-check protocol reviews three segments of the video: the first 90 seconds (where the captioner's model context is still being established and errors are most common); 90 seconds from the middle of the video at the point of highest proper-noun density, identified by content category (for engineering onboarding, check the product-demo segment; for compliance training, check the segment densest in regulatory citations); and the last 60 seconds. If the spot-check finds more than 2 errors in any segment, the file is flagged for full review or returned to the vendor for revision. If it finds 0 errors, the file passes the spot-check and moves on to the proper-noun and synchrony checks.
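As a rough sketch of the spot-check protocol, the code below computes the three review windows and maps per-segment error counts to an outcome. It assumes the reviewer supplies the timestamp of the densest proper-noun passage. The protocol does not say what happens when a segment has one or two errors, so the sketch returns an explicit `reviewer_decision` placeholder rather than guessing.

```python
def spot_check_windows(duration_s, densest_point_s):
    """Return (start, end) second ranges for the three spot-check segments."""
    first = (0.0, min(90.0, duration_s))
    # Centre the 90-second middle window on the densest point, clamped inside the video.
    mid_start = min(max(densest_point_s - 45.0, 0.0), max(duration_s - 90.0, 0.0))
    middle = (mid_start, min(mid_start + 90.0, duration_s))
    last = (max(duration_s - 60.0, 0.0), duration_s)
    return [first, middle, last]


def spot_check_outcome(errors_per_segment):
    """Map per-segment error counts to the protocol's outcomes."""
    if any(n > 2 for n in errors_per_segment):
        return "flag"  # full review, or return to the vendor for revision
    if all(n == 0 for n in errors_per_segment):
        return "pass"  # proceed to the proper-noun and synchrony checks
    return "reviewer_decision"  # 1-2 errors in a segment: not specified by the protocol


print(spot_check_windows(1800, 900))  # 30-minute video, densest passage at 15:00
```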
Proper-noun verification list. For each content category, maintain a list of the 20–30 highest-stakes proper nouns — the product names, drug names, regulatory acronyms, and technical terms that, if miscaptioned, produce the most compliance risk or learner confusion. Check every proper noun on the list against the caption file. This is a 5-minute search-and-scan operation on the SRT/VTT file that catches the category of errors that produces the most downstream compliance risk and the most learner complaints. This is not a substitute for the spot-check — it is a supplement that catches the systematic failures the spot-check might miss if the errors are concentrated in the proper-noun layer rather than distributed through the audio.
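The search-and-scan step can be partly automated. A minimal sketch, assuming single-word terms and plain SRT or VTT text: it skips the header, cue numbers, and timing lines, counts exact and wrong-case matches for each term, and uses Python's standard `difflib.get_close_matches` to surface near-miss spellings for the reviewer. Multi-word terms and phonetic splits such as "movie dish" need a phrase-level comparison that this sketch does not attempt.

```python
import difflib
import re

# Matches SRT (00:00:01,000) and VTT (00:01.000) timestamps on cue timing lines.
TIMESTAMP = re.compile(r"\d{2}:\d{2}[.,]\d{3}")


def caption_words(caption_text):
    """Return caption words, skipping the WEBVTT header, cue numbers and timing lines."""
    words = []
    for line in caption_text.splitlines():
        line = line.strip()
        if not line or line.isdigit() or line.startswith("WEBVTT") or TIMESTAMP.search(line):
            continue
        words.extend(re.findall(r"[\w'-]+", line))
    return words


def scan_terms(caption_text, terms, cutoff=0.8):
    """For each single-word term: exact hits, wrong-case hits, and near-miss spellings."""
    words = caption_words(caption_text)
    lowered = [w.lower() for w in words]
    report = {}
    for term in terms:
        key = term.lower()
        exact = words.count(term)
        # Compare against distinct words other than the term itself, so repeats
        # of the correct spelling cannot crowd near misses out of the results.
        candidates = sorted(set(lowered) - {key})
        close = difflib.get_close_matches(key, candidates, n=10, cutoff=cutoff)
        report[term] = {
            "exact": exact,
            "case_mismatch": lowered.count(key) - exact,
            "near_misses": sorted(close),
        }
    return report
```

Any term with near misses or case mismatches goes to the reviewer for a listen; a term with zero hits anywhere may simply not be spoken in that video.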
Synchrony check. Open the video and the caption file simultaneously and scan for visual de-sync: captions more than 2 seconds ahead of or behind the audio. Auto-caption tools occasionally produce synchrony errors on long pauses, fast speech, or segments with background noise. The synchrony check is a 2–3 minute scan per video rather than a full viewing: skip through the video at 1.5× playback speed, watching caption timing rather than reading for accuracy.
Rejection criteria. The QC reviewer rejects a caption file and returns it to the vendor for revision if any of the following are found: (a) an error on any proper noun in the content-category verification list; (b) more than 3 errors in total across the spot-check segments; (c) any synchrony deviation greater than 2 seconds that lasts longer than 5 seconds; (d) any garbled segment (where the caption text is phonetically unrelated to the audio). These thresholds are not arbitrary. WCAG 2.1 AA SC 1.2.2 sets no numerical accuracy threshold, so they are calibrated to the 99% accuracy floor this program uses as its working interpretation of the criterion, for typical training video lengths of 5–30 minutes.
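Criteria (a) through (d) combine into a single decision. In the sketch below, synchrony is judged from hypothetical reviewer observations, each a pair of video time and caption offset in seconds. Because the sketch measures a run from the first to the last out-of-tolerance observation, it can underestimate how long a de-sync actually lasted. A "clear" result means only that no rejection criterion was met, not that the file is approved.

```python
def sync_violation(observations, max_offset_s=2.0, max_duration_s=5.0):
    """True if caption offset stays beyond max_offset_s for longer than max_duration_s.

    observations: (video_time_s, offset_s) pairs in time order, where offset_s is
    caption display time minus the time the words are spoken.
    """
    run_start = None
    for t, offset in observations:
        if abs(offset) > max_offset_s:
            if run_start is None:
                run_start = t
            if t - run_start > max_duration_s:
                return True
        else:
            run_start = None  # back in tolerance: the run is broken
    return False


def qc_decision(proper_noun_errors, segment_errors, sync_observations, garbled_segments):
    """Apply rejection criteria (a)-(d); returns (decision, list of criteria triggered)."""
    triggered = []
    if proper_noun_errors > 0:
        triggered.append("a")
    if sum(segment_errors) > 3:
        triggered.append("b")
    if sync_violation(sync_observations):
        triggered.append("c")
    if garbled_segments > 0:
        triggered.append("d")
    return ("reject" if triggered else "clear", triggered)
```

Returning the list of triggered criteria, rather than a bare pass/fail, gives the vendor revision request and the asset tracking log a specific reason.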
Who does QC?
The most common QC model is instructional-designer-as-QC-reviewer — the person producing the course does a final caption review as part of the course-approval checklist. This model works for organisations with experienced instructional designers who know the content vocabulary. Its failure mode is that an instructional designer who is not a subject-matter expert in the content area may not recognise when "Movidius" has been transcribed as "movie dish" — they will see plausible text and approve it. The proper-noun verification list is the mitigation for this failure mode: it externalises the knowledge of what the correct terms are so the reviewer does not need to carry it in their head.
For highly specialised content (clinical training with drug names, regulatory training with precise citation formats, engineering onboarding with SDK function names), consider involving a content SME in the QC review for a sample set of assets from each new topic area. The SME review is not needed for every video. It is needed once per topic cluster to validate that the vendor's glossary coverage is complete for that cluster; the updated proper-noun verification list then carries the result into ongoing reviews.
Phase 4 deliverables
- Written production workflow, embedded in the course development checklist that instructional designers follow.
- QC protocol document — spot-check procedure, proper-noun verification list template (with the first content-category lists populated), rejection criteria.
- Asset tracking spreadsheet — operational log with fields for: video title, LMS URL, publication date, caption vendor, caption file format, QC reviewer, QC result (pass/fail/revision-required), revision date (if applicable), WCAG compliance grade (pass once QC approved).
- First month of new-production videos captioned and logged under the new workflow. This is the baseline evidence for the audit trail — showing that the program is operational, not just documented.
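The asset tracking spreadsheet's fields map naturally onto a typed record. The sketch below assumes CSV as the storage format and adds a content category column (from the logging step in the production workflow). Treating anything other than a QC pass as a failing WCAG grade is an assumption of this sketch, not a rule stated in the protocol.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class AssetRecord:
    video_title: str
    lms_url: str
    publication_date: str          # ISO date, e.g. "2026-06-15"
    caption_vendor: str
    caption_format: str            # "SRT" or "VTT"
    qc_reviewer: str
    qc_result: str                 # "pass", "fail" or "revision-required"
    revision_date: Optional[str] = None
    content_category: str = ""

    @property
    def wcag_grade(self):
        # Assumption: anything short of QC approval is recorded as not passing.
        return "pass" if self.qc_result == "pass" else "fail"


def to_csv(records):
    """Serialise records, including the derived WCAG grade, as CSV text."""
    buf = io.StringIO()
    columns = [f.name for f in fields(AssetRecord)] + ["wcag_grade"]
    writer = csv.DictWriter(buf, fieldnames=columns)
    writer.writeheader()
    for record in records:
        writer.writerow({**asdict(record), "wcag_grade": record.wcag_grade})
    return buf.getvalue()
```

Deriving the grade from the QC result, instead of typing it in separately, keeps the two columns from disagreeing.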
Phase 5: Back-catalogue retrofit (Days 60–90)
The back-catalogue retrofit is the heaviest single-phase effort in the 90-day plan. It overlaps with Phase 4 in timing, but it should begin only once the new-production workflow is running, so the back catalogue is not growing while you are working through it. The retrofit follows the triage model from Phase 1: Tier 1 first, a documented Tier-2 plan for the audit record, and Tier 3 on a rolling cadence.
Retrofit volume and time estimates
The inputs to the retrofit plan are the gap-analysis numbers from Phase 1 (Tier-1 hours without WCAG-grade captions and Tier-2 hours) and the vendor's delivery rate. Combining vendor delivery with the organisation's own QC review, a typical captioning workflow can clear 50–150 hours of training video per month, depending on content complexity and the organisation's QC review capacity. If your Tier-1 gap is 200 hours and your effective monthly throughput (vendor delivery plus your QC review capacity) is 50 hours per month, the Tier-1 retrofit will take four months, not ninety days. This does not invalidate the 90-day plan; it means the 90-day plan produces a documented Tier-1 retrofit that is on track, not necessarily complete. What the audit record needs is evidence of a systematic effort with a documented timeline and demonstrated progress, not a certificate of completion.
The more common situation is a Tier-1 gap of 20–80 hours for mid-market organisations that have already been doing some captioning. At a 50-hour monthly throughput, a gap of that size takes two months or less to clear, well within the 90-day window. If your Tier-1 gap is in this range, you can commit to full Tier-1 remediation by day 90, which is the strongest possible audit-trail statement.
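The duration arithmetic above is simple enough to keep alongside the plan. This sketch assumes effective throughput is set by the slower of vendor delivery and internal QC review (vendor output nobody can review does not count) and rounds up to whole months.

```python
import math


def effective_throughput(vendor_hours_per_month, qc_hours_per_month):
    """Hours of video fully captioned and QC-approved per month.

    Assumes the slower stage sets the pace: vendor output nobody can review does not count.
    """
    return min(vendor_hours_per_month, qc_hours_per_month)


def retrofit_months(gap_hours, monthly_throughput_hours):
    """Whole months needed to clear a gap at a steady effective throughput."""
    if monthly_throughput_hours <= 0:
        raise ValueError("throughput must be positive")
    return math.ceil(gap_hours / monthly_throughput_hours)


print(retrofit_months(200, effective_throughput(150, 50)))  # 4
```

With the figures above, a 200-hour gap at 50 hours per month gives four months, and an 80-hour gap gives two.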
The retrofit prioritisation sequence within Tier 1
Within Tier 1, the prioritisation sequence should be:
- Any content subject to an active accommodation request or complaint. If there is a pending ADA accommodation request for caption access from a specific employee or student, that content goes to the front of the queue regardless of other factors. The accommodation request creates a specific legal obligation with a response timeline (the standard is as soon as practicable, typically interpreted as days to weeks, not months).
- Mandatory training assigned to active employees in the current quarter. Any content employees must complete before the end of the current quarter is the second-highest priority: any enrolled employee could have a hearing disability, and the mandatory completion requirement turns a missing caption file into a concrete access barrier. This is the content most likely to trigger an accommodation request if it is not captioned.
- Content with the highest compliance obligation density. Healthcare training subject to Joint Commission survey, safety training subject to OSHA audit, Section 508-governed federal contract deliverables, or content cited in a pending compliance review. The intensity of the enforcement mechanism, not just the presence of an obligation, should weight the priority order.
- The highest-audience content within the above categories. Where two videos carry the same compliance obligation and neither has triggered an accommodation request, caption the one with more enrolled learners first. The audience size is a proxy for the probability of an access barrier occurring before the content is remediated.
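The four criteria above act as successive tie-breakers, which a tuple sort key expresses directly. The enforcement-intensity score is a hypothetical integer the Caption Program Owner assigns, since the sequence ranks enforcement mechanisms without scoring them.

```python
from dataclasses import dataclass


@dataclass
class Tier1Asset:
    title: str
    active_accommodation_request: bool
    mandatory_this_quarter: bool
    enforcement_intensity: int  # hypothetical owner-assigned score; higher = stronger enforcement
    enrolled_learners: int


def retrofit_queue(assets):
    """Order Tier-1 assets by the four criteria, each breaking ties in the one before."""
    return sorted(
        assets,
        key=lambda a: (
            a.active_accommodation_request,
            a.mandatory_this_quarter,
            a.enforcement_intensity,
            a.enrolled_learners,
        ),
        reverse=True,  # True sorts above False, larger numbers above smaller
    )
```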
Quality standards for back-catalogue retrofit
The back-catalogue retrofit carries the same WCAG 2.1 AA accuracy standard as new production. The temptation to run the back catalogue through a cheaper or faster captioning workflow — generic auto-caption without glossary correction, for instance — produces a coverage increase that does not improve the compliance position. A caption file at 87% word accuracy is not WCAG 2.1 AA-compliant regardless of how many videos have it. The audit trail that documents 200 videos captioned to 87% accuracy is worse than the audit trail that documents 50 videos captioned to WCAG 2.1 AA standard with a documented plan for the remaining 150 — because the first record shows a systematic acceptance of non-compliant captions, while the second record shows a systematic compliance effort in progress.
Apply the same QC protocol from Phase 4 to back-catalogue assets. Use the proper-noun verification list appropriate to each content category. Log every asset in the asset tracking spreadsheet with its QC result. If a back-catalogue asset fails QC and the vendor cannot correct it within the timeline, document the failure and the expected correction date rather than marking it as captioned.
The retrofit documentation
The retrofit documentation feeds directly into the audit-response pack. The asset tracking spreadsheet, dated and version-controlled, is the primary evidence. For Tier-1 assets specifically, create a separate remediation log that shows each asset's original status (no captions / auto-captions only / partial WCAG-grade), the date captioning was initiated, the date the WCAG-grade caption file was approved, and the WCAG compliance grade. This before/after record is the clearest possible audit-response document for the question "what did you do to remediate your compliance gap?"
Phase 5 deliverables
- Tier-1 retrofit completed, or an on-schedule timeline documented with demonstrated progress (week-by-week hours captioned vs. planned in the asset tracking log).
- Tier-2 retrofit plan documented with a specific timeline — even if Tier-2 remediation has not started, the document showing the committed timeline and the first quarter's planned volume is the audit deliverable.
- Asset tracking spreadsheet updated to current state — total WCAG-grade caption hours by tier, remaining gap, projected completion date.
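The tier-level figures for this deliverable can be generated from the tracking log. A minimal sketch, assuming the log is first summarised into hours per tier and using a naive 30-day month for the projection. A real plan would sequence Tier 1 ahead of Tier 2 rather than pooling the remaining hours as this does.

```python
import math
from datetime import date, timedelta


def tier_summary(gap_hours_by_tier, captioned_hours_by_tier, monthly_throughput, today):
    """Return (remaining hours per tier, naive projected completion date for all tiers)."""
    remaining = {
        tier: max(gap - captioned_hours_by_tier.get(tier, 0), 0)
        for tier, gap in gap_hours_by_tier.items()
    }
    months = math.ceil(sum(remaining.values()) / monthly_throughput)
    # 30-day months are a planning approximation, not calendar months.
    return remaining, today + timedelta(days=30 * months)


print(tier_summary({1: 120, 2: 200}, {1: 70}, 50, date(2026, 6, 1)))
```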
Building the audit trail
The audit trail is not a single document; it is the collection of records produced by operating the program described above. Building the program first means the records exist and are organised when the audit event occurs, rather than being reconstructed from scattered email threads and Jira tickets under time pressure. The audit-trail documentation package for a mature caption compliance program is set out below, first as the evidence each framework calls for and then as a five-exhibit response pack.
The compliance-evidence package by framework
ADA Title I (private employer) and ADA Title II (public entity): The primary evidence is the captioning policy document with its effective date and ratification signatures, the asset tracking spreadsheet showing WCAG 2.1 AA-grade caption status across the training catalogue, and the QC protocol document showing that the organisation applies a systematic accuracy standard consistent with WCAG 2.1 AA SC 1.2.2. For any specific accommodation request, the evidence is the request date, the response date, the interim measure provided while the caption file was being produced, and the date of final caption delivery. Where an Office for Civil Rights (OCR) investigates a Title II complaint, the investigation typically focuses on systemic barrier claims (no captioning programme) rather than individual incidents, and the programmatic evidence package is what resolves a systemic-barrier investigation.
Section 508 (federal contractors and agencies): A Section 508 audit, or a Voluntary Product Accessibility Template (VPAT) or Accessibility Conformance Report (ACR) review, requires that training video produced or procured under the contract meet WCAG 2.0 AA SC 1.2.2 (the standard Section 508 references). The evidence is the captioning policy (establishing that there is a programme), the asset tracking log for contract-covered training assets (establishing that the specific assets comply), and the QC protocol (establishing that accuracy is documented, not assumed). If the contract specifies a VPAT, the training-video caption section of the VPAT should reference the captioning policy, the accuracy standard, and the asset tracking log.
HIPAA § 164.530(b) (healthcare organisations): The HIPAA workforce training documentation requirement is primarily about training completion records — the organisation must document that workforce members received and completed required training. Captions enter the HIPAA audit context as an accommodation-documentation question: if a hearing-impaired workforce member is assigned a training video with no caption file, the training-completion record may not be sufficient documentation of effective training delivery. The evidence package is the captioning policy (showing that the organisation applies a standard to training content), the asset tracking log showing caption status for the specific training catalogue covered by the HIPAA documentation requirement, and any accommodation request records showing how access barriers were addressed.
Joint Commission HR.01.05.03 (healthcare): The Joint Commission human resources standard requires that staff are competent to perform their duties and that the organisation maintains evidence of training. For content delivered through platforms like HealthStream, the Joint Commission survey typically looks for completion records and curriculum documentation rather than captioning specifically. But a hearing-impaired employee's completion record for a course without captions is not evidence of effective training — it is evidence that the employee completed a course they could not fully access. The compliance argument is the same as HIPAA: the caption file is part of the training record's completeness, not an optional feature.
Section 1557 (ACA healthcare nondiscrimination): The 2024 HHS final rule under ACA Section 1557 applies WCAG 2.1 AA to covered entities (healthcare providers that receive federal financial assistance). For training video, the relevant provision is the effective communication requirement — training video must be accessible to staff with hearing disabilities. The evidence package is the captioning policy and the asset tracking log, with particular attention to clinical and patient-care training content which is the highest-scrutiny category under a Section 1557 review.
The audit response pack structure
When a complaint or audit inquiry arrives, the response pack should be producible within 48 hours. Structure it as five exhibits:
- Exhibit A: Captioning policy — current version with effective date and signatories.
- Exhibit B: Gap analysis — the Phase 1 inventory, version-dated, showing the organisation's initial assessment of its compliance exposure.
- Exhibit C: Asset tracking log — current state, showing WCAG 2.1 AA compliance status across the training catalogue, with the before/after remediation record for the back-catalogue retrofit.
- Exhibit D: QC protocol — the accuracy verification procedure, including the spot-check protocol, the proper-noun verification list, and the rejection criteria.
- Exhibit E: Accommodation request log (if any) — any specific accommodation requests received, the organisation's response timeline, interim measures provided, and resolution date.
If the inquiry is from a specific complainant (an employee or student who filed a complaint), the response pack should additionally include the complainant's training completion record, the caption status of each assigned video at the time of the complaint, and the dates of any captioning actions taken since the complaint was received. The five-exhibit structure above is what resolves systemic claims; the complainant-specific supplement is what resolves individual-incident claims.
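A pre-assembled checklist makes the 48-hour target easier to hit. The sketch below simply reports which exhibits, and which complainant-specific items when relevant, cannot yet be produced. The supplement item keys are hypothetical labels, and an accommodation log with no entries still counts as Exhibit E.

```python
EXHIBITS = {
    "A": "Captioning policy",
    "B": "Gap analysis",
    "C": "Asset tracking log",
    "D": "QC protocol",
    "E": "Accommodation request log",  # an empty log still counts as the exhibit
}
# Hypothetical keys for the complainant-specific supplement.
COMPLAINANT_SUPPLEMENT = {
    "training_completion_record",
    "caption_status_at_complaint",
    "post_complaint_actions",
}


def missing_items(available, complainant_specific=False):
    """List what cannot yet be produced; `available` holds exhibit letters and supplement keys."""
    missing = [f"Exhibit {k}: {name}" for k, name in EXHIBITS.items() if k not in available]
    if complainant_specific:
        missing += sorted(COMPLAINANT_SUPPLEMENT - set(available))
    return missing


print(missing_items({"A", "C", "D", "E"}))  # ['Exhibit B: Gap analysis']
```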
The ongoing cadence: what happens after day 90
Day 90 is not completion — it is the point where the program transitions from build mode to maintenance mode. The maintenance cadence is lighter than the build cadence, but it must be scheduled and owned rather than left to arise organically.
Monthly cadence
New-production review. The Caption Program Owner reviews the asset tracking log for the prior month. All new-production videos published during the month should appear in the log with a WCAG 2.1 AA compliance grade. Any video published without a caption file entry is flagged and the producing team is notified via the escalation path defined in the policy. This review takes 15–30 minutes per month and is the primary mechanism for detecting production-gate failures before they accumulate into a new back-catalogue.
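The monthly new-production review reduces to comparing two lists. This sketch assumes you can export last month's published video IDs from the LMS and that the tracking log is keyed by the same IDs; both input shapes are assumptions.

```python
def gate_leaks(lms_published, tracking_log):
    """Return (IDs missing from the log, IDs logged without a QC pass).

    lms_published: video IDs the LMS shows as published last month.
    tracking_log: {video_id: qc_result} from the asset tracking log.
    """
    missing = sorted(v for v in lms_published if v not in tracking_log)
    not_passed = sorted(v for v in lms_published if v in tracking_log and tracking_log[v] != "pass")
    return missing, not_passed


print(gate_leaks({"v1", "v2", "v3"}, {"v1": "pass", "v3": "revision-required"}))
# (['v2'], ['v3'])
```

Both lists go to the producing teams through the escalation path; the first is a gate bypass, the second a video that went live before QC finished.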
Vendor QC trend review. Review the QC results for the prior month's vendor deliveries. Are the revision request rates trending up or down? Are there recurring error patterns in a specific content category? A rising revision rate on a specific content category typically means the company glossary needs to be updated — a new product was released, a new regulation was cited in training, or a new vendor name entered the training vocabulary. Flag these patterns to the vendor and update the glossary.
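The trend review can be computed from the QC results already in the tracking log. A minimal sketch, assuming each vendor delivery is logged with its month, content category, and whether a revision was requested. With only a handful of deliveries per category, a single month-over-month rise is noisy, so treat the output as a prompt to look rather than a verdict.

```python
from collections import defaultdict


def revision_rates(deliveries):
    """deliveries: (month 'YYYY-MM', category, revision_requested) tuples.

    Returns {(month, category): share of deliveries that needed revision}.
    """
    counts = defaultdict(lambda: [0, 0])  # (month, category) -> [revisions, deliveries]
    for month, category, revised in deliveries:
        counts[(month, category)][0] += int(revised)
        counts[(month, category)][1] += 1
    return {key: revs / total for key, (revs, total) in counts.items()}


def rising_categories(rates, prev_month, this_month):
    """Categories whose revision rate went up between the two months."""
    return sorted(
        category
        for (month, category) in rates
        if month == this_month
        and (prev_month, category) in rates
        and rates[(this_month, category)] > rates[(prev_month, category)]
    )
```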
Accommodation request check. Confirm with HR that no caption-related accommodation requests were received in the prior month. If any were received, confirm they were logged in the compliance incident register and that the response timeline was met.
Quarterly cadence
Back-catalogue retrofit progress review. Update the asset tracking log with the quarter's retrofit completions. Compare actual vs. planned progress against the Tier-1 and Tier-2 retrofit timeline. If the programme is falling behind the committed timeline, document the gap and the revised plan — a revised timeline with a credible explanation is a stronger audit record than a missed deadline with no documented response.
Glossary update. Review the product release notes, the compliance training update schedule, and any new regulatory guidance published in the quarter for terms that need to be added to the captioning glossary. This is a 30-minute exercise with the relevant product manager or compliance officer.
New-surface check. Has the organisation adopted any new video tools, LMS platforms, or content libraries since the last review? New surfaces that were not in the Phase 1 inventory need to be added to the scope of the captioning programme. The LMS roster in particular changes frequently in mid-market organisations — a pilot of a new platform, an acquisition bringing in a different LMS, a shift from Vimeo to Microsoft Stream for video hosting — and each new surface is a potential new back-catalogue if the captioning programme does not extend to it.
Annual cadence
Policy review. Review the captioning policy against any changes in applicable law since the last policy date. The US compliance matrix post is updated as the legal landscape evolves, so check it at the annual policy review. If the organisation's size, funding sources, or customer base has changed materially, the compliance obligation mapping from Phase 1 may need to be updated.
Full inventory re-run. Repeat the Phase 1 inventory against the current video catalogue. This confirms that the new-production gate has not been leaking (no new uncaptioned content accumulating), catches any content that was added outside normal channels (a legacy system migration, a content library purchase), and produces a refreshed gap number for the annual compliance report.
Annual compliance report. Prepare a one-page compliance status summary: total training video hours in the active catalogue, WCAG 2.1 AA caption coverage percentage, Tier-1 remediation status (completed or on-timeline), new-production gate metrics (percentage of new videos published with a WCAG-grade caption file in the last 12 months), and any material accommodation requests received and resolved. This document is the executive summary of the audit-trail package and is the first thing a compliance officer, legal counsel, or external auditor asks for.
Common failure modes in years one and two
The 90-day plan above produces a functional caption compliance program. The failure modes below are what kill programs in years one and two, after the initial build energy dissipates.
Failure mode 1: Ownership gaps at role transitions
The Caption Program Owner role is the single most important structural element of the programme. When that person leaves the organisation, the programme loses institutional knowledge, the glossary stops being updated, the monthly review stops happening, and within six months the new-production gate has stopped functioning. The mitigation is documentation — the captioning policy names the role, not the person; the QC protocol is a written document any replacement can follow; the asset tracking log is maintained in a shared system accessible to the entire L&D team, not on the departing person's laptop. Succession planning for the Caption Program Owner role should be explicit, not assumed.
Failure mode 2: Measuring coverage instead of compliance
The programme's executive reporting metric drifts back to "percentage of videos with captions" rather than "percentage of videos with WCAG 2.1 AA-grade captions." Coverage and compliance are different numbers. An organisation that reports 95% caption coverage but has 30% of those caption files produced by YouTube auto-captions without QC review is reporting a meaningless compliance number. The asset tracking log must carry both fields — caption-file-present (coverage) and QC-approved-WCAG-grade (compliance) — and the compliance number is the one that goes into the annual compliance report.
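Both numbers come from the same two log fields. The sketch below assumes one record per video holding the caption-file-present flag and the QC-approved-WCAG-grade flag.

```python
def coverage_and_compliance(records):
    """Return (coverage, compliance) as fractions of all videos in the log.

    records: (caption_file_present, qc_approved_wcag_grade) pairs, one per video.
    """
    records = list(records)
    if not records:
        return 0.0, 0.0
    coverage = sum(1 for present, _ in records if present) / len(records)
    compliance = sum(1 for present, approved in records if present and approved) / len(records)
    return coverage, compliance


log = [(True, True)] * 66 + [(True, False)] * 29 + [(False, False)] * 5
print(coverage_and_compliance(log))  # (0.95, 0.66)
```

In this illustration, 95 of 100 videos have a caption file but 29 of those never passed QC, so the log reports 95% coverage and 66% compliance.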
Failure mode 3: The "good enough" vendor drift
A vendor relationship that met the 99% accuracy floor in year one may drift to 92% accuracy in year two as the vendor's workload increases, the assigned QC reviewer changes, or the organisation's content vocabulary evolves while the glossary stays static. The monthly vendor QC trend review exists specifically to catch this drift. A rising revision-request rate is the leading indicator. An accommodation request citing caption accuracy is a lagging indicator: by the time it arrives, the drift has been ongoing for months. If accuracy drift is confirmed, the response is a vendor discussion, a glossary update, and potentially a re-evaluation of the vendor if the drift cannot be corrected. Staying with a drifted vendor to avoid a real switching cost accumulates compliance risk that, when it surfaces, will cost more than the switch.
Failure mode 4: The new-surface blind spot
An organisation buys a new LMS, pilots a customer education platform, or integrates a new async-video tool — and the captioning programme does not extend to it because the tool was not in scope when the programme was built. The quarterly new-surface check catches this, but only if someone runs it. The structural mitigation is to include the Caption Program Owner in the evaluation process for any new tool that involves video hosting or training content delivery — the accessibility requirement, including captioning, is a tool-selection criterion that needs to be evaluated before the tool is purchased, not retrofitted after. Most major LMS and video-hosting platforms now publish their captioning capabilities in their accessibility conformance statements; asking for that documentation during procurement is a reasonable standard and signals to vendors that accessibility is a purchase criterion.
Failure mode 5: The back-catalogue drift back
The back-catalogue retrofit is completed in year one. In year two, new content is being produced and captioned correctly, but a parallel stream of legacy content is being updated and re-uploaded — an updated compliance training module, a refreshed onboarding series, a revised product walkthrough — and the updated versions are not being run through the captioning workflow because the update process does not trigger the same production gate as the creation process. The mitigation is to make the captioning gate apply to updates as well as new production: any video that is replaced with a new version in the LMS requires a new caption file for the new version. This is an additional workflow step, but it is necessary because the caption file for the old version is not valid for the new version — the audio has changed.
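One way to enforce the update gate is to record a fingerprint of the media file each caption was produced from and compare it with the file currently live in the LMS. This is a hypothetical design, not a feature of any particular LMS. Hashing raw bytes also flags a re-encode of unchanged audio, which errs on the side of re-review.

```python
import hashlib


def media_fingerprint(media_bytes):
    """SHA-256 of the media file's bytes, recorded when the caption file is produced."""
    return hashlib.sha256(media_bytes).hexdigest()


def stale_captions(current_media, caption_sources):
    """Video IDs whose live media differs from the media their captions were made from.

    current_media: {video_id: fingerprint of the file now live in the LMS}
    caption_sources: {video_id: fingerprint recorded with the approved caption file}
    A video with no recorded caption source is also reported.
    """
    return sorted(v for v, fp in current_media.items() if caption_sources.get(v) != fp)
```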
What a well-designed captioning tool addresses in this program
The 90-day program above is platform-agnostic — it works with any captioning vendor that meets the accuracy standard and the LMS integration requirements. That said, three of the five phases have technical requirements that a well-designed captioning tool specifically addresses:
Phase 3 (vendor selection) — the glossary architecture. The choice between glossary-biased decoding and post-hoc find-and-replace is the single largest driver of caption quality variance on training content. A tool designed for training content builds the company glossary into the transcription step, not as a post-processing layer. GlossCap's glossary-biased Whisper decoding applies the customer's product names, acronym register, and domain vocabulary during transcription — producing materially higher accuracy on the proper-noun surfaces that dominate training content and drive the most compliance risk when miscaptioned.
Phase 4 (QC) — accuracy documentation. The QC protocol requires documented accuracy evidence per asset, not just a subjective reviewer sign-off. A tool that produces a word-error-rate estimate or a confidence score per asset gives the Caption Program Owner a machine-generated accuracy signal that supplements the human spot-check, making the QC workflow faster and the documentation more defensible. The compliance record should show a documented accuracy grade, not just a binary caption-file-present flag.
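Word error rate is conventionally computed as the word-level edit distance between a reference transcript and the caption text, divided by the number of reference words. The sketch below is that textbook calculation, assuming a human-corrected reference exists for the sampled segment and using only lowercasing and whitespace splitting as normalisation (punctuation is not stripped). It is not how any particular tool derives its estimate; tools that report a confidence score without a reference use a different method.

```python
def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev holds the previous row of the word-level edit-distance table.
    prev = list(range(len(hyp) + 1))
    for i in range(1, len(ref) + 1):
        cur = [i] + [0] * len(hyp)
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            cur[j] = min(
                prev[j] + 1,         # deletion: reference word missing from captions
                cur[j - 1] + 1,      # insertion: extra word in captions
                prev[j - 1] + cost,  # substitution, or match when cost is 0
            )
        prev = cur
    return prev[-1] / len(ref)


print(word_error_rate("the Movidius chip", "the movie dish chip"))  # 2 edits / 3 words
```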
Phase 4–5 (LMS integration) — the upload automation. The retrofit and ongoing production workflows both require reliable, bulk-capable LMS integration. GlossCap's native integrations to TalentLMS, Docebo, Kaltura, Panopto, Canvas, and other major platforms eliminate the per-asset upload labour that makes back-catalogue retrofits prohibitively time-consuming at volume. The integration includes the QC review step — the caption file is held in a review queue before the LMS push, preserving the human gate while automating the logistics.
If your training content includes any of the 15 proper-noun failure categories that break generic auto-captions (product names, drug names, SDK symbols, regulatory acronyms, person names, and organisation names among them), the glossary architecture is not a nice-to-have. It determines whether the QC review is a light spot-check confirming an already-accurate file or a full correction exercise on a file that needed manual revision. GlossCap's pricing is structured around the captioning workflow rather than per-seat access, so the economics work at the volume typical of its target customers (50–500-employee organisations producing 20–150 hours of new training video per year), not just at enterprise scale.
FAQ
How long does a realistic back-catalogue retrofit take for a 500-hour training library?
The realistic duration depends on three variables: the Tier-1 gap (what share of the 500 hours carries a live compliance obligation), your effective monthly captioning throughput (vendor delivery speed × your QC review capacity), and the content complexity (a healthcare-LMS library with dense clinical vocabulary takes longer to QC per hour than a standard corporate-onboarding library). For a 500-hour library with a typical Tier-1 gap of 20–30% (100–150 hours subject to ADA Title I, Section 508, or a healthcare-specific obligation), an effective throughput of 50 hours/month, and a standard corporate-training content mix, the Tier-1 retrofit takes two to three months. The Tier-2 and Tier-3 content extends the full-library completion timeline to 9–12 months at that throughput. The audit-trail question is not whether you have completed the retrofit — it is whether you have a documented, on-schedule programme with demonstrated progress. A three-month Tier-1 completion with a nine-month full-library plan and weekly progress updates is the strongest possible audit position.
What is the minimum viable captioning policy for a 50-employee SaaS startup?
A minimum viable captioning policy for a 50-employee organisation can be one page. It needs: (1) the scope (all training video produced or procured by L&D, including Loom clips and async-video onboarding); (2) the technical standard (WCAG 2.1 AA SC 1.2.2, 99%+ accuracy, ±2-second synchrony); (3) the owner (named role, even if it is "the head of People Operations until L&D headcount exceeds 3"); (4) the gate (no new training video published to the LMS or shared as mandatory training without a WCAG-grade caption file); (5) the escalation (any video published without captions is removed or restricted within 3 business days). No signatures beyond the People or L&D lead are strictly required for a 50-person organisation, but having the CEO or COO acknowledge the policy in writing takes ten minutes and substantially strengthens the institutional commitment evidence if an accommodation request ever arrives.
Who should own the caption compliance program — L&D or IT?
L&D should own the programme. The captioning policy governs training content — what is produced, how it is produced, what standard it must meet. Those are L&D decisions, not IT decisions. IT is the right team to own the LMS technical infrastructure, including the captioning vendor's API integration and the access controls on the caption files. But the captioning standard, the QC protocol, the vendor relationship, the glossary architecture, and the audit-trail documentation are L&D responsibilities. The failure mode of IT ownership is that caption compliance becomes an LMS configuration ticket rather than a content governance programme, and no one is accountable for the accuracy of the captions — only for the presence of a caption file. ADA and WCAG compliance requires accuracy, not just presence. The accountability for accuracy has to sit with the function that can measure and improve it, which is the content-producing function.
What does an ADA or OCR compliance audit actually look for?
OCR (the Office for Civil Rights at the Department of Education or at HHS) investigates ADA Title II and Section 504 complaints; ADA Title I employment complaints are filed with the EEOC instead. An OCR investigation of a caption complaint typically begins with a request for information covering: (1) the organisation's accessibility policy or statement; (2) documentation of the specific accommodation request or complaint that triggered the investigation; (3) the organisation's response timeline and any interim measures provided; (4) the current caption status of the content cited in the complaint; and (5) a description of the organisation's programme for ensuring new training content is captioned. A systemic complaint (not tied to a specific accommodation request, but alleging a pattern of non-captioned training content) typically results in a Resolution Agreement that requires the organisation to submit a captioning programme plan with milestones, a gap-analysis report, and quarterly progress updates. The five-exhibit audit-response pack described in the audit-trail section above directly satisfies both the individual-complaint and systemic-complaint evidentiary requests.
Do we need to caption archived content that has had zero completions in two years?
The legal answer is that Tier-4 archive content with no active audience carries a lower compliance priority than actively assigned training. The practical answer depends on two factors: first, whether the content is still accessible (if it can be found and played by a user who discovers it through search, it carries an ADA Title III public-accommodation obligation if it is publicly accessible, or a Title I obligation if it is on the internal LMS and any hearing-impaired employee could theoretically access it); and second, whether it is formally deactivated or just informally unused. Content that is formally deactivated — removed from catalogue discovery, not assignable, accessible only through a direct URL that is not distributed — is the most defensible archive disposition. Content that is searchable in the LMS but has not been completed in two years is not formally deactivated; it is just unpopular. The documented disposition of "archive-only content — WCAG compliance remediation deferred pending content lifecycle review" is adequate for an audit trail if combined with a concrete lifecycle review date. The cleanest approach is a quarterly content lifecycle review that retires genuinely obsolete content, removing it from the caption obligation entirely.
What is the right QC pass/fail threshold, and what evidence do we need to document it?
WCAG 2.1 Success Criterion 1.2.2 (Captions (Prerecorded), a Level A criterion and therefore part of any AA conformance claim) does not specify a numerical word-error-rate threshold. It requires captions for all prerecorded audio in synchronized media, and WCAG defines captions as a synchronized text alternative for the speech and non-speech audio needed to understand the content. The 99%+ accuracy floor cited in this post and throughout the GlossCap documentation is the widely used industry benchmark for caption accuracy in training video. The reasoning is arithmetic: even at the 99% floor, a 30-minute video with a 4,000-word audio track carries about 40 errors (4,000 × 1%), and the floor marks roughly the point beyond which caption errors stop being minor inconveniences and start causing comprehension failures for hearing-impaired learners. To document the QC threshold for the audit trail, your QC protocol document should state: "Caption files are reviewed against WCAG 2.1 Success Criterion 1.2.2 (required for Level AA conformance) using a spot-check procedure and a proper-noun verification protocol. Files are approved when no sampled segment contains an error and all proper nouns on the content-category verification list are correctly transcribed. Files failing either criterion are returned to the vendor for revision. The minimum accuracy standard for approval is 99% word accuracy as measured by the spot-check protocol."
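For teams that want the spot-check to be mechanical and repeatable, here is a minimal Python sketch of the two approval criteria plus the 99% arithmetic. The helper names (`word_errors`, `error_budget`, `qc_gate`) are hypothetical. Word errors are counted as word-level edit distance after lowercasing and stripping punctuation, and the proper-noun check is a case-sensitive substring match, which is a simplification of a human verification pass.

```python
import string

# Strip punctuation so "Docebo," and "docebo" count as the same word.
_PUNCT = str.maketrans("", "", string.punctuation)


def normalise(text: str) -> list[str]:
    """Lowercase, drop punctuation, split on whitespace."""
    return text.lower().translate(_PUNCT).split()


def word_errors(reference: str, caption: str) -> int:
    """Word-level edit distance: substitutions + deletions + insertions."""
    ref, hyp = normalise(reference), normalise(caption)
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr[j] = min(prev[j] + 1,         # reference word dropped
                          curr[j - 1] + 1,     # extra word inserted
                          prev[j - 1] + cost)  # substitution (or exact match)
        prev = curr
    return prev[-1]


def error_budget(word_count: int, accuracy_pct: int = 99) -> int:
    """Most errors a file can contain and still meet the accuracy floor."""
    return word_count * (100 - accuracy_pct) // 100


def qc_gate(samples, proper_nouns, caption_text):
    """Apply both approval criteria; return (approved, rejection_reasons).

    samples: (reference_transcript, caption_text) pairs for the spot-checked segments.
    proper_nouns: terms from the content-category verification list.
    caption_text: the full caption text; each term is checked case-sensitively.
    """
    reasons = []
    for idx, (reference, caption) in enumerate(samples):
        errors = word_errors(reference, caption)
        if errors:
            reasons.append(f"segment {idx}: {errors} word error(s)")
    for term in proper_nouns:
        if term not in caption_text:  # substring match: a simplification
            reasons.append(f"verification-list term missing: {term}")
    return (not reasons, reasons)


print(error_budget(4000))  # 40
approved, reasons = qc_gate(
    samples=[("Open the Docebo admin panel", "open the doe bow admin panel")],
    proper_nouns=["Docebo"],
    caption_text="Open the doe bow admin panel.",
)
print(approved, reasons)
# False ['segment 0: 2 word error(s)', 'verification-list term missing: Docebo']
```

Each spot-checked segment pairs a human reference transcript with the vendor's caption text for the same span; the rejection-reasons list doubles as the revision note returned to the vendor and as the QC evidence filed in the audit trail.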
How do we handle multilingual training video in the caption program?
Multilingual training content adds two compliance layers: the language of the captions (which must match the language of the audio; a translation does not count) and the multi-language LMS configuration (most enterprise LMS platforms support multi-language caption tracks, but the upload workflow varies by platform and language code). For a compliance program, the minimum requirement is that every language track in the active training catalogue has a caption file in the same language: a French-language safety training video requires French captions, not English captions over French audio. The compliance obligation is language-neutral; ADA Title I applies to hearing-impaired employees regardless of the language of their training. For the glossary architecture, each language needs its own glossary. A Spanish-language clinical training library has a different proper-noun register than the English-language version of the same content, and phonetic failure modes in Spanish speech-to-text (STT) are distinct from English STT failures. The practical program implication is to treat each language as a separate content category in the asset inventory and the glossary architecture, and to confirm that the captioning vendor supports the specific languages in the catalogue before signing.
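One way to enforce the same-language rule in the asset inventory is a gap check that compares each video's audio language with the language tags of its caption tracks. A minimal Python sketch, assuming a hypothetical `VideoAsset` record that stores BCP 47 language tags. It compares only the primary language subtag, an assumption that would wrongly treat, for example, `zh-Hans` and `zh-Hant` captions as interchangeable, so tighten the comparison for those languages.

```python
from dataclasses import dataclass, field


@dataclass
class VideoAsset:
    """One row of a hypothetical asset inventory."""
    video_id: str
    audio_lang: str  # BCP 47 tag of the spoken audio, e.g. "fr-CA"
    caption_langs: list[str] = field(default_factory=list)  # tags of uploaded caption tracks


def primary_subtag(tag: str) -> str:
    """'fr-CA' -> 'fr'. Ignoring region and script subtags is a simplifying assumption."""
    return tag.split("-")[0].lower()


def language_gaps(inventory: list[VideoAsset]) -> list[str]:
    """IDs of videos with no caption track in the same language as the audio."""
    gaps = []
    for asset in inventory:
        caption_languages = {primary_subtag(t) for t in asset.caption_langs}
        if primary_subtag(asset.audio_lang) not in caption_languages:
            gaps.append(asset.video_id)
    return gaps


def by_language(inventory: list[VideoAsset]) -> dict[str, list[str]]:
    """Group videos by audio language: one content category (and glossary) per language."""
    groups: dict[str, list[str]] = {}
    for asset in inventory:
        groups.setdefault(primary_subtag(asset.audio_lang), []).append(asset.video_id)
    return groups


inventory = [
    VideoAsset("safety-fr", "fr-FR", ["en"]),  # English captions over French audio
    VideoAsset("safety-en", "en-US", ["en-US"]),
    VideoAsset("clinic-es", "es", ["es-MX", "en"]),
]
print(language_gaps(inventory))  # ['safety-fr']
print(by_language(inventory))    # {'fr': ['safety-fr'], 'en': ['safety-en'], 'es': ['clinic-es']}
```

The grouping output maps directly onto the per-language glossaries: each key is a content category that needs its own term list and its own proper-noun verification list.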
What if our LMS does not support sidecar caption file upload?
Most major LMS platforms support sidecar caption file upload, including TalentLMS, Docebo, Absorb, Cornerstone OnDemand, Canvas, Brightspace, Moodle, Workday Learning, HealthStream, and Relias. If yours does not, you are likely using a custom-built or legacy system, or a lightweight content-hosting tool (a SharePoint page with an embedded video, a Notion page with a Loom embed) rather than a true LMS. In those cases, attaching a transcript or caption file to the page hosting the video is at best a stopgap, because a transcript alone does not meet SC 1.2.2, which requires captions synchronized with the media; the durable fix is to move the video to a hosting platform that supports sidecar caption upload. Burned-in (open) captions are not a recommended workaround because users cannot restyle them, which matters for users with combined hearing and visual disabilities who need both captions and high-contrast text. If the LMS vendor genuinely does not support sidecar caption files and your organisation has a compliance obligation, this is a tool-replacement conversation: an LMS that cannot deliver standard captions cannot deliver content that conforms to Section 508 or WCAG 2.1 AA, and it exposes the organisation to the same compliance risk as uncaptioned training content.
Start the program with a glossary that works for your training content
The 90-day plan above works with any captioning workflow. Where a tool designed for training content specifically helps is the glossary architecture: the mechanism that determines whether caption accuracy on your product names, drug names, SDK symbols, and regulatory acronyms lands in the 87% range (generic auto-caption) or the 99%+ range (glossary-biased decoding). GlossCap connects to your existing terminology source (Notion, Confluence, Google Docs, or a pasted term list) and applies your company glossary to every caption job. The output is WCAG 2.1 AA-grade caption files built to pass the QC protocol on the first review rather than the second or third, and a compliance record that documents accuracy rather than just presence.
The Solo plan at $29/month covers up to 5 hours of video per month, enough for a small L&D team in the retrofit phase and the first months of the new-production workflow. The Team plan at $99/month covers 30 hours per month with Notion/Confluence/Docs glossary sync, the LMS integration, and the QC review interface: the operational toolkit for the program described above. Start the program, build the glossary, and the QC load decreases with each batch as the glossary grows to cover your vocabulary.