Launch / 2026-04-24

Captions that know your jargon.

ADA Title II just became enforceable. WCAG 2.1 AA requires accurate synchronized captions, and the accepted compliance bar is 99%+ accuracy. YouTube's auto-captions run 80–90%, and they mangle the exact words that matter most in training video: product names, SDK symbols, drug names, proper nouns. Here's what L&D teams are doing about it, and what GlossCap actually ships.

The deadline your training team is feeling this week

On April 24, 2026, ADA Title II's web-content rule became enforceable for state and local government entities: every public university, every community college, every county health department, every municipal training department. The European Accessibility Act has been enforceable since June 2025 and is still catching non-compliant SMBs with quiet-period audits.

Both standards point at the same operational reality: if your employees, students, or the public watch a training video that your organization produced, that video needs synchronized captions at roughly 99% accuracy. YouTube's auto-caption track is not compliant. Neither is Zoom's live transcript. Neither is the .vtt file your LMS generated off the raw audio. An auditor asks for proof the track was reviewed, and "we uploaded it to YouTube and accepted the auto-captions" is not proof.

And L&D teams already knew this was coming. The question that's now sitting in enablement leads' inboxes isn't whether to fix captions — it's how.

YouTube's guesses are where the work is

If you've ever watched a general-purpose speech model try to caption a real internal training video, you've seen the failure mode: kubectl comes out as "cube control," a Helm chart becomes a "helmet chart," and the acronyms the whole course hangs on come out as noise.

Every mangled term is a learner who doesn't know what the trainer just said. Every one pulls the track below the WCAG 2.1 AA accuracy bar. And every one shows up in the auditor's spot-check, because the exact words that matter most in training (your product names, your API surface, your clinical vocabulary, your acronyms) are exactly the words general-purpose models are worst at. They're rare in the model's training distribution, and they're high-information in yours.

The fix today is a person and a text editor

The current workflow in most 50–500-employee L&D teams looks like this: upload to YouTube (or Kaltura, or your LMS) → wait for the machine transcript → download the .vtt → open it in VS Code or a caption editor → hand-correct every mangled term → re-upload → spot-check the sync → file.

Operators I talked to while researching this put the cost at 1 to 2 hours of cleanup per finished hour of video. On a typical enablement team producing 30 hours of new training content per month (onboarding, product release training, compliance refreshers, recorded all-hands), that's 30 to 60 hours a month of caption-cleanup work. A half-FTE hidden inside the captioning step. And once you see the math, you stop signing off on it.

"We did this internally for Q1. It ate a week of our ops lead's calendar. I am not doing it again."

— enablement director, 180-person SaaS, during customer research

Why the existing accuracy-grade vendors don't fit

Verbit, Rev.com, and 3Play Media are the existing accuracy-grade captioning options. They produce compliant output. They also start at $200+/month with SOC-2 questionnaires and procurement cycles, and their GTM targets 1000-seat-and-up organizations — the enterprise universities, the global consulting firms, the regulated financial services.

If you run L&D for a 50–500-person company, you are not that buyer. Your "captioning budget" this quarter is an intern with Adobe Premiere Pro and a deadline. You read the ADA headline on Monday, found out your backlog of Q1 training videos is non-compliant by Tuesday, and by Wednesday you were pricing vendors whose sales motion is built around an 8-week evaluation.

That's the gap GlossCap closes. Compliance-grade output, self-serve, priced for the team that doesn't have a procurement team.

What GlossCap actually does

The product is three steps.

  1. Link your glossary. Point GlossCap at a Notion page, a Confluence space, or a Google Docs folder. On the Solo plan, paste terms into a text box. We read your product names, SDK symbols, acronyms, drug names, course codes — whatever you care about being spelled right.
  2. Upload your video. We run it through a glossary-biased speech model. The bias operates at the BPE token level: every term in your glossary gets a logit boost when the acoustic signal is phonetically close. The model still transcribes the rest of the audio the way a general model would; it only bends toward your vocabulary when there's evidence of your vocabulary in the signal. No hallucination, no inserting terms that weren't said.
  3. Export to your LMS. SRT and VTT out of the box, webhook delivery to TalentLMS, Docebo, Absorb, Kaltura, and YouTube on Team and Org plans. Every track also passes through a human-review UI so you can sign off on the final version before export — which is what WCAG 2.1 AA and ADA Title II auditors actually want to see documented.
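Step 2's glossary bias can be pictured with a toy sketch. This illustrates token-level logit biasing in general, not GlossCap's actual decoder; the six-token vocabulary, the BOOST value, and the acoustic_match stand-in are all invented for the example:

```python
# Toy vocabulary: id -> token. A real system would use Whisper's BPE vocab.
VOCAB = {0: "cube", 1: "control", 2: "kubectl", 3: "helmet", 4: "Helm", 5: "chart"}
GLOSSARY = {"kubectl", "Helm"}           # terms the customer cares about
GLOSSARY_IDS = {i for i, t in VOCAB.items() if t in GLOSSARY}
BOOST = 2.0                               # illustrative logit boost

def bias_logits(logits, acoustic_match):
    """Add BOOST to glossary tokens, but only when the acoustic signal is
    judged phonetically close to the term (acoustic_match stands in for
    that evidence check)."""
    return [
        l + BOOST if i in GLOSSARY_IDS and acoustic_match(i) else l
        for i, l in enumerate(logits)
    ]

# Raw logits slightly prefer "cube" (id 0) over "kubectl" (id 2).
raw = [1.2, 0.3, 0.9, 0.1, 0.0, 0.4]
biased = bias_logits(raw, acoustic_match=lambda i: True)
print(VOCAB[max(range(len(raw)), key=raw.__getitem__)])        # cube
print(VOCAB[max(range(len(biased)), key=biased.__getitem__)])  # kubectl
```

The guard on acoustic evidence is what the "no hallucination" claim depends on: with no phonetic match, the boost never fires and the model decodes as a stock model would.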

The proof, in two lines

Same audio. Same underlying Whisper-large model. The only difference: glossary-biased decoding.

Generic:  open cube control and apply the helmet chart
GlossCap: open kubectl and apply the Helm chart

The glossary is not a feature. The glossary IS the product. Every hour of video you caption with GlossCap strengthens the term model for your specific organization, so the second month is more accurate than the first, and the sixth month is more accurate than the second. That's a switching cost competitors without glossary ingestion don't have. The longer you stay, the more your captions reflect how your team actually talks.
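One way to picture the compounding claim. This is purely illustrative (the post doesn't describe the real term model beyond "gets stronger each month"); the idea is that each human-reviewed track is evidence, so a term's bias weight ticks up every time reviewers confirm it, capped so no term can swamp decoding:

```python
from collections import Counter

class TermModel:
    """Toy per-org term model: every confirmed appearance of a glossary
    term in a human-reviewed track nudges its bias weight upward,
    capped so no single term can dominate decoding."""
    def __init__(self, base=1.0, step=0.1, cap=3.0):
        self.base, self.step, self.cap = base, step, cap
        self.counts = Counter()

    def observe_reviewed_track(self, confirmed_terms):
        # confirmed_terms: glossary hits that survived human review
        self.counts.update(confirmed_terms)

    def weight(self, term):
        return min(self.base + self.step * self.counts[term], self.cap)

tm = TermModel()
tm.observe_reviewed_track(["kubectl", "kubectl", "Helm"])
print(round(tm.weight("kubectl"), 2))  # month one: 1.2
tm.observe_reviewed_track(["kubectl"] * 10)
print(round(tm.weight("kubectl"), 2))  # a month later: 2.2
```

The weights are a per-customer asset that accumulates, which is the prose paragraph's point: switching vendors resets them to zero.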

Pricing tracks the job, not the seat count

No quote. No call. Just a card, and you can cancel from the billing page. If the hour-count math doesn't work for you, email me at hello@glosscap.com. I read every one, and I'll tell you honestly whether it's worth it.

What v1 ships, and what it doesn't

v1 ships: Whisper-large transcription with glossary-biased decoding; SRT and VTT export; a human-review edit UI; Notion / Confluence / Google Docs glossary ingestion. Stripe billing goes live the week the first real customer hits "pay"; until then, the waitlist. No vendor lock-in on your glossary source: paste, sync, or export anytime.
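For context on what "SRT and VTT export" means mechanically, here's a minimal sketch of the two cue formats. The cue tuples are assumed for the example, not GlossCap's exporter, and the real formats also cover styling, positioning, and multi-line cues; at this level the structural differences are the WEBVTT header, the numeric cue index, and the millisecond separator (comma for SRT, period for VTT):

```python
def fmt(t, sep):
    """Format seconds as HH:MM:SS<sep>mmm."""
    h, rem = divmod(int(t), 3600)
    m, s = divmod(rem, 60)
    ms = round((t - int(t)) * 1000)
    return f"{h:02}:{m:02}:{s:02}{sep}{ms:03}"

def to_srt(cues):
    # SRT: numeric index, comma before milliseconds, blank line between cues
    return "\n".join(
        f"{i}\n{fmt(a, ',')} --> {fmt(b, ',')}\n{text}\n"
        for i, (a, b, text) in enumerate(cues, 1)
    )

def to_vtt(cues):
    # VTT: mandatory WEBVTT header, period before milliseconds, no index
    body = "\n".join(
        f"{fmt(a, '.')} --> {fmt(b, '.')}\n{text}\n" for a, b, text in cues
    )
    return "WEBVTT\n\n" + body

cues = [(0.0, 2.5, "open kubectl and apply the Helm chart")]
print(to_srt(cues))
print(to_vtt(cues))
```

The conversion itself is trivial; the compliance value is upstream, in getting the words inside the cues right and getting a human sign-off on them.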

v1 does not ship: speaker diarization as a first-class feature (it's there, it's not the focus); live real-time captioning (async only); a custom in-browser video editor beyond what you need to sign off on caption timing; enterprise SSO (that's the Org plan — if you need it sooner, email me). We are deliberately scoping v1 around the single highest-pain workflow: "here's an async MP4, give me a compliant caption file, fast, and get my jargon right." Everything else waits until the first cohort tells me what they actually need.

Early access — 100 seats, price-locked Team at $79/month for year one

If you run L&D for a 50–500-employee org and the deadline just became your problem, get on the waitlist. First 100 Team-plan seats are price-locked at $79/month for 12 months. After that the list closes and the rate is the posted $99/month. No waitlist spam, one email when your slot opens.

Back to glosscap.com · Read more about how it works or see all pricing.