Engineering onboarding video captions: SDK names, kubectl, and pytorch preserved
Engineering onboarding video is the worst-case content type for general-purpose speech-to-text. The exact words a new hire most needs to hear right — your internal SDK names, the CLI flags they're going to type at 9am tomorrow, the framework APIs they'll grep for, the repo and service names they'll mention in standup — are the words a model that hasn't seen your codebase mangles. Captioning these videos with YouTube auto-captions, Otter, or vanilla Whisper produces a learning artifact that is technically WCAG 2.1 AA compliant on accuracy floor and totally useless on the comprehension surface that matters. Here is why, and here is the glossary-biased workflow that fixes it.
TL;DR
Engineering onboarding video is dense with proper-noun terminology that general speech models have never seen: kubectl, pytorch, helm, terraform, your internal service names, your monorepo paths, your SDK class names. Auto-captions write "cube control", "pie torch", "hell-em" — technically synchronized, completely wrong. GlossCap's glossary-biased decoding pulls your team's term list from Notion / Confluence / Google Docs, logit-boosts those tokens into Whisper-large's decoder, and ships SRT/VTT where the surface form lands right the first time. WCAG 2.1 AA on accuracy floor; comprehension on the tokens that matter.
The exact words that fail
Across hundreds of hours of engineering onboarding video that we've audited from training-ops teams, the failures cluster into recognisable categories:
- CLI tools. `kubectl` → "cube control" or "cube cuddle". `helm` → "hell em", or "Helm" when it lands at a sentence start. `terraform` → split as "terra form". `aws-cli` → "AWS see lie".
- Frameworks and libraries. `pytorch` → "pie torch". `tensorflow` → "tensor flow" (two words). `nestjs` → "Nest JS" or "nest yes". `fastapi` → "fast API".
- Internal proper nouns. Your service names ("Atlas", "Beacon", "Compass") get lowercased into common nouns; your repo names ("frontend-monolith") get dropped or split; your team's coined verbs ("we tubelite the request") become nonsense.
- Acronyms. `k8s` → "Kate's" or "K eights". `gRPC` → "G RPC" or "gerp see". `OAuth` → "Oh auth" or "owe auth".
- Version numbers. "Python 3.12" → "Python three point one two". "v18.2.0" → "vee one eighteen two zero".
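One way to see how big this gap is in your own caption files: scan a transcript for glossary terms whose surface form never appears. A minimal sketch; the term list and the caption text below are invented for illustration:

```python
GLOSSARY = ["kubectl", "helm", "terraform", "pytorch", "k8s", "gRPC", "Beacon"]

def missing_terms(transcript: str, glossary: list[str]) -> list[str]:
    """Return glossary terms whose surface form never appears (case-insensitive)."""
    lowered = transcript.lower()
    return [term for term in glossary if term.lower() not in lowered]

# A typical auto-caption rendering of a sentence that spoke several of these terms:
captions = ("This week we deploy with cube control apply and a hell em chart, "
            "then verify it on beacon.")

print(missing_terms(captions, GLOSSARY))
# → ['kubectl', 'helm', 'terraform', 'pytorch', 'k8s', 'gRPC']
```

Note that "beacon" passes the case-insensitive check even though its casing is wrong; a real audit would check casing too.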
None of these failures registers as "wrong" to the auto-caption system. The acoustic model heard something; the decoder picked the most likely token sequence under its priors; and those priors didn't include your codebase. So the file is timing-correct, character-aligned, and content-wrong on exactly the surface form a new hire will type tomorrow.
Why this matters more than for general training video
Engineering onboarding video has a specific failure mode beyond accessibility. The deaf-or-hard-of-hearing engineer using captions to learn the codebase is also the engineer who needs to copy the spelling of the tool. A miscaptioned kubectl doesn't just fail comprehension — it actively misleads. They open a terminal and type "cube cuddle" because that's what the captions said.
The hearing engineer turning on captions because they're in a noisy office or focused on the screen has a similar dependency: they're scanning the captions for the tool name to grep the docs while the speaker keeps moving. A wrong surface form breaks the search.
So the WCAG 2.1 AA threshold (99% on the standard reading) is necessary but not sufficient. You need accuracy on the *terminology surface* — the proper-noun and identifier subset of the transcript — and that's a higher bar than character-level accuracy on the whole.
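The gap between the two bars is easy to demonstrate. A rough sketch, using `difflib`'s similarity ratio as a stand-in for a proper character-accuracy metric (the sentences are invented):

```python
import difflib

reference  = "deploy with kubectl apply and a helm chart"
hypothesis = "deploy with cube control apply and a hell em chart"

# Whole-string similarity: most characters line up, so this looks respectable.
char_similarity = difflib.SequenceMatcher(None, reference, hypothesis).ratio()

# Terminology-surface accuracy: the fraction of glossary terms whose exact
# surface form survived. Here both identifiers were mangled.
terms = ["kubectl", "helm"]
term_accuracy = sum(t in hypothesis for t in terms) / len(terms)

print(f"character similarity: {char_similarity:.0%}")  # high
print(f"terminology accuracy: {term_accuracy:.0%}")    # 0%
```

The first number is the one a generic accuracy report shows; the second is the one the new hire experiences.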
The glossary-biased workflow
GlossCap is built around the problem that beat us when we were a small team trying to caption our own onboarding video: the glossary is the model. The workflow:
- One-time glossary sync. Connect a Notion page, Confluence space, or Google Docs folder containing your team's term list, or paste a flat list. A common starting set: `kubectl`, your internal service names, your SDK class names, the framework versions you're standardised on, your team's coined verbs.
- Batch upload. Upload the whole onboarding playlist as one batch; that's how the glossary model learns from cross-references within the batch and gets even better on the second pass.
- Glossary-biased decoding. Whisper-large transcribes; before any output token is sampled, the decoder's logits are boosted toward the glossary tokens by a tunable bias factor. `kubectl` beats "cube control" because the decoder has been told `kubectl` is in the lexicon.
- Reviewable edit UI. The output is shown next to the audio waveform with the bias-boosted terms highlighted in amber. A reviewer can scrub through and correct the small remainder; the corrections feed back into the per-customer glossary model.
- Export to your LMS. SRT for nearly anything; VTT for HTML5 and Kaltura/Docebo; TTML for Kaltura DFXP and broadcast pipelines. See our SRT, VTT, and TTML pages for format-specific notes.
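GlossCap's actual decoder integration isn't shown here, but the logit-boost idea in the decoding step is simple. A toy sketch on a made-up six-token vocabulary; the bias value and acoustic scores are invented, and real biasing operates on subword tokens inside Whisper's decoder rather than whole words:

```python
VOCAB = ["cube", "control", "kubectl", "helm", "hell", "em"]

def bias_logits(logits, vocab, glossary_tokens, boost=2.5):
    """Add a flat boost to the logit of every token in the glossary."""
    return [x + boost if tok in glossary_tokens else x
            for x, tok in zip(logits, vocab)]

# Acoustically, "cube" narrowly outscores "kubectl" for a speaker saying /kjub.../:
raw_logits = [2.1, 0.3, 1.8, -1.0, 0.5, 0.4]

biased = bias_logits(raw_logits, VOCAB, {"kubectl", "helm"})
winner = VOCAB[max(range(len(biased)), key=lambda i: biased[i])]
print(winner)  # → kubectl
```

With no boost the argmax is "cube" and the decoder starts down the "cube control" path; the flat boost is enough to flip the first token, and beam search does the rest.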
Demo: the same minute, three caption sources
Synthesised from a representative engineering onboarding clip ("This week we're going to walk through deploying a service to the staging cluster using kubectl apply and a helm chart, then verify it on Beacon"):
- YouTube auto-caption: "This week we're going to walk through deploying a service to the staging cluster using cube control apply and a hell em chart then verify it on beacon."
- Vanilla Whisper-large: "This week we're going to walk through deploying a service to the staging cluster using kubectl apply and a helm chart then verify it on beacon." It gets `kubectl` and `helm` right because they're in the public training corpus, but lowercases "Beacon" because it doesn't know it's your service.
- GlossCap (with glossary): "This week we're going to walk through deploying a service to the staging cluster using `kubectl apply` and a `helm` chart, then verify it on Beacon." The proper noun "Beacon" is preserved because the glossary marked it as a service name.
The vanilla Whisper performance is surprisingly good on the high-frequency open-source tools — but it falls down on the internal service names that are exactly the part a new hire most needs to learn.
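For reference, here is how the corrected demo cue lands in the two most common export formats. The timings are invented; the structural difference to notice is that SRT uses a numeric cue index and a comma before milliseconds, while WebVTT uses a dot and requires a `WEBVTT` header:

```python
text = ("This week we're going to walk through deploying a service to the "
        "staging cluster using kubectl apply and a helm chart, then verify "
        "it on Beacon.")

def srt_cue(idx: int, start: str, end: str, text: str) -> str:
    # SRT: numeric index line, comma as the millisecond separator.
    return f"{idx}\n{start} --> {end}\n{text}\n"

def vtt_cue(start: str, end: str, text: str) -> str:
    # WebVTT: no index required, dot as the millisecond separator.
    return f"{start} --> {end}\n{text}\n"

srt = srt_cue(1, "00:00:01,000", "00:00:06,500", text)
vtt = "WEBVTT\n\n" + vtt_cue("00:00:01.000", "00:00:06.500", text)

print(srt)
print(vtt)
```

Both files carry the same glossary-correct surface forms; only the container differs.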
Compliance side: WCAG 2.1 AA still applies
Even before the comprehension argument, engineering onboarding video falls under the same compliance bar as any other prerecorded internal training: WCAG SC 1.2.2 requires synchronized captions; the bar in practice is the 99% accuracy threshold described on the WCAG 2.1 AA captions reference. If the org is also subject to ADA Title II (state/local government, public universities, public-funded entities) the deadline already hit on 2026-04-24. Glossary-aware captioning gets you the accuracy floor and the comprehension surface in the same export.
Related questions
What goes in the glossary for an engineering team?
Start with: every CLI tool you use (kubectl, helm, terraform, aws, etc.); every framework and version (pytorch, tensorflow 2.16, fastapi 0.110); your internal service names; your repo names if they're spoken; your team's coined verbs. Skip generic English; the glossary is for proper nouns and identifiers.
Does GlossCap handle code-snippet narration?
The captions reflect what the speaker says, not what's on screen. If the speaker reads code aloud — "kubectl apply dash f deploy dot yaml" — the glossary preserves `kubectl`, but the rest is regular speech transcription. For inline code blocks shown on screen, captions are the wrong surface anyway; on-screen code is its own accessible artifact (use semantic `<code>` in your slide deck or the LMS lesson body).
Can I share the glossary across multiple training playlists?
Yes — the glossary is per-workspace, not per-batch. One sync covers onboarding videos, technical deep-dives, and brown-bag recordings.
What about acronyms that overlap with English words?
The glossary supports a casing rule: write `K8s` in the glossary and it stays cased that way in output, even when the speaker says "kates". For acronyms that overlap with English words (e.g., `RUST`), the glossary biases the decoder toward the term in technical contexts and falls back to lower-case "rust" in non-technical contexts.
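The surface behaviour of the casing rule can be sketched as a post-pass over the transcript. This is a toy illustration only: the real rule runs at decode time and is context-aware, whereas this version would happily rewrite every overlapping English word it finds:

```python
import re

# Canonical casings keyed by their case-insensitive form (example glossary).
CANONICAL = {t.lower(): t for t in ["K8s", "gRPC", "OAuth", "kubectl"]}

def apply_casing(text: str) -> str:
    """Rewrite any word matching a glossary entry to its canonical casing."""
    def fix(match: re.Match) -> str:
        word = match.group(0)
        return CANONICAL.get(word.lower(), word)
    return re.sub(r"[A-Za-z0-9]+", fix, text)

print(apply_casing("we run k8s with grpc and oauth"))
# → we run K8s with gRPC and OAuth
```

A naive post-pass like this is exactly why the English-overlap case needs the decode-time context signal: with "RUST" in the glossary it would also rewrite the metal "rust".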