Use case · Medical training
Medical training video captions: drug names, procedures, and HIPAA-aware workflow
Medical training video is the highest-stakes content type for caption accuracy in any compliance regime. The proper-noun surface — drug names, procedure names, anatomy, ICD codes, regulatory acronyms — is exactly what general speech-to-text mangles, and exactly what auditors and clinicians both sample. Captioning tirzepatide as "tier zip a tide" is not a comprehension nuisance; it's a clinical and audit-finding risk. Here is what auditors actually check, why general STT can't pass that bar, and the glossary-biased workflow that fixes it.
TL;DR
Medical training video terminology — drug names, procedure names, anatomy, ICD codes — is dense with proper nouns that general speech models have never seen. YouTube auto-captions and vanilla Whisper write phonetic guesses. GlossCap's glossary-biased decoding pulls your formulary or training-deck term list, logit-boosts those tokens into Whisper-large's decoder, and ships SRT/VTT where tirzepatide, empagliflozin, and cholecystectomy land right the first time. HIPAA workflow note: source video stays on your tenant; only audio plus the glossary text is processed; no PHI is required for the captioning pipeline (and shouldn't be present in training content anyway).
The exact words that fail in medical training
Across the medical training videos we've audited from L&D leads at health systems, life-sciences companies, and academic medical centres, the failures cluster:
- Drug names. tirzepatide → "tier zip a tide" or "tear zep a tide". semaglutide → "see ma glue tide". empagliflozin → "em pag lif lozin" or split into nonsense fragments. apixaban → "a picks a ban". tofacitinib → "toe fa city nib".
- Procedure names. cholecystectomy → "co la cyst ectomy". endoscopic retrograde cholangiopancreatography → splintered across multiple lines with internal mis-cuts.
- Anatomy. "hippocampus" is usually right; "duodenum" frequently right; less common terms (e.g., vermiform appendix, lateral pterygoid plate) get garbled.
- Codes and acronyms. "ICD-10" → "I C D ten" with hyphenation lost; "DSM-5-TR" → "DSM five TR"; "CPT 99213" → "CPT nine nine two one three".
- Brand-vs-generic. Ozempic ≈ semaglutide; the speaker says one and the auto-caption sometimes blends or substitutes — both wrong relative to the script.
The failures are concentrated on exactly the words a clinician learning the protocol must see correctly.
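The mangles above sit close to the true term in edit distance, which is part of why a glossary can recover them at all. A minimal detection sketch in plain Python (`difflib` from the standard library; the glossary list and threshold are illustrative, not GlossCap's implementation):

```python
from difflib import SequenceMatcher

# Illustrative term list — in practice this comes from your formulary.
GLOSSARY = ["tirzepatide", "semaglutide", "empagliflozin", "apixaban"]

def normalize(phrase: str) -> str:
    # Collapse a split phonetic guess ("tier zip a tide") into one token.
    return "".join(phrase.lower().split())

def closest_term(caption_phrase: str, threshold: float = 0.75):
    """Return (term, ratio) for the best glossary match, or None."""
    candidate = normalize(caption_phrase)
    best = max(GLOSSARY, key=lambda t: SequenceMatcher(None, candidate, t).ratio())
    ratio = SequenceMatcher(None, candidate, best).ratio()
    return (best, ratio) if ratio >= threshold else None

print(closest_term("tier zip a tide"))  # matches tirzepatide
```

The same similarity check, run in reverse, is a cheap audit pass over existing auto-captions: any phrase near a formulary term but not spelled as one is a likely miscaption.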
What auditors and clinicians both sample
Two different sampling regimes converge on the same surface:
- WCAG 2.1 AA / Section 508 audit. An auditor pulls a representative slice of training modules, opens captions on a few sampled segments, and reads. The 99% accuracy threshold is character-level on the standard reading; in practice auditors look for "obvious" failures, and a mis-spelled drug name is the most obvious failure on a clinical training video. See the WCAG 2.1 AA reference.
- Clinical learner. A nurse or pharmacist watching the module reads captions to confirm spelling because they will write it on a chart, look it up in the formulary, or quote it on a patient call. A wrong surface form is a downstream clinical error vector.
Both regimes converge on: drug names, procedure names, ICD codes. These are exactly the categories where glossary-aware captioning has the largest delta over general STT.
The glossary-biased workflow for medical content
- One-time formulary or training-deck glossary sync. Most health systems and life-sciences L&D teams already maintain a controlled vocabulary or formulary in Confluence, SharePoint, or a Google Docs folder. Connect that source, or paste a flat list of the drug names, procedure names, and acronyms used across your training catalogue.
- Upload modules in batches. A module batch processes the audio against Whisper-large with the formulary tokens logit-boosted into the decoder. The output is SRT/VTT/TTML with the proper-noun surface preserved.
- Reviewable edit UI. The amber-highlight UI shows every glossary-applied term in context. A subject-matter reviewer (a pharmacist for the drug modules, a clinician for procedure modules) can scrub through and confirm; corrections feed back into the workspace glossary.
- Export and attach in your LMS. Most health-system L&D runs Absorb or Cornerstone; pharma L&D runs Docebo; academic medical centres run Kaltura via Canvas. SRT covers Absorb cleanly (see Absorb captions); VTT for the others.
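The logit-boost step in that batch pass can be illustrated in miniature. This is not GlossCap's or Whisper's actual decoding code — just a NumPy sketch of what adding a bias to glossary token ids before the decoder's softmax looks like:

```python
import numpy as np

def boost_glossary_logits(logits, glossary_token_ids, boost=4.0):
    """Add a fixed boost to glossary-token logits before softmax.

    logits: (vocab_size,) next-token scores from the decoder.
    glossary_token_ids: ids of tokens that spell the glossary terms.
    """
    biased = logits.copy()
    biased[list(glossary_token_ids)] += boost
    return biased

# Toy vocabulary of 6 tokens; pretend token 3 spells part of a drug name.
rng = np.random.default_rng(0)
logits = rng.normal(size=6)
biased = boost_glossary_logits(logits, {3})
probs = np.exp(biased) / np.exp(biased).sum()
# Token 3's probability rises relative to the unbiased distribution;
# all other tokens' logits are untouched.
```

A real decoder applies this per step, typically conditioning the boost on whether the partial hypothesis is mid-way through spelling a glossary term, so unrelated words are not distorted.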
HIPAA workflow notes
Training video that you produce internally for clinical staff is not, by itself, PHI — it's training content, and a well-run training programme deliberately scrubs PHI from the source material in scripting. The captioning pipeline therefore typically processes audio that contains drug names, procedure descriptions, and clinical workflow instruction, with no patient-identifying information. The relevant operational notes:
- Source video stays on your tenant. GlossCap pulls a copy for processing; the source remains in your LMS or content store of record.
- Glossary content is term lists only. Drug names, procedures, acronyms — never patient identifiers.
- If a training video does contain PHI (e.g., a recorded case-review with identifying details), that's a content-governance failure upstream of captioning, and should be remediated at the source rather than at the caption layer.
- Business Associate Agreement. If your compliance posture requires a BAA on any system that processes audio derived from clinical operations, talk to us before processing — there are scoping decisions to make about what your specific tenant requires.
Compliance landscape
Health-system and life-sciences training is exposed to multiple overlapping compliance regimes:
- ADA Title II — applies to public hospitals and academic medical centres tied to public universities. The first compliance deadline is 2026-04-24 for larger public entities, with smaller entities following in 2027.
- Section 508 — applies to any federal contractor or grant recipient; many academic medical centres carry NIH funding and so are in scope.
- Joint Commission and state-level health authority audits — increasingly include accessibility of mandated training content as a sub-bullet under organisational compliance.
- EAA — relevant for EU operations and for US health systems with EU-located staff or remote-learning programmes.
The common denominator across all of these is the WCAG SC 1.2.2 requirement: synchronized captions for prerecorded media, read in practice at high accuracy. The glossary-biased path is the only realistic way to hit that bar on terminology-dense clinical content without per-module manual rework.
Related questions
What goes in a medical-training glossary?
The drug formulary used in your training programme; the procedures named in your protocol library; the ICD-10 / DSM-5-TR / CPT codes referenced in coding training; anatomy terms used at the level of detail your modules reach; brand-vs-generic mappings (e.g., Ozempic / semaglutide) so the decoder knows both surface forms are valid.
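As a concrete shape for such a glossary (terms drawn from this page; the structure itself is illustrative, not GlossCap's import format):

```python
# Illustrative glossary structure: surface forms plus brand/generic aliases.
GLOSSARY = {
    "drugs": ["tirzepatide", "semaglutide", "empagliflozin",
              "apixaban", "tofacitinib"],
    "procedures": ["cholecystectomy",
                   "endoscopic retrograde cholangiopancreatography"],
    "codes": ["ICD-10", "DSM-5-TR", "CPT 99213"],
    "aliases": {"Ozempic": "semaglutide"},  # both surface forms are valid
}

def all_surface_forms(glossary):
    """Flatten every spelling the decoder should treat as valid."""
    forms = [t for key in ("drugs", "procedures", "codes")
             for t in glossary[key]]
    forms += list(glossary["aliases"]) + list(glossary["aliases"].values())
    return sorted(set(forms))
```

The alias map is the important part: it tells the pipeline that Ozempic and semaglutide are both correct surfaces, so neither gets "corrected" into the other when the narrator says one of them.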
Can GlossCap handle multi-speaker case discussions?
Yes — Whisper-large handles speaker turns reasonably well; the captions reflect overlapping speech as faithfully as the audio allows. For formal multi-speaker case-review video, the typical move is per-speaker labels in the SRT (Speaker 1 / Speaker 2), which keeps the audit posture clean even if the audio has overlap.
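The per-speaker labels end up as plain prefixes in the SRT cue text. A minimal generator sketch (timings and lines illustrative):

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timecode, HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02}:{m:02}:{s:02},{ms:03}"

def to_srt(cues):
    """cues: list of (start_s, end_s, speaker, text) tuples."""
    blocks = []
    for i, (start, end, speaker, text) in enumerate(cues, 1):
        blocks.append(f"{i}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n"
                      f"{speaker}: {text}")
    return "\n\n".join(blocks) + "\n"

print(to_srt([(0.0, 2.5, "Speaker 1", "We started empagliflozin."),
              (2.5, 4.0, "Speaker 2", "Any renal contraindication?")]))
```

Speaker labels inside the cue text survive every SRT-capable player, which is why they're preferred over player-specific styling for audit purposes.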
What about drug names with multiple stress patterns?
The glossary stores the surface form (the spelling). Pronunciation variation across narrators doesn't matter — Whisper-large handles the acoustics; the glossary biases toward the right surface form regardless of which valid pronunciation the narrator used.
Does this work for podcast-style audio modules without video?
Yes — GlossCap accepts audio-only inputs (mp3, wav, m4a). The output is the same SRT/VTT, which can attach to a static-image video shell in your LMS or be served as a transcript alongside the audio.
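Serving a transcript alongside the audio is a mechanical strip of the SRT cue metadata — a sketch, assuming well-formed SRT input:

```python
import re

def srt_to_transcript(srt_text: str) -> str:
    """Drop cue indices and timecodes, keep the caption text in order."""
    lines = []
    for line in srt_text.splitlines():
        line = line.strip()
        if not line or line.isdigit():
            continue  # blank separators and cue indices
        if re.match(r"^\d{2}:\d{2}:\d{2},\d{3} --> ", line):
            continue  # timecode lines
        lines.append(line)
    return " ".join(lines)

sample = """1
00:00:00,000 --> 00:00:02,500
Start empagliflozin 10 mg daily.

2
00:00:02,500 --> 00:00:04,000
Monitor renal function."""
print(srt_to_transcript(sample))
# → Start empagliflozin 10 mg daily. Monitor renal function.
```

The same file therefore covers both postures: attach the SRT to a static-image video shell, or strip it to a transcript for the audio page.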