Tool reference · Loom (async video)

Loom captions: glossary-biased SRT/VTT for async-video training in modern SaaS

Loom is the async-video default for the modern SaaS workplace — the tool the engineering manager uses for the "here's the architecture review I would have done in your meeting" video, the tool the customer-success rep uses for the "here's how to configure your account" walkthrough, the tool the product manager uses for the "here's how the new feature works" launch demo, and increasingly the tool the L&D team uses for the lightweight customer-academy module. Loom's defining bet is async-first, browser-first, no-edit-friction video; everything else (auto-transcript, SRT export, AI summaries, embeddable links) hangs off the recording. The captioning question on Loom is the same question we ask on every authoring surface: not whether captions are technically supported (they are, via auto-transcript and SRT export) but whether the captions preserve the vocabulary the recording was made to teach. The answer, on the kind of content Loom is used for, is consistently no — until you bring a glossary-biased upstream pass to it.

TL;DR

Loom auto-transcripts every recording and exposes the transcript as searchable text plus a downloadable SRT (Business and Enterprise tiers, with limits on Starter). The auto-transcript is generic ASR and mangles product names, SDK terms, customer identifiers, and internal acronyms — exactly the vocabulary modern SaaS recordings are dense with. For training video that ships into a customer academy, an L&D module, or a public knowledge base, the auto-transcript is unfit for purpose. Glossary-biased captioning with the customer's product catalogue, SDK reference, internal acronym register, and customer-name register as the project glossary produces a clean SRT. The SRT can replace Loom's auto-caption (uploaded via the Loom UI on Business/Enterprise) or accompany the Loom MP4 download on its way to a hosted destination (Vimeo, Wistia, the LMS).

What Loom is, and where in the workflow captioning lands

Loom (acquired by Atlassian in 2023, now an Atlassian Cloud product) is an async-video recorder-and-host with a browser extension, a desktop app, and mobile clients.

Captioning lands at one of three points in the workflow: (1) the Loom auto-transcript, served alongside the recording natively; (2) a manually uploaded SRT replacing the auto-transcript (Business / Enterprise tier feature); (3) the SRT exported alongside the MP4 download for use at a downstream destination (LMS, video host, knowledge base).

The Loom caption-export and -upload mechanics

The vocabulary surface on Loom recordings in modern SaaS

Loom recordings in a modern SaaS workspace concentrate the highest proper-noun density of any video surface we measure. Why: Loom is the async tool of choice for context-rich, narrow-topic, single-author content — exactly the content profile that pulls heavily on internal vocabulary.

Why Loom's auto-transcript fails on this content

Loom's auto-transcript engine is generic ASR — well-tuned, but generic. It has no access to your product catalogue, your SDK reference, your customer-name register, your internal acronym register, your sales-framework register. On representative SaaS Loom recordings we measure 11-18 proper-noun mangles per minute of speech for engineering-team content, 6-10 mangles per minute for customer-success enablement, and 8-14 mangles per minute for product-marketing enablement. The mangle pattern is deterministic per term — the same product name mangles the same way across recordings — so a hand-correction workflow at the team level is, predictably, a half-FTE problem. Our long-form post on the hidden half-FTE in L&D caption correction walks the full math; on Loom-heavy SaaS workspaces, the Loom slice is the largest chunk of that half-FTE.
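The half-FTE claim is easy to sanity-check from the per-minute rates above. A minimal sketch, using assumed team-level inputs (300 minutes of training-relevant Loom per week, 10 mangles per minute, 25 seconds per hand correction; all three numbers are illustrative, not measured):

```python
def weekly_correction_hours(minutes_of_video: float,
                            mangles_per_minute: float,
                            seconds_per_fix: float) -> float:
    """Hours per week spent hand-correcting proper-noun mangles."""
    total_mangles = minutes_of_video * mangles_per_minute
    return total_mangles * seconds_per_fix / 3600

# Assumed inputs, for illustration only.
hours = weekly_correction_hours(minutes_of_video=300,
                                mangles_per_minute=10,
                                seconds_per_fix=25)
print(round(hours, 1))  # prints 20.8 -- roughly half an FTE's week
```

At mid-range engineering-content rates (14+ mangles per minute) the same volume of video pushes well past a half FTE.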

Loom AI's downstream features — auto-titles, chapters, summaries — inherit the mangle. The "AI summary" of a sales-enablement Loom that calls every customer name and competitor name by a hallucinated phonetic neighbour is an obvious problem; the customer-success enablement summary that mangles every product feature name is the same problem with subtler downstream consequences (the rep watches the summary, not the recording, and learns the wrong terms).

The glossary-biased workflow upstream of Loom

  1. Pull the customer's controlled vocabulary. SaaS-specific surface: the product-feature catalogue (a CSV from the product team, or the feature-flag table, or the public marketing-site feature index), the SDK reference (TypeDoc / JSDoc / pdoc / swagger output), the integration partner register, the persona / customer / account-name register (with caution — see privacy section), the internal acronym register from the company wiki.
  2. Download the Loom MP4. Business / Enterprise tier supports MP4 download from the recording's options menu. For workspace-scale retrofit, the Loom Workspace API supports programmatic MP4 download for the Workspace owner.
  3. Caption the MP4 with glossary-biased decoding. Run the audio through the captioning workflow with the workspace glossary biasing the decoder. Output: a clean SRT.
  4. Reviewer pass with amber-highlight UI. Every glossary-applied term highlighted with source-line provenance (feature catalogue entry, SDK reference URL, persona-register entry). The author or a peer reviewer scrubs the SRT; corrections feed the workspace glossary.
  5. Replace Loom's auto-caption. Business / Enterprise: upload the SRT to the recording, replacing the auto-caption. The Loom player now serves the clean caption track. The Loom auto-transcript is replaced for searchability, AI features, and CC display.
  6. Document. For training Looms that ship to a customer academy or external knowledge base, log the captioning provenance (vendor + glossary version, reviewer, date) for audit-evidence purposes.
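Because the mangle pattern is deterministic per term, the same mangle-to-canonical map that step 3's decoder biasing is built from can also patch a legacy auto-generated SRT in place. A minimal sketch; the product name "LaunchFlow" and its mangled form are hypothetical examples, not real glossary entries:

```python
import re

# Matches SRT timestamp lines like "00:00:01,000 --> 00:00:03,000".
TIMESTAMP = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> ")

def apply_glossary(srt_text: str, corrections: dict) -> str:
    """Apply a deterministic mangle -> canonical-term map to SRT cue text,
    leaving cue indices and timestamp lines untouched."""
    out = []
    for line in srt_text.splitlines():
        stripped = line.strip()
        if stripped.isdigit() or TIMESTAMP.match(stripped):
            out.append(line)  # structural line: never rewrite
        else:
            for mangled, canonical in corrections.items():
                line = re.sub(rf"\b{re.escape(mangled)}\b", canonical, line)
            out.append(line)
    return "\n".join(out)

# Hypothetical mangle observed across recordings -> canonical product name.
srt = "1\n00:00:01,000 --> 00:00:03,000\nConfigure launch flow in the dashboard\n"
print(apply_glossary(srt, {"launch flow": "LaunchFlow"}))
```

The word-boundary match keeps the substitution from firing inside longer tokens; in practice the map is generated from the reviewer-pass corrections in step 4, not written by hand.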


The privacy and compliance posture on customer-name handling

Loom recordings in customer-success enablement, sales coaching, and account-review contexts mention customer names, and the captioning workflow has to handle those names carefully.

For most internal-training and customer-education Loom content (no customer names, no PHI, no PII beyond the speaker's own identity), the standard captioning workflow is appropriate. For higher-sensitivity content, the workflow design is the audit-relevant question.
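One workable pattern for the higher-sensitivity case is to split the glossary before anything leaves the workspace: product and SDK terms go to the captioning pass, while customer, account, and persona names are held back for a local-only correction step. A sketch under that assumption; the register labels and the entries "LaunchFlow" and "Acme Corp" are illustrative:

```python
# Register labels treated as sensitive -- an assumed convention, not a
# Loom or vendor-defined taxonomy.
SENSITIVE_REGISTERS = {"customer", "account", "persona"}

def split_glossary(entries: list) -> tuple:
    """Partition glossary entries into terms shareable with the captioning
    pass and sensitive terms kept for a local-only correction step.
    Each entry is a dict like {"term": ..., "register": ...}."""
    shareable, held_back = [], []
    for entry in entries:
        bucket = held_back if entry["register"] in SENSITIVE_REGISTERS else shareable
        bucket.append(entry)
    return shareable, held_back

glossary = [
    {"term": "LaunchFlow", "register": "product"},   # hypothetical product name
    {"term": "Acme Corp", "register": "customer"},   # hypothetical customer name
]
shareable, held_back = split_glossary(glossary)
```

The audit-relevant property is that the sensitive partition never appears in any payload sent outside the workspace; the local correction step applies it after the clean SRT comes back.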

Where Loom-captioned content typically lands

How Loom captions intersect WCAG 2.1 AA, ADA Title II, and EAA

Loom recordings shipped to a public-facing destination (customer academy, public knowledge base, marketing-site explainer) inherit the destination's accessibility regime. Loom recordings consumed internally inherit the employer's accessibility posture, which, under Section 504 for recipients of federal financial assistance and the indirect ADA Title III exposure of private-sector employers, increasingly converges on WCAG 2.1 AA.

Loom recordings consumed inside an EU-operating organisation fall under the European Accessibility Act when the content is B2C-facing; purely B2B content is generally outside the EAA's scope, and Article 4(5) separately exempts microenterprise service providers, but the safer baseline is to caption all training video to WCAG 2.1 AA regardless of regime. See our EAA captions requirements reference and our EAA Q3 2026 inflection-point post.

The technical caption requirement at SC 1.2.2 (Captions, Prerecorded) is the relevant WCAG checkpoint. SC 1.2.4 (Captions, Live) does not apply to Loom — Loom is async only.

Related questions

Does the Loom auto-transcript pass WCAG 2.1 AA on its own?

Technically the auto-transcript provides a caption track and clears the SC 1.2.2 surface check. Substantively, "captions accurately convey what the speaker said" is the audit-relevant standard, and the auto-transcript's per-minute proper-noun mangle rate on dense SaaS content does not meet that standard. For content shipping to a public-facing destination or a high-stakes internal training surface, the auto-transcript is unfit for purpose; for low-stakes ad-hoc team-internal Looms, it's typically adequate.

What about Loom AI's "AI summary" feature — does the glossary-biased workflow improve that?

Yes, indirectly. Loom AI consumes the transcript; if the transcript is clean, the AI features are clean. Replacing the auto-transcript with a glossary-biased SRT improves the auto-titles, chapters, and AI summaries downstream. (At the time of writing Loom regenerates AI features when the transcript changes — verify on your Loom workspace.)

Can I upload an SRT to a Loom recording on the Starter tier?

No — caption upload (replace transcript) is a Business / Enterprise feature. Starter-tier Loom users wanting clean captions need to either upgrade or use the MP4-download-plus-sidecar pattern at the downstream host. Most SaaS organisations using Loom for training are on Business or Enterprise.

How does this differ from captioning Camtasia, Storyline, or Rise content?

The vocabulary surface is similar across all four (modern SaaS proper-noun density), but the captioning insertion point differs. Camtasia is timeline-level inside the Camtasia editor. Storyline is per-slide inside the source file. Rise is per-Video-block inside the Rise course. Loom is per-recording at the Loom hosted service. The upstream glossary-biased workflow is identical; the downstream import target changes per tool.

What's the workspace-scale back-catalogue retrofit pattern on Loom?

Loom's Workspace API permits programmatic MP4 download for Workspace owners. The pattern: enumerate the Workspace's recordings, download MP4 for each, batch-caption with the workspace glossary, replace each recording's caption track via the Loom API (Enterprise feature). For Workspaces with hundreds of training-relevant Looms, this is the practical path; for smaller Workspaces, manual per-recording handling is fine.
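The enumerate/download/recaption loop is easier to audit if the planning step is separated from the API calls themselves. A sketch only: `api.loom.example` and the endpoint paths below are placeholders, not Loom's real Workspace API routes, which must be taken from current Loom documentation.

```python
def plan_retrofit(recordings: list,
                  api_base: str = "https://api.loom.example/v1") -> list:
    """Build a download-and-recaption plan for a workspace back-catalogue.
    The base URL and paths are placeholders for the real Loom Workspace
    API routes. Each recording is a dict like {"id": ..., "title": ...}."""
    plan = []
    for rec in recordings:
        plan.append({
            "title": rec["title"],
            "download_url": f"{api_base}/recordings/{rec['id']}/mp4",
            "srt_path": f"captions/{rec['id']}.srt",
            "caption_upload_url": f"{api_base}/recordings/{rec['id']}/captions",
        })
    return plan

# Review the plan (counts, titles, output paths) before any HTTP call runs;
# the actual download / batch-caption / upload loop then walks this list.
plan = plan_retrofit([{"id": "abc123", "title": "Onboarding demo"}])
```

Keeping the plan inspectable before execution also gives you the provenance log step 6 of the workflow asks for, almost for free.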

Further reading