Wistia captions for training and enablement videos: SRT/VTT upload, Channels, and the B2B compliance posture
Wistia is the video platform of choice for thousands of B2B SaaS marketing and enablement teams: customer-facing product tours, sales-enablement decks, internal onboarding modules, customer-academy course catalogues. Wistia's caption support is straightforward — SRT and DFXP accepted, in-player CC button, auto-track via the auto-caption feature. The gap is the same gap every general ASR has: product names, competitor names, and SDK terminology come out wrong, and those are exactly the words B2B training video lives or dies on. Here's the upload flow, what Channels does to caption display, and the glossary-biased workflow that fits a B2B enablement catalogue.
TL;DR
Wistia accepts caption uploads at the media level under Captions in the media-edit panel. The supported formats are SRT (SubRip) and DFXP/TTML; SRT is the everyday upload format. The auto-generated track is fast and free, but it fails on the proper-noun surface that B2B enablement video is built around: product names (yours and competitors'), SDK and API names, methodology acronyms. For a SaaS company with a customer-facing enablement library on Wistia Channels, a catalogue exposed to ADA Title III public-accommodation claims, or a federal-contractor flow-down obligation, the glossary-biased workflow ships SRT that uploads to Wistia without format quirks and lands the proper nouns right on first export. The $99/mo Team plan covers 30 hours/month; see our pricing breakdown for the per-hour math.
The Wistia caption upload flow
- Open the media. In your Wistia project, open the video. Click Customize in the top-right.
- Open the captions panel. In the customize sidebar, click Captions. You'll see whichever tracks already exist (auto-captions, prior uploads).
- Upload the file. Click Add Captions File, choose the language, and upload the SRT or DFXP file. Wistia parses the cues and shows them in the in-player CC menu.
- Set as the active track. If you uploaded an SRT alongside an existing auto-track, you can mark the upload as the primary display. Wistia respects the viewer's CC preference either way.
- Verify on the public player. Open the Wistia URL or your embed in a fresh browser session. The CC button surfaces the new track. Catalogue-ready.
For catalogues larger than a couple dozen videos, Wistia's Data API exposes the /captions endpoint for programmatic upload. Bulk-retrofitting a back catalogue is a script away.
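A minimal sketch of what one programmatic upload could look like, using only the Python standard library. The base URL, the `/medias/<id>/captions.json` path, the `caption_file_contents` and `language` field names, and Bearer-token auth are assumptions based on a reading of the Data API and should be checked against Wistia's current reference before use. The media ID and token are placeholders. The function only builds the request; it does not send it.

```python
import urllib.parse
import urllib.request

# Assumed base URL for the Data API; confirm in Wistia's documentation.
API_BASE = "https://api.wistia.com/v1"


def build_caption_upload(media_id, srt_text, language, token):
    """Build (but do not send) a POST request attaching an SRT track to one media."""
    # Assumed form fields: the caption text inline plus a language code.
    form = urllib.parse.urlencode({
        "caption_file_contents": srt_text,
        "language": language,
    }).encode("utf-8")
    return urllib.request.Request(
        url=f"{API_BASE}/medias/{media_id}/captions.json",
        data=form,
        method="POST",
        headers={"Authorization": f"Bearer {token}"},
    )


if __name__ == "__main__":
    srt = "1\n00:00:01,000 --> 00:00:03,000\nWelcome to the product tour.\n"
    req = build_caption_upload("abc123", srt, "eng", "YOUR_API_TOKEN")
    print(req.get_method(), req.full_url)
    # To actually send it: urllib.request.urlopen(req)
```

Keeping request construction separate from sending makes it easy to dry-run a whole catalogue and inspect the URLs before any track is changed.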
Wistia Channels and how captions display
Wistia Channels is the customer-academy / video-portal product on top of standard Wistia hosting — a branded, embeddable hub where multiple videos live behind a navigation surface. Caption behaviour in Channels mirrors single-media playback:
- Per-media caption tracks. Each video in the Channel uses its own SRT/DFXP tracks; there's no Channel-level caption setting.
- CC button surfaces all tracks. Multi-language tracks show up as separate options in the player menu.
- Search. Channel search indexes the caption text — which means a captioning workflow that mangles product names also mangles search. A user typing your product name into Channel search misses videos where the auto-track wrote it differently.
That last point matters operationally: because search runs on caption text, caption accuracy isn't only a compliance concern, it's a discoverability concern for the customer-academy use case.
Why Wistia's auto-captions fail on B2B SaaS content
Wistia's auto-caption feature uses general ASR. It averages roughly 88-92% accuracy on the kind of audio you'd find in a customer-facing enablement video: a single English-speaking presenter, clean recording, business vocabulary. Failures cluster on the words that determine whether the video is useful at all:
- Your product names. Codenames, SKUs, acronyms. "Datadog" → "data dog"; "PagerDuty" → "pager duty"; "GitHub Actions" → "git hub actions". Capitalisation lost; tokenisation random.
- Competitor names. The names that go into competitive enablement decks. "Looker" → "look er"; "Mode Analytics" usually right; "Sigma" → "sigma" — but "Sigma Computing" routinely splits. Comparison tables narrated on-camera lose half their punch when the captions can't keep up.
- SDK and library names. Engineering and developer-relations content. numpy → "numb pie" or "num pie"; tensorflow → "tensor flow" with capitalisation lost; k8s → "K eight S".
- Methodology and framework names. "MEDDIC" → "med dick" or "med ick"; "MEDDPICC" → splintered fragments; "BANT" → "bant" sometimes right; "Challenger Sale" → "challenger sale" usually right but sometimes "cell".
- Customer names in case-study video. "Cloudflare" usually right; "Datadog" lost; less-mainstream customer names usually lost.
- Person names. Founder, CEO, named instructors — same proper-noun failure as elsewhere.
The pattern: connective-tissue words come out fine, the proper-noun surface that determines whether a sales-enablement deck is intelligible (or whether a customer-academy module gets indexed correctly) routinely fails.
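One way to spot the spacing and capitalisation failures listed above is to compare caption text against a glossary after squashing both to lowercase alphanumerics. The sketch below is illustrative only: the function name and the three-token window are assumptions, and it will not catch phonetic substitutions such as "numb pie" for numpy, which need a different matching strategy.

```python
import re


def find_mangled_terms(text, glossary, max_tokens=3):
    """Return (surface, canonical) pairs where the caption text spells a
    glossary term with different spacing or capitalisation."""
    # Map a squashed key (lowercase, alphanumerics only) to the canonical spelling.
    keys = {re.sub(r"[^a-z0-9]", "", term.lower()): term for term in glossary}
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    hits = []
    for start in range(len(tokens)):
        for width in range(1, max_tokens + 1):
            window = tokens[start:start + width]
            if len(window) < width:
                break  # ran off the end of the text
            surface = " ".join(window)
            canonical = keys.get("".join(window).lower())
            # Flag only when the term is present but not spelled canonically.
            if canonical is not None and surface != canonical:
                hits.append((surface, canonical))
    return hits


if __name__ == "__main__":
    caption = "We route data dog alerts through PagerDuty and git hub actions."
    for surface, canonical in find_mangled_terms(
        caption, ["Datadog", "PagerDuty", "GitHub Actions"]
    ):
        print(f"{surface!r} should be {canonical!r}")
```

Run against an auto-generated track, a check like this gives a quick count of how many glossary terms were split or lowercased before anyone reads the captions line by line.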
The B2B SaaS compliance posture
SaaS companies hosting training video on Wistia sit inside a stack of compliance pressures:
- ADA Title III public-accommodation reading for any customer-facing learning content. The pattern in Robles v. Domino's and successor cases: courts read commercial websites as places of public accommodation under Title III; WCAG 2.0 AA / 2.1 AA is the operative technical standard; private suit is the typical enforcement.
- Section 508 contractor flow-down for SaaS sold to U.S. federal agencies. Caption adequacy on training video tied to a federal contract is a procurement-acceptance criterion.
- Customer accessibility-statement contractual asks. Larger SaaS customers — especially regulated industries — increasingly require their vendors to attest to accessibility standards in MSAs and DPAs. The captioning posture on the customer-academy and sales-enablement library is part of what your security and trust team has to answer.
- European Accessibility Act (EAA) for any EU-customer-facing training content.
The clean B2B SaaS posture is: glossary-biased captions on the catalogue, SRT uploaded to Wistia at the media level, an accessibility statement on the customer portal that names the operative standard and the contact for caption-accuracy issues. Achievable at the Team tier ($99/mo for 30 hours/month of new captioning).
The glossary-biased workflow for a Wistia enablement catalogue
- Build the SaaS glossary. Five lists: your product names and codenames; your competitor names; SDK / API / framework names; methodology acronyms; named instructors and executives. 40-80 terms covers most B2B SaaS catalogues.
- Process new uploads as part of the publish flow. Most enablement teams have a recurring rhythm — record, edit, publish to Wistia, post to enablement portal. Insert GlossCap between edit and publish. Output is SRT.
- Reviewer pass. Your enablement producer (the person who knows the deck) works through the amber-highlight UI and resolves any question about how a glossary term was applied.
- Upload to Wistia via Customize → Captions, set as active track. For Channels, the same upload propagates to the Channel display.
- Back catalogue. Wistia's Data API /captions endpoint accepts SRT uploads programmatically. Bulk-retrofit a hundred customer-academy modules over a weekend.
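For the back-catalogue step, it helps to build an upload plan before touching any media. The sketch below assumes a hypothetical file-naming convention, `<media_id>.<language>.srt`, for the exported captions; that convention is not something Wistia or GlossCap prescribes, so adapt it to however your exports are actually named.

```python
from pathlib import Path


def plan_bulk_upload(folder):
    """List (media_id, language, path) for files named <media_id>.<language>.srt."""
    plan = []
    for path in sorted(Path(folder).glob("*.srt")):
        parts = path.name.split(".")
        if len(parts) != 3:
            # Skip files that do not follow the assumed naming convention.
            continue
        media_id, language, _ext = parts
        plan.append((media_id, language, path))
    return plan


if __name__ == "__main__":
    # Dry run: print what would be uploaded, one line per caption track.
    for media_id, language, path in plan_bulk_upload("captions_export"):
        print(f"{media_id}\t{language}\t{path}")
```

Reviewing the printed plan first makes it obvious when a file is missing or misnamed, which is cheaper to catch than a wrong track on a published video.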
Wistia's first-party caption options — who they fit
- Auto-captions. Free, fast, ~88-92% accurate; fails on proper nouns. Right for low-stakes content where the connective tissue is the message and the named entities don't matter much. Wrong for product training, sales enablement, or customer academy where the named entities are the message.
- Wistia Subtitles add-on / vendor passthrough. Wistia partners with human-transcription vendors (Rev, 3Play, etc.) and surfaces them in the upload flow. Quality is human-reviewed; cost lands in the $1-2/minute range, which is $60-120 per hour of video, against roughly $3.30 per hour on the Team plan at full usage ($99 for 30 hours). Right for the small fraction of content that needs human review specifically; expensive for a recurring weekly enablement catalogue.
- Manual upload (BYO captions). Free; the workflow this page describes. Right for any team that wants source-of-truth control over the caption track.
Related questions
What caption formats does Wistia accept?
SRT (SubRip) and DFXP/TTML. SRT is the everyday upload format and what most upstream tools (GlossCap included) export by default. DFXP/TTML is useful for downstream LMS handoff if your catalogue also lives in TalentLMS or a similar system.
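Since SRT is the everyday format, a quick structural check before upload can catch the most common problems: a missing index line, or timestamps written with a period (the WebVTT style) instead of SRT's comma. The sketch below is a minimal sanity check, not a full parser, and says nothing about how Wistia itself validates files.

```python
import re

# SRT timing line: HH:MM:SS,mmm --> HH:MM:SS,mmm (comma before milliseconds).
TIMING = re.compile(
    r"^(\d{2}):(\d{2}):(\d{2}),(\d{3}) --> (\d{2}):(\d{2}):(\d{2}),(\d{3})$"
)


def _ms(h, m, s, ms):
    """Convert timestamp parts to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def check_srt(text):
    """Return a list of problems; an empty list means the file looks sane."""
    cleaned = text.lstrip("\ufeff").replace("\r\n", "\n").strip()
    blocks = [b for b in re.split(r"\n\s*\n", cleaned) if b]
    if not blocks:
        return ["no cues found"]
    problems = []
    for expected_index, block in enumerate(blocks, start=1):
        lines = block.split("\n")
        if len(lines) < 3:
            problems.append(f"cue {expected_index}: needs an index, a timing line, and text")
            continue
        if lines[0].strip() != str(expected_index):
            problems.append(f"cue {expected_index}: index line is {lines[0]!r}")
        match = TIMING.match(lines[1].strip())
        if not match:
            problems.append(f"cue {expected_index}: bad timing line {lines[1]!r}")
            continue
        parts = match.groups()
        if _ms(*parts[:4]) >= _ms(*parts[4:]):
            problems.append(f"cue {expected_index}: start is not before end")
    return problems


if __name__ == "__main__":
    sample = "1\n00:00:01,000 --> 00:00:03,000\nWelcome to the product tour.\n"
    print(check_srt(sample) or "looks fine")
```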
How does this compare to Vimeo for B2B training?
Vimeo accepts more formats (SRT/VTT/SBV/SCC/DFXP) and has a slightly cleaner per-video upload UI; Wistia has tighter analytics integration and the Channels portal product. The captioning workflow is broadly similar. See Vimeo captions for training videos for the side-by-side.
Does Wistia support multi-language captions?
Yes: multiple SRT/DFXP tracks per media, each labelled by language. Viewers pick from the CC menu. For multi-language enablement catalogues, the workflow is: caption the source-language audio with GlossCap, translate the SRT (machine translation with post-editing, or human translation), then upload each language as a separate Wistia track.
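The translation step only works if cue numbers and timings survive untouched. Here is a rough sketch of that idea: `translate` is a hypothetical callable standing in for whatever MT service or human-review export you use, and joining multi-line cue text into one string before translating is a design assumption, not a requirement of the format.

```python
import re


def translate_srt(text, translate):
    """Apply `translate` to each cue's text, leaving index and timing lines untouched."""
    blocks = re.split(r"\n\s*\n", text.replace("\r\n", "\n").strip())
    out = []
    for block in blocks:
        lines = block.split("\n")
        if len(lines) < 3 or "-->" not in lines[1]:
            out.append(block)  # not a normal cue; pass it through unchanged
            continue
        # Join multi-line cue text so the translator sees the whole phrase.
        translated = translate(" ".join(lines[2:]))
        out.append("\n".join([lines[0], lines[1], translated]))
    return "\n\n".join(out) + "\n"


if __name__ == "__main__":
    source = (
        "1\n00:00:01,000 --> 00:00:03,000\nWelcome to\nthe tour\n\n"
        "2\n00:00:03,500 --> 00:00:05,000\nLet's begin\n"
    )
    # str.upper stands in for a real translation function.
    print(translate_srt(source, str.upper))
```

In practice the glossary terms (product names, SDK names) usually need to be protected from translation, so a real `translate` callable would carry the same term list used for the source-language captions.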
What about Wistia embeds — do captions follow?
Yes. Wistia's embed player (Standard, Popover, Inline) surfaces the same caption tracks as Wistia-hosted playback. The CC button works identically on embeds; there's no extra configuration on the embedding site.
Will GlossCap have a direct Wistia integration?
Roadmap. The current flow is export SRT from GlossCap, upload to Wistia via the Customize panel or the Data API. A direct one-click integration is on the post-launch backlog — talk to us if you'd be the reference customer.