LMS integration
Docebo captions integration: VTT, language tracks, and the Central Repository workflow
Docebo is the enterprise LMS most common among learning orgs with 500+ employees, a segment GlossCap supports via the Org tier. Docebo's Central Repository stores video assets with caption tracks attached to them; once you understand that structure, caption retrofit is straightforward. This page covers the upload flow, language-track handling, and why VTT, not SRT, is usually the format to hand Docebo.
TL;DR
Docebo's video lives in the Central Repository (sometimes called the Central Content Repository, CCR). A video asset can carry multiple caption tracks, one per language, and those tracks are bound to the asset — not to the course that uses the asset. Reuse an asset across courses and the captions follow. VTT is the well-tested format for Docebo; SRT usually works too, but the player's track-switching behaviour is cleaner with VTT. The compliance story is the same as on every other LMS: the caption content — verbatim dialogue, speaker labels, non-speech cues, ≈99% accuracy — is what carries the audit, not the platform.
Where captions live in Docebo's data model
Most LMS-retrofit work goes wrong because admins look for captions where courses live (instructional design view) instead of where assets live (content management view). Docebo is explicit about this: the Central Repository is the asset store. Each video asset in the repository has metadata (title, language, description) and can have caption tracks attached as part of that metadata.
Practically, this means:
- Upload the caption track at the asset level, not the course level. Open the video in the Central Repository, edit metadata, attach caption files.
- Caption tracks persist across course reuse. If one video asset is used in three different courses, you caption it once; all three courses render the captions.
- One caption track per language per asset. English, French, German — each a separate upload, each with a language tag.
- Replacing the video replaces the asset. If you re-upload the source video to fix a pacing issue, you may need to re-attach captions depending on whether the platform treats it as a new version.
Why VTT over SRT for Docebo
Both formats work in modern Docebo video players. The reason VTT is the safer default:
- Native HTML5 track element. Docebo's player consumes VTT via the `<track>` element, which is the spec-preferred path. SRT gets converted internally, and that conversion is where occasional edge-case bugs surface (extra blank lines merged, trailing-hyphen continuations mis-parsed).
- Styling hooks available. VTT supports `::cue` CSS styling, voice tags (`<v Alex>`), and inline italics/bold. If your branding team wants a consistent caption look across courses, VTT is the format that lets you express it.
- Better multi-language switching. Docebo's player labels caption tracks by the language metadata in the upload. VTT's `srclang` attribute on the `<track>` element travels cleanly; SRT relies on the upload form's language-dropdown value, which is one extra step of operator error.
- Non-speech sound cues are stylable. VTT's inline tags let `[laughter]` render in italics so hearing users can distinguish cues from dialogue. SRT has no typography.
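For the multi-language switching point, this is the standard HTML5 pattern the bullets above refer to. Docebo's player wires up the equivalent from upload metadata rather than asking you to write markup; the file names and labels here are illustrative:

```html
<video controls src="onboarding.mp4">
  <!-- One <track> per caption language; srclang + label drive the CC menu -->
  <track kind="captions" src="onboarding.en.vtt" srclang="en" label="English" default>
  <track kind="captions" src="onboarding.fr.vtt" srclang="fr" label="Français">
</video>
```

With VTT, the language identity rides on the file and the `srclang` attribute; with SRT, the only language signal is whatever the admin picked in the upload form's dropdown.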
SRT is still acceptable if your team has an SRT-based authoring workflow. But if you are asking "which should I export from GlossCap for Docebo", the answer is VTT.
The retrofit workflow for a Docebo library
Retrofitting an enterprise Docebo library for ADA Title II compliance (deadline live as of 2026-04-24) or European Accessibility Act (EAA) scope typically looks like:
- Identify every video asset in the Central Repository. Admins can filter assets by type = video. Export the list with asset IDs, titles, and durations.
- Decide caption scope. Not every asset needs captions; onboarding and compliance modules do, but an ops-only demo asset used in a single internal course may not. The scope call is organizational, not technical.
- Upload source videos to a GlossCap batch. Attach your company glossary once (Notion / Confluence / Google Docs sync, or paste list). For enterprise libraries, glossary depth matters — every product name, every internal acronym, every SDK symbol should be in it.
- Export VTTs with filenames tied to asset IDs. GlossCap supports arbitrary filename conventions on bulk export so the subsequent upload is a drag-and-drop match.
- Attach per asset in Docebo. Central Repository → asset → captions → upload. With 100+ assets, this is best done as a focused admin sprint.
- Verify on one test course per domain. Docebo's multi-domain (branding / audience isolation) feature means a test-learner account in one domain may not see what a learner in another will; sample at least one learner account per customer-facing domain.
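The filename-matching step (4 → 5) can be sanity-checked with a short script before the admin sprint. This is a minimal sketch, assuming an illustrative `<asset_id>.<lang>.vtt` naming convention — the actual convention is whatever you configure on GlossCap bulk export:

```python
# Pair bulk-exported VTT files with Docebo asset IDs by filename.
# The "<asset_id>.<lang>.vtt" pattern below is illustrative, not a
# GlossCap or Docebo requirement.
import re
from pathlib import Path

VTT_NAME = re.compile(r"^(?P<asset_id>\d+)\.(?P<lang>[a-z]{2}(?:-[A-Z]{2})?)\.vtt$")

def match_exports(vtt_names, known_asset_ids):
    """Return ({asset_id: {lang: filename}}, [unmatched filenames])."""
    matched, unmatched = {}, []
    for name in vtt_names:
        m = VTT_NAME.match(Path(name).name)
        if m and m.group("asset_id") in known_asset_ids:
            matched.setdefault(m.group("asset_id"), {})[m.group("lang")] = name
        else:
            unmatched.append(name)
    return matched, unmatched
```

Run it against the asset-ID list exported in step 1; anything in the unmatched bucket gets fixed before anyone opens the Docebo admin UI.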
The two things teams underestimate
Asset reuse is a win, not a footgun. If a "manager onboarding" video asset is reused across five courses in five branded domains, captioning once covers all five. Count your assets, not your courses — the scope is smaller than it looks. GlossCap's per-asset run means you pay once for the compute, not five times.
Speaker attribution matters more on enterprise training. Docebo's large-course-library use case tends to include multi-speaker panels, instructor-plus-guest formats, and cross-functional walkthroughs. The auditor sampling question shifts from "is it accurate" to "can I tell who is speaking at every change". GlossCap emits speaker labels using <v Name> voice tags in VTT output, which Docebo's player renders as a prefix on each cue.
How GlossCap fits Docebo specifically
The core loop is the one on the homepage — captions that know your jargon — but the Docebo-specific details matter:
- VTT export with `WEBVTT` header, period-separated timecodes, and `<v SpeakerName>` voice tags where speakers are identifiable.
- Language codes emitted as the source-language tag by default; the enterprise tier supports custom output language codes (useful if you are tagging a locale-specific variant like `en-GB`).
- Filename conventions configurable to match your Docebo asset-ID scheme, so the upload-matching step is drag-and-drop.
- Non-speech sound cues formatted as `[alarm]`, `[laughter]`, `[background music]`: bracketed, italicized via inline VTT styling, which Docebo's player renders distinctly from dialogue.
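Put together, an export for a multi-speaker asset looks roughly like this (timestamps, speaker names, and cue text are illustrative):

```vtt
WEBVTT

00:00:01.000 --> 00:00:04.200
<v Alex>Run kubectl get pods to confirm the rollout.

00:00:04.500 --> 00:00:06.000
<i>[background music]</i>

00:00:06.200 --> 00:00:09.400
<v Priya>Next, open the Central Repository in Docebo.
```

Note the two details the bullets call out: voice tags carry speaker identity on each cue, and the bracketed sound cue is wrapped in `<i>` so it renders distinctly from dialogue.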
All of which amounts to: the caption file you download from GlossCap is the caption file you upload to Docebo, with no manual fixup between. The terminology preservation — kubectl, Docebo (yes, even its own product name), tirzepatide — comes from the glossary-biased decode, not from a post-hoc find-and-replace.
Related questions
Does Docebo accept TTML?
Some Docebo deployments accept TTML if the upstream video infrastructure is configured for it, but it is not the default path. VTT is the safe choice for most instances. If your admin team has a specific TTML requirement, our TTML page covers what GlossCap exports.
What about Docebo's AI Captions feature?
Docebo has shipped AI caption auto-generation in recent releases; it uses a general-purpose speech model with the same terminology-accuracy limitations as YouTube auto-captions on training content. Glossary-biased decoding is the fix for the specific mangles (product names, SDK symbols, drug names) that generic AI captions get wrong. GlossCap complements, rather than competes with, the AI-caption feature for teams where terminology preservation matters.
Can I attach captions via the Docebo API?
Docebo's REST API covers most Central Repository operations. Caption-track attachment is supported on the asset endpoints; for a large one-time retrofit it's often faster to work in the admin UI, but for ongoing ingestion (new video every week), API-driven attachment is reasonable. Check the Docebo Developer Portal docs for the current endpoint shape.
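For the ongoing-ingestion case, the attachment call reduces to an authenticated POST per asset per language. The bearer-token auth header follows Docebo's documented OAuth scheme, but the endpoint path and body fields below are illustrative placeholders — confirm the real shape against the Docebo Developer Portal before relying on them:

```python
# Build (not send) a caption-attachment request.
# WARNING: the /learn/v1/video/{id}/subtitles path and the body field
# names are hypothetical stand-ins, not a documented Docebo endpoint.
import json
import urllib.request

def build_caption_request(domain, token, asset_id, lang, vtt_text):
    url = f"https://{domain}/learn/v1/video/{asset_id}/subtitles"  # placeholder path
    body = json.dumps({"language": lang, "captions": vtt_text}).encode()
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Wrapping the request construction in a function like this keeps the weekly-ingestion loop to "one call per new VTT from the filename-matching step".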
How does multi-language caption track switching work for learners?
Docebo's player renders a CC menu listing every attached caption track by its language label. Learners pick. GlossCap exports one language per run; for multi-language coverage you run the source through GlossCap per language and attach each output to the same Docebo asset.