LMS integration
Kaltura captions: REST upload, multi-language tracks, and the audit-ready workflow
Kaltura sits underneath a huge fraction of higher-ed lecture capture and large-enterprise video — Canvas/Blackboard/Moodle integrations, MediaSpace portals, embedded enterprise players. Caption attachment in Kaltura is more flexible than in any other major LMS-adjacent platform: there's a UI path through KMC, a REST Caption API for automation, and a per-entry track model that natively supports multiple languages. Here is the upload flow that actually works, the moves that break audit posture, and the retrofit playbook for a Kaltura library hit by the ADA Title II 2026-04-24 deadline.
TL;DR
In Kaltura, captions are caption assets attached to a video entry. Upload via the Kaltura Management Console (KMC) — Entry → Captions tab — or via the REST Caption API (caption_captionasset.add + caption_captionasset.setContent). Kaltura accepts SRT, VTT, DFXP/TTML, and SCC. Multi-language tracks attach to the same entry. For WCAG 2.1 AA the content of the caption file is the load-bearing piece — the player passes; your terminology accuracy is what an auditor samples.
Where caption upload lives in Kaltura
Two production paths, picked by scale:
- KMC UI (one-off, small libraries). Sign in to KMC → Content → Entries → click the entry → Captions tab → "Upload Captions". Pick the language, select Yes for "Default" if this is the primary track, choose the caption format (SRT / VTT / DFXP / SCC), upload the file. Kaltura immediately makes the track visible on the player.
- Kaltura REST Caption API (scale, automation). The two-call dance is caption_captionasset.add (declares the asset metadata: entryId, language, label, format, isDefault) followed by caption_captionasset.setContent (uploads the actual file content as a token-uploaded resource). Both calls are documented in the Kaltura API Console and ship in the official Node, Python, and PHP client libraries.
For libraries above ~50 entries, the REST path wins by an order of magnitude on time-to-finish — the UI path costs a full click-through per entry, while the API path is one batch run.
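The two-call dance can be sketched against Kaltura's service/action REST endpoint directly, with no client library. This is a minimal sketch, not a drop-in script: it assumes you already hold an admin session string (ks), the helper names are our own, and the KalturaCaptionType enum strings (SRT = "1", DFXP = "2", WebVTT = "3") should be checked against the Caption plugin docs before use.

```python
# Minimal sketch of caption_captionasset.add + setContent over plain HTTP.
# Assumptions: an existing admin session (ks); enum strings and the
# colon-nested parameter names as documented in the Kaltura API Console.
import json
import urllib.parse
import urllib.request

SERVICE_URL = "https://www.kaltura.com/api_v3/service/{service}/action/{action}"

def caption_type_for(path: str) -> str:
    """Map a caption file extension to a KalturaCaptionType enum string."""
    ext = path.rsplit(".", 1)[-1].lower()
    return {"srt": "1", "dfxp": "2", "ttml": "2", "vtt": "3"}[ext]

def kaltura_call(service: str, action: str, params: dict) -> dict:
    """POST one Kaltura API call and return the parsed JSON response."""
    url = SERVICE_URL.format(service=service, action=action)
    body = urllib.parse.urlencode({**params, "format": 1}).encode()
    with urllib.request.urlopen(url, data=body) as resp:
        return json.load(resp)

def attach_caption(ks: str, entry_id: str, path: str,
                   language: str, label: str, is_default: bool) -> dict:
    """Call 1 declares the asset metadata; call 2 pushes the file body."""
    asset = kaltura_call("caption_captionasset", "add", {
        "ks": ks,
        "entryId": entry_id,
        "captionAsset:objectType": "KalturaCaptionAsset",
        "captionAsset:language": language,       # e.g. "English"
        "captionAsset:label": label,
        "captionAsset:format": caption_type_for(path),
        "captionAsset:isDefault": int(is_default),
    })
    with open(path, encoding="utf-8") as f:
        content = f.read()
    return kaltura_call("caption_captionasset", "setContent", {
        "ks": ks,
        "id": asset["id"],
        "contentResource:objectType": "KalturaStringResource",
        "contentResource:content": content,
    })
```

The official Python client wraps the same two calls behind typed objects; the raw form above just makes the add-then-setContent order explicit.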
The multi-language track model
Kaltura's caption-asset model is one of the cleaner ones in the LMS-adjacent space: each video entry holds a list of caption assets, each with its own ISO language code, label, and isDefault flag. The Kaltura player auto-renders the CC menu from this list. Adding French + Spanish + English to the same training module is three caption-asset uploads against the same entry, each with the right language code (en, fr, es) and at most one with isDefault = true.
A common mistake: setting isDefault = true on multiple caption assets for the same entry. The player picks the last one written, which makes the active default non-deterministic across re-imports. Set exactly one default track per entry.
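A cheap defense is to normalize the default flags client-side before writing, so exactly one track per entry ever carries isDefault. A minimal sketch, where the CaptionTrack shape is our own illustration rather than a Kaltura client type:

```python
# Normalize a track list so exactly one caption asset is the default.
# CaptionTrack is an illustrative local type, not a Kaltura class.
from dataclasses import dataclass, replace
from typing import List

@dataclass(frozen=True)
class CaptionTrack:
    language: str      # ISO code, e.g. "en"
    label: str
    is_default: bool

def normalize_defaults(tracks: List[CaptionTrack],
                       preferred_language: str = "en") -> List[CaptionTrack]:
    """Return the list with exactly one is_default=True entry.

    Prefers the first track in preferred_language and falls back to the
    first track overall, so a default always survives a re-import."""
    if not tracks:
        return tracks
    winner = next((i for i, t in enumerate(tracks)
                   if t.language == preferred_language), 0)
    return [replace(t, is_default=(i == winner))
            for i, t in enumerate(tracks)]
```

Run this over the desired track set before the add calls and the re-import order stops mattering.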
The retrofit workflow for a Kaltura library
- List the entries. Use media.list filtered by category, owner, or createdAt window. Pull entryId, name, duration, status, and the current caption-asset count into a working spreadsheet so the post-retrofit verification step has a baseline.
- Classify by audit risk. Public-facing higher-ed lecture capture (covered by ADA Title II for state/local public entities since 2026-04-24), customer-facing product training, and HR-mandated compliance training all need captions first; archived or admin-only content can wait.
- Caption with one glossary in GlossCap. Pull the source video files from Kaltura (or your upstream archive), drop them into a GlossCap batch, and sync the company glossary once. SDK names, drug names, regulatory acronyms, internal product names — all logit-boosted into the Whisper-large decoder before output is generated.
- Export with entry-id-tagged filenames. Configure the GlossCap batch export so each SRT or VTT file is named with the Kaltura entryId, e.g. 1_abc123def.srt. The next step becomes a one-line script.
- Bulk-attach via REST. A short script iterates the SRT files and calls caption_captionasset.add + caption_captionasset.setContent per entry. With the entryId baked into the filename and a stable language label, the script is well under 100 lines in any of the official client libraries.
- Verify on a sample player URL. Open 5-10 sampled entries in the public player or your MediaSpace portal, and confirm the CC button shows the right languages and the timings look right. This catches the mis-mapping failure where a wrong filename attaches captions to the wrong entry.
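The filename-to-entry mapping in the bulk-attach step can be sketched as a pure helper. The optional language suffix convention (1_abc123def.fr.srt) is our assumption for multi-language batches, not a GlossCap or Kaltura convention; a bare entryId filename defaults to English here:

```python
# Recover (entryId, language) pairs from entry-id-tagged export filenames.
# The ".<lang>." suffix convention is an assumption for this sketch.
from pathlib import Path
from typing import Iterator, Tuple

def parse_export_name(path: Path) -> Tuple[str, str]:
    """Split '1_abc123def.fr.srt' into ('1_abc123def', 'fr').

    A bare '1_abc123def.srt' defaults to 'en'."""
    parts = path.name.split(".")
    if len(parts) == 3:
        return parts[0], parts[1]
    return path.stem, "en"

def plan_batch(export_dir: str) -> Iterator[Tuple[str, str, Path]]:
    """Yield (entryId, language, file) for every exported caption file.

    Each tuple then feeds one caption_captionasset.add +
    caption_captionasset.setContent pair."""
    for f in sorted(Path(export_dir).glob("*.srt")):
        entry_id, lang = parse_export_name(f)
        yield entry_id, lang, f
```

With the mapping isolated like this, the API half of the script is a plain loop over plan_batch, which is what keeps the whole thing under 100 lines.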
Higher-ed and the ADA Title II reality
Public universities are disproportionately hit by the 2026-04-24 ADA Title II deadline — it covers state and local government entities, and most flagship-university Kaltura libraries fall squarely in scope. Years of accumulated lecture capture, in many cases with no caption track or with auto-captions of ~85% accuracy, do not pass an audit. The university lecture capture page walks through what an auditor specifically samples: degree-program technical terms, faculty-coined terminology, and proper-name drug/procedure/method words. These are exactly the categories where general speech models fail.
The right architectural move is to retrofit the back-catalog with glossary-aware captions tied to the department's own term list (chemistry, biology, CS, law all maintain their own jargon stockpiles), then move new captures to a captioning-on-ingest pipeline. GlossCap is built for this exact pattern.
Why glossary-aware captions matter more on Kaltura
Kaltura is the LMS-adjacent platform with the heaviest concentration of higher-ed and life-sciences content — both verticals where domain terminology is the surface form most exposed to mis-captioning. A general-purpose Whisper output will write "tier zip a tide" where the lecturer said tirzepatide; "ku ber net es" where the engineer said Kubernetes; "see ess ess four" where the reading list said CSS-4. Each of those is a sampled-segment failure on an audit, and worse, a comprehension failure for the deaf-or-hard-of-hearing learner the captions exist for.
GlossCap's approach: your company or department glossary feeds a logit bias into Whisper-large's decoder before output, so the surface form lands right the first time. The output VTT or SRT is WCAG 2.1 AA-compliant on first export — ready for the caption_captionasset.add call.
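GlossCap's decoder hook is proprietary, but the general mechanism is easy to illustrate: at each decoding step, add a positive bias to the logits of candidate tokens that match glossary terms, so the domain surface form wins the argmax. A toy sketch over a plain score dict, not Whisper's actual API:

```python
# Toy illustration of glossary logit biasing, not GlossCap's implementation.
# A flat boost is added to any candidate token found in the glossary set.
from typing import Dict, Set

def bias_logits(logits: Dict[str, float],
                glossary: Set[str],
                boost: float = 4.0) -> Dict[str, float]:
    """Add `boost` to every candidate token in the glossary,
    tilting the argmax toward the domain surface form."""
    return {tok: score + (boost if tok in glossary else 0.0)
            for tok, score in logits.items()}

# Without the boost, the generic fragment "tier" outscores the drug name.
logits = {"tier": 1.2, "tirzepatide": 0.9, "zip": 0.4}
biased = bias_logits(logits, {"tirzepatide"})
best = max(biased, key=biased.get)
```

In a real decoder the bias applies per subword token across beam candidates, but the effect is the same: the glossary term's path through the search stops losing to phonetically similar generic words.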
Related questions
Can I use Kaltura's built-in REACH service instead?
REACH is Kaltura's own captioning marketplace: orders flow to Kaltura's vendor network for human or machine captioning. It works for general content. For terminology-heavy training (engineering SDKs, drug names, regulatory acronyms), the upstream model isn't seeded with your glossary, so the same mis-caption pattern shows up. Pulling source video out, captioning in GlossCap with the glossary, and reattaching via the Caption API is the high-accuracy path.
Does Canvas/Blackboard/Moodle Kaltura integration affect the upload flow?
Caption assets attach at the Kaltura entry level. The Canvas/Blackboard/Moodle integration shows the underlying entry and its caption tracks — it does not change where captions are stored. Upload via KMC or REST against the entry; the integration surfaces the captions in the embedded player automatically.
What about the Kaltura player on a custom MediaSpace site?
The standard Kaltura player renders the CC button from the entry's caption assets. Custom MediaSpace skins occasionally hide the CC control under a "more" menu — verify the player config before assuming captions are missing. The asset itself is still attached.
Does Kaltura accept TTML/DFXP?
Yes — DFXP is a Kaltura-supported caption format (the format dropdown lists it explicitly). For higher-ed and broadcast workflows that prefer TTML over WebVTT, this is a clean fit. See our TTML for LMS page for the format specifics.