Format reference

TTML captions for LMS: when your platform needs the XML schema

TTML (Timed Text Markup Language) is the W3C XML format for captions and subtitles. Kaltura, Brightcove, and several enterprise LMSes accept TTML, and sometimes prefer it over SRT. This page covers the XML root, the IMSC profile most LMSes actually expect, and when TTML is the right export from GlossCap.

TL;DR

TTML is XML with a <tt> root element in the http://www.w3.org/ns/ttml namespace. Caption cues are <p> elements inside <body><div>. Most LMSes ask for the IMSC 1 Text Profile, a stricter subset designed for interoperability. TTML's win over SRT is structure: it carries language metadata, styling, positioning, and ruby annotations in the file itself, so an LMS player can render multi-language caption tracks and colour-code speakers without player-specific config. TTML loses to SRT on ubiquity: if the upload form just says "subtitle file", SRT is safer.

What a TTML file actually looks like

A minimal IMSC-compliant TTML:

<?xml version="1.0" encoding="UTF-8"?>
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:tts="http://www.w3.org/ns/ttml#styling"
    xml:lang="en">
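  <head>
    <metadata>
      <!-- declares the agent that ttm:agent="alex" refers to below -->
      <ttm:agent xml:id="alex" type="person"/>
    </metadata>
    <styling>
      <!-- TTML rgba() takes 0-255 components; 191 is roughly 75% opacity -->
      <style xml:id="s1" tts:color="white"
             tts:backgroundColor="rgba(0,0,0,191)"
             tts:fontFamily="sansSerif"
             tts:fontSize="100%"/>
    </styling>
  </head>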
  <body>
    <div>
      <p begin="00:00:03.200" end="00:00:06.400" style="s1">
        <span ttm:agent="alex">First, run kubectl get pods to see
what's running in the cluster.</span>
      </p>
      <p begin="00:00:06.400" end="00:00:09.100" style="s1">
        Then apply the Helm chart with helm install.
      </p>
    </div>
  </body>
</tt>

Things the schema actually buys you over SRT:

xml:lang on the root. Every downstream tool knows the caption language without filename conventions.
First-class styling. tts:color, tts:backgroundColor, tts:fontFamily, and tts:fontSize are defined in the spec. You can reference a style by xml:id from any cue.
Structured speaker attribution. ttm:agent on a <span> with a matching <ttm:agent> in <head> gives the caption track a structured speaker model instead of inline brackets.
Positioning via regions. A <region> inside <layout> in <head>, with tts:origin and tts:extent, can be referenced from cues to pin captions to specific frame positions.
Ruby annotations for East Asian languages. Not relevant for English-only training content, but they make TTML the format of choice if you caption Japanese or Chinese modules. Note that ruby comes from TTML 2 and IMSC 1.1 rather than IMSC 1, so check that your player supports it.
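
To make the structural points above concrete, here is a minimal Python sketch that reads the language, cue timing, and speaker attribution straight out of a TTML document using only the standard library. The helper name read_cues and the trimmed SAMPLE document are illustrative assumptions, not part of GlossCap or of any LMS API.

```python
import xml.etree.ElementTree as ET

# Namespaces used by TTML; the prefixes are local to this sketch.
NS = {"tt": "http://www.w3.org/ns/ttml"}
TT_SPAN = "{http://www.w3.org/ns/ttml}span"
TTM_AGENT = "{http://www.w3.org/ns/ttml#metadata}agent"
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

# Trimmed-down version of the sample document (hypothetical content).
SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xml:lang="en">
  <head>
    <metadata>
      <ttm:agent xml:id="alex" type="person"/>
    </metadata>
  </head>
  <body>
    <div>
      <p begin="00:00:03.200" end="00:00:06.400">
        <span ttm:agent="alex">First, run kubectl get pods.</span>
      </p>
      <p begin="00:00:06.400" end="00:00:09.100">
        Then apply the Helm chart with helm install.
      </p>
    </div>
  </body>
</tt>
"""


def read_cues(ttml_text):
    """Return (language, cues) for a TTML document given as a string."""
    root = ET.fromstring(ttml_text)
    lang = root.get(XML_LANG)  # caption language comes from the root element
    cues = []
    for p in root.findall("./tt:body/tt:div/tt:p", NS):
        agent = None
        # Speaker attribution sits on a <span> inside the cue, if present.
        for span in p.iter(TT_SPAN):
            agent = span.get(TTM_AGENT, agent)
        # Collapse the whitespace that XML pretty-printing introduces.
        text = " ".join("".join(p.itertext()).split())
        cues.append({
            "begin": p.get("begin"),
            "end": p.get("end"),
            "agent": agent,
            "text": text,
        })
    return lang, cues


lang, cues = read_cues(SAMPLE)
```

With SRT, the language would have to come from the filename and the speaker from a bracketed prefix in the cue text; here both are ordinary attribute lookups.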

The IMSC profile most LMSes expect

Vanilla TTML 2 is large: the full spec covers everything from broadcast subtitles to animated text overlays. LMSes need very little of that. The interoperable subset is IMSC 1 Text Profile (SMPTE-TT is a closely related US broadcast profile). IMSC 1 Text drops features that make sense for film subtitling but complicate player implementation, such as animation and most of the SMIL integration. What remains is cue timing, block/inline styling, regions, and metadata.

If an LMS says "TTML" without qualification, assume IMSC 1 Text. If it says "SMPTE-TT", ask — that is the US broadcast profile and expects a slightly different metadata block. GlossCap exports IMSC 1 Text by default, which is accepted by Kaltura, Brightcove, JW Player, THEOplayer, and most in-house HTML5 players.
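
If you receive a TTML file and want a quick hint about which profile it claims, one option is to read the ttp:profile attribute on the root element. The sketch below assumes the file declares a profile at all (many do not) and checks only the declaration, not actual conformance; the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# ttp:profile lives in the TTML parameter namespace.
TTP_PROFILE = "{http://www.w3.org/ns/ttml#parameter}profile"
# Profile designator for the IMSC 1 Text Profile.
IMSC1_TEXT = "http://www.w3.org/ns/ttml/profile/imsc1/text"


def declared_profile(ttml_text):
    """Return the ttp:profile value on the root element, or None if absent."""
    return ET.fromstring(ttml_text).get(TTP_PROFILE)


def declares_imsc1_text(ttml_text):
    """True only when the document explicitly declares IMSC 1 Text."""
    return declared_profile(ttml_text) == IMSC1_TEXT


# Hypothetical inputs: one file that declares the profile, one that does not.
WITH_PROFILE = (
    '<tt xmlns="http://www.w3.org/ns/ttml" '
    'xmlns:ttp="http://www.w3.org/ns/ttml#parameter" '
    'ttp:profile="http://www.w3.org/ns/ttml/profile/imsc1/text" '
    'xml:lang="en"><body/></tt>'
)
WITHOUT_PROFILE = '<tt xmlns="http://www.w3.org/ns/ttml" xml:lang="en"><body/></tt>'
```

A missing declaration is not a failure: treat None as "unknown" and fall back to asking the LMS vendor or running a proper IMSC validator.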

When TTML beats SRT and VTT

Pick TTML when:

Your LMS explicitly requests TTML in its caption-upload docs. Kaltura and some Brightcove-backed enterprise portals do.
You need multi-language caption tracks in a streaming pipeline. TTML is still one file per language, but its toolchain (packagers, DASH/HLS media manifests) treats TTML as first-class, whereas SRT generally has to be converted before packaging.
You need to preserve speaker colour coding across players. The ttm:agent + tts:color combination travels with the file; SRT has no hook.
You are archiving the caption track for long-term reuse across future players. TTML's schema is stable and W3C-ratified; it will survive the next player migration.

Stay with SRT or VTT when the LMS upload form just accepts "subtitles", you are embedding on a simple HTML5 <video> element, or you want the lowest-friction path for a one-time compliance push. Both formats are far more widely supported in the wild than TTML.

How GlossCap exports TTML

Under the hood, the caption content is the same regardless of export format — we run the Whisper-large decode once with glossary-biased logit boosts, format the result with speaker labels and non-speech sound cues, and emit the chosen wrapper. For TTML specifically:

IMSC 1 Text Profile output by default (changeable in export settings for SMPTE-TT).
Speaker labels emitted as ttm:agent metadata plus a <span> wrapper on the relevant cue text.
Non-speech sound cues ([laughter], [alarm]) wrapped in their own <span> so LMS players can render them distinctly.
xml:lang on the root set to the source language (defaulting to en).
Cue timing in HH:MM:SS.mmm format — no media-clock variants, because every LMS we have tested rejects those.
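
As an illustration of the list above, the following Python sketch turns a list of timed cues into a TTML document with HH:MM:SS.mmm timing, xml:lang on the root, and ttm:agent speaker metadata. It is not GlossCap's exporter: the cue dictionary shape, format_timestamp, and build_ttml are assumptions made for the example, and styling and profile declarations are left out for brevity.

```python
import xml.etree.ElementTree as ET

TT = "http://www.w3.org/ns/ttml"
TTM = "http://www.w3.org/ns/ttml#metadata"
XML = "http://www.w3.org/XML/1998/namespace"

# Serialize TTML as the default namespace and metadata as ttm:.
ET.register_namespace("", TT)
ET.register_namespace("ttm", TTM)


def format_timestamp(seconds):
    """Format a time in seconds as HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def build_ttml(cues, lang="en"):
    """Build a TTML string from cues shaped like
    {"start": float, "end": float, "text": str, "speaker": optional str}.
    Speaker ids are assumed to be valid XML ids (no spaces, no leading digit).
    """
    tt = ET.Element(f"{{{TT}}}tt", {f"{{{XML}}}lang": lang})
    head = ET.SubElement(tt, f"{{{TT}}}head")
    metadata = ET.SubElement(head, f"{{{TT}}}metadata")
    body = ET.SubElement(tt, f"{{{TT}}}body")
    div = ET.SubElement(body, f"{{{TT}}}div")
    declared = set()
    for cue in cues:
        p = ET.SubElement(div, f"{{{TT}}}p", {
            "begin": format_timestamp(cue["start"]),
            "end": format_timestamp(cue["end"]),
        })
        speaker = cue.get("speaker")
        if speaker:
            # Declare each speaker once in <head>, then reference it per cue.
            if speaker not in declared:
                ET.SubElement(metadata, f"{{{TTM}}}agent",
                              {f"{{{XML}}}id": speaker, "type": "person"})
                declared.add(speaker)
            span = ET.SubElement(p, f"{{{TT}}}span", {f"{{{TTM}}}agent": speaker})
            span.text = cue["text"]
        else:
            p.text = cue["text"]
    return ET.tostring(tt, encoding="unicode")


example = build_ttml([
    {"start": 3.2, "end": 6.4, "text": "First, run kubectl get pods.", "speaker": "alex"},
    {"start": 6.4, "end": 9.1, "text": "Then apply the Helm chart with helm install."},
])
```

Rounding to whole milliseconds before splitting into hours, minutes, and seconds avoids values such as 59.9996 seconds being printed as an invalid 60.000.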

The terminology-preservation promise is identical across formats: paste in your glossary once, and kubectl, tirzepatide, and Docebo come out right in SRT, VTT, and TTML alike. The verbatim-for-dialogue requirement of WCAG SC 1.2.2 is a content requirement, not a format one — and format switching never fixes a missed term.
