Format reference

TTML captions for LMS: when your platform needs the XML schema

TTML (Timed Text Markup Language) is the W3C XML format for captions and subtitles. Kaltura, Brightcove, and several enterprise LMSes accept TTML and sometimes prefer it over SRT. This page covers the XML root element, the IMSC profile most LMSes actually expect, and when TTML is the right export from GlossCap.

TL;DR

TTML is XML with a <tt> root element in the http://www.w3.org/ns/ttml namespace. Captions live inside <body><div><p begin="..." end="..."> elements. Most LMSes ask for the IMSC 1 Text Profile, a constrained subset designed for interoperability. TTML's win over SRT is structure: it carries language metadata, styling, positioning, and (in TTML2 and IMSC 1.1) ruby annotations in the file itself, so an LMS player can render multi-language caption tracks and colour-code speakers without player-specific config. TTML loses to SRT on ubiquity: if the upload form just says "subtitle file", SRT is safer.

What a TTML file actually looks like

A minimal IMSC-compliant TTML:

<?xml version="1.0" encoding="UTF-8"?>
<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata"
    xmlns:tts="http://www.w3.org/ns/ttml#styling"
    xml:lang="en">
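  <head>
    <!-- ttm:agent on a cue is an ID reference, so the agent is declared here -->
    <metadata>
      <ttm:agent xml:id="alex" type="person">
        <ttm:name type="full">Alex</ttm:name>
      </ttm:agent>
    </metadata>
    <styling>
      <!-- TTML alpha runs 0-255, not 0-1: 191 is roughly 75% opaque -->
      <style xml:id="s1" tts:color="white"
             tts:backgroundColor="rgba(0,0,0,191)"
             tts:fontFamily="sansSerif"
             tts:fontSize="100%"/>
    </styling>
  </head>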
  <body>
    <div>
      <p begin="00:00:03.200" end="00:00:06.400" style="s1">
        <span ttm:agent="alex">First, run kubectl get pods to see
what's running in the cluster.</span>
      </p>
      <p begin="00:00:06.400" end="00:00:09.100" style="s1">
        Then apply the Helm chart with helm install.
      </p>
    </div>
  </body>
</tt>
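
To make the structure concrete, here is a minimal Python sketch that reads cues back out of a file like the one above using only the standard library. The helper names (clock_to_seconds, list_cues) and the trimmed SAMPLE document are illustrative assumptions, not part of GlossCap or any LMS API, and the time parser only handles the hh:mm:ss.fff clock form used on this page.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
TTM_NS = "http://www.w3.org/ns/ttml#metadata"

# Trimmed-down copy of the sample document (hypothetical test input).
SAMPLE = """<tt xmlns="http://www.w3.org/ns/ttml"
    xmlns:ttm="http://www.w3.org/ns/ttml#metadata" xml:lang="en">
  <body>
    <div>
      <p begin="00:00:03.200" end="00:00:06.400">
        <span ttm:agent="alex">First, run kubectl get pods to see
what's running in the cluster.</span>
      </p>
      <p begin="00:00:06.400" end="00:00:09.100">
        Then apply the Helm chart with helm install.
      </p>
    </div>
  </body>
</tt>"""


def clock_to_seconds(value: str) -> float:
    """Convert an hh:mm:ss(.fraction) clock time to seconds."""
    hours, minutes, seconds = value.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)


def list_cues(ttml_text: str):
    """Return (begin_s, end_s, speaker_or_None, text) for each <p> cue."""
    root = ET.fromstring(ttml_text.encode("utf-8"))
    cues = []
    for p in root.iter(f"{{{TT_NS}}}p"):
        # ttm:agent may sit on the <p> itself or on a child <span>.
        speaker = p.get(f"{{{TTM_NS}}}agent")
        if speaker is None:
            for span in p.iter(f"{{{TT_NS}}}span"):
                speaker = span.get(f"{{{TTM_NS}}}agent")
                if speaker:
                    break
        # Collapse the indentation and line breaks added for readability.
        text = " ".join("".join(p.itertext()).split())
        cues.append((clock_to_seconds(p.get("begin")),
                     clock_to_seconds(p.get("end")), speaker, text))
    return cues


if __name__ == "__main__":
    for begin, end, speaker, text in list_cues(SAMPLE):
        print(begin, end, speaker, text)
```

Because the speaker and language travel in attributes rather than in the caption text, a consumer can filter or restyle cues per speaker without parsing the dialogue itself.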

Things the schema actually buys you over SRT:

The IMSC profile most LMSes expect

Full TTML is large: the specification covers everything from broadcast subtitles to animated text overlays, and TTML2 adds more again. LMSes need very little of that. The interoperable subset is IMSC 1 Text Profile, a profile of TTML1 (SMPTE-TT is a closely related US broadcast profile). IMSC 1 Text constrains the parts of TTML that complicate player implementation: timing must be relative to the media timeline rather than SMPTE or wall-clock time bases, the number of regions shown at once is capped, and rendering complexity is bounded so that players behave predictably. What remains is cue timing, block/inline styling, regions, and metadata.

If an LMS says "TTML" without qualification, assume IMSC 1 Text. If it says "SMPTE-TT", ask — that is the US broadcast profile and expects a slightly different metadata block. GlossCap exports IMSC 1 Text by default, which is accepted by Kaltura, Brightcove, JW Player, THEOplayer, and most in-house HTML5 players.
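
When a vendor hands you a sample file instead of a spec, a quick check of the root element can suggest which flavour you are looking at. The sketch below is an assumption-laden heuristic, not a validator: describe_ttml and its return labels are hypothetical names, the SMPTE-TT namespace prefix is quoted from memory of ST 2052-1 and should be confirmed against the vendor's sample, and many valid IMSC files omit the optional ttp:profile attribute.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
TTP_NS = "http://www.w3.org/ns/ttml#parameter"
# Profile designator that IMSC 1 Text documents may declare on <tt>.
IMSC1_TEXT = "http://www.w3.org/ns/ttml/profile/imsc1/text"
# Assumed prefix of the SMPTE-TT extension namespaces (verify before relying on it).
SMPTE_PREFIX = "http://www.smpte-ra.org/schemas/2052-1/"


def describe_ttml(ttml_text: str) -> str:
    """Return a rough label for the kind of TTML document this appears to be."""
    root = ET.fromstring(ttml_text.encode("utf-8"))
    if root.tag != f"{{{TT_NS}}}tt":
        return "not-ttml"
    if root.get(f"{{{TTP_NS}}}profile") == IMSC1_TEXT:
        return "imsc1-text"
    # Look for any element or attribute that lives in a SMPTE-TT namespace.
    for element in root.iter():
        names = [element.tag, *element.attrib]
        if any(isinstance(name, str) and name.startswith("{" + SMPTE_PREFIX)
               for name in names):
            return "smpte-tt"
    # No profile declared: the file may still be IMSC-conformant.
    return "ttml-unspecified"
```

A result of "ttml-unspecified" is the common case and is not a failure; it only means the file does not announce its profile, so the question still has to go to the vendor.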

When TTML beats SRT and VTT

Pick TTML when:

Stay with SRT or VTT when: the LMS upload form just accepts "subtitles", you are embedding on a simple HTML5 <video> element, or you want the lowest-friction path for a one-time compliance push. SRT and VTT are both orders of magnitude more common in the wild.

How GlossCap exports TTML

Under the hood, the caption content is the same regardless of export format — we run the Whisper-large decode once with glossary-biased logit boosts, format the result with speaker labels and non-speech sound cues, and emit the chosen wrapper. For TTML specifically:

The terminology-preservation promise is identical across formats: paste in your glossary once, and kubectl, tirzepatide, and Docebo come out right in SRT, VTT, and TTML alike. The verbatim-for-dialogue requirement of WCAG SC 1.2.2 is a content requirement, not a format one — and format switching never fixes a missed term.

Related questions

Is TTML the same as DFXP?

DFXP is the old name — "Distribution Format Exchange Profile" was the W3C Working Draft nomenclature that shipped as the first TTML profile. "TTML" is what the spec is called now. If a vendor asks for a .dfxp file, it's the same thing with a different extension.

What about SMPTE-TT?

SMPTE-TT is the US broadcast industry's TTML profile, spec'd by SMPTE (the film/TV engineering body). It's an IMSC-adjacent variant with broadcast-specific metadata — source timecode references, caption-channel descriptors. If you're exporting captions for a streaming platform that originated in US broadcast (PBS, some local affiliates), SMPTE-TT is the safe pick.

Can I convert an SRT to TTML?

Mechanically yes — the timecodes translate directly and the text content is the same. What you lose is the speaker-attribution and styling structure TTML could carry; you get a TTML file that is syntactically valid but no richer than the SRT it came from. GlossCap exports TTML natively, which preserves those hooks.
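
The mechanical conversion can be sketched in a few lines of Python. This is an illustrative, standard-library-only example, not how GlossCap produces its exports: srt_to_ttml is a hypothetical helper, it assumes well-formed SRT blocks, and it treats inline SRT markup such as <i> as literal text. Note what is missing from the output: no styling, no regions, no speaker metadata.

```python
import re
import xml.etree.ElementTree as ET

TT_NS = "http://www.w3.org/ns/ttml"
XML_NS = "http://www.w3.org/XML/1998/namespace"
# SRT timing line, e.g. "00:00:03,200 --> 00:00:06,400"
SRT_TIME = re.compile(
    r"(\d{2,}:\d{2}:\d{2}),(\d{3})\s*-->\s*(\d{2,}:\d{2}:\d{2}),(\d{3})"
)


def srt_to_ttml(srt_text: str, lang: str = "en") -> str:
    """Wrap SRT cues in a bare TTML document and return it as a string."""
    # Serialise TTML elements without a prefix (this setting is process-wide).
    ET.register_namespace("", TT_NS)
    tt = ET.Element(f"{{{TT_NS}}}tt", {f"{{{XML_NS}}}lang": lang})
    body = ET.SubElement(tt, f"{{{TT_NS}}}body")
    div = ET.SubElement(body, f"{{{TT_NS}}}div")

    normalized = srt_text.replace("\r\n", "\n").strip()
    for block in re.split(r"\n\s*\n", normalized):
        lines = block.split("\n")
        # Find the timing line; the numeric cue index above it is ignored.
        timing_index = next(
            (i for i, line in enumerate(lines) if SRT_TIME.search(line)), None
        )
        if timing_index is None:
            continue
        match = SRT_TIME.search(lines[timing_index])
        # SRT uses a comma before milliseconds, TTML clock times use a dot.
        p = ET.SubElement(div, f"{{{TT_NS}}}p", {
            "begin": f"{match.group(1)}.{match.group(2)}",
            "end": f"{match.group(3)}.{match.group(4)}",
        })
        text_lines = lines[timing_index + 1:]
        if text_lines:
            p.text = text_lines[0]
            # Preserve SRT line breaks as <br/> elements.
            for line in text_lines[1:]:
                br = ET.SubElement(p, f"{{{TT_NS}}}br")
                br.tail = line
    return ET.tostring(tt, encoding="unicode")
```

The result parses as TTML, but every cue is an anonymous <p>: anything richer has to come from the captioning step, since the SRT never contained it.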

Does TTML support chapters?

Not natively as a spec concept, but IMSC allows <metadata> blocks that some players use for chapter markers. The more common pattern is a separate chapter file alongside the caption TTML: either a WebVTT file attached through <track kind="chapters">, or an XMP sidecar.

Further reading