
Microsoft Teams captions: meetings, recordings, Live Events, Copilot — M365 tenant training compliance

Microsoft Teams is the collaboration hub at the centre of every Microsoft 365 tenant — the tool where most employee meeting time is spent, where the recordings that become training content are produced, and where Copilot AI meeting summaries are generated. The existing Microsoft Stream captions reference covers Stream-on-SharePoint as the video repository where Teams meeting recordings land and are distributed. This page covers the Teams meeting and events layer upstream of that repository: the Teams meeting transcript, the Live Events recording and caption pipeline, Teams Phone call recording, Copilot meeting notes, and Viva Learning courseware — the surfaces where captions are produced before the video reaches Stream-on-SharePoint. For most M365 tenants, Teams is where the captioning problem originates and Stream is where the solution lives; this page covers the origination.

TL;DR

A Microsoft Teams captioning workflow spans five surfaces. (1) Teams meeting live captions: real-time speech-to-text during the meeting, ephemeral by default, speaker-attributed when speaker recognition is enabled. (2) Teams meeting recording + transcript: the meeting recording deposits in Stream-on-SharePoint; the transcript deposits alongside it as a VTT file, with the same STT accuracy as the live captions. (3) Teams Live Events: broadcast-style events with a live CART captioning pathway and an on-demand recording with captions. (4) Copilot meeting notes and AI recap: Copilot AI summaries, chapters, and action items all derive from the same meeting transcript and inherit its proper-noun errors. (5) Viva Learning: courses assembled from Teams recordings and assigned to learners inherit the caption state of the source recordings. The proper-noun failure mode propagates across all five: what gets mangled in the live caption appears verbatim in the transcript, the Stream recording, the Copilot notes, and the Viva Learning course.

Teams vs Microsoft Stream — the distinction this page draws

Teams and Stream are different layers of the M365 video stack, and the Stream captions reference addresses the repository and distribution layer in depth. The distinction:

The captioning fix lives in Stream (upload the corrected VTT to the recording in Stream, or use the Stream admin API). But the captioning problem originates in Teams (the STT engine that produced the mangled transcript). This page covers the problem-origination layer — Teams meeting surfaces, Live Events, Copilot, and Viva Learning — so that operators understand which surfaces generate transcripts and what those transcripts feed into.

M365 tenant types and their Teams captioning posture

Teams captioning behaviour and compliance posture vary by M365 plan:

The compliance posture tightens substantially from commercial M365 to GCC to GCC High to DoD. A captioning workflow that is straightforward on commercial M365 may require a 3–6-month vendor-selection cycle on GCC High, a gating factor comparable to the one that applies to Webex for Government.

Surface 1 — Teams meeting live captions

Microsoft Teams generates real-time captions during active meetings using Azure Cognitive Services speech-to-text. The captions display in the meeting UI for each participant who has live captions enabled. Key characteristics:

Live captions are the meeting-time accommodation surface. For learners with documented hearing-related accommodation needs in a Teams meeting (a training session, an all-hands, a virtual classroom), live-caption accuracy on proper nouns is the failure mode they experience in real time. CART captioner integration provides the defensible accommodation path: a designated human captioner joins as a participant and supplies the captions in real time.

Surface 2 — Teams meeting recording + transcript

When a Teams meeting is recorded, two assets are deposited in SharePoint or OneDrive after the meeting ends:

  1. The video recording (MP4), accessible via the Stream player.
  2. The meeting transcript (VTT file), attached to the recording in Stream-on-SharePoint as the default caption track.

The transcript is generated from the same Azure Cognitive Services STT engine that powered the live captions. The VTT file is the stored, time-coded output — every cue in the VTT corresponds to a segment of the recording, with a timestamp and the attributed speaker name (if speaker recognition was enabled).
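To make the cue structure concrete, here is a minimal Python sketch that reads a transcript VTT into time-coded, speaker-attributed cues. It assumes speaker attribution is carried in standard WebVTT voice spans (`<v Name>`); the sample text, the speaker names, and the `Cue` and `parse_vtt` names are illustrative, not taken from a real Teams export.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Matches a cue timing line such as "00:00:01.000 --> 00:00:04.500".
TIMING = re.compile(r"(\d{2}:\d{2}:\d{2}\.\d{3}) --> (\d{2}:\d{2}:\d{2}\.\d{3})")
# Matches a WebVTT voice span such as "<v Dana Lee>text</v>".
VOICE = re.compile(r"<v ([^>]+)>(.*?)(?:</v>|$)", re.S)


@dataclass
class Cue:
    start: str
    end: str
    speaker: Optional[str]  # None when the cue carries no voice span
    text: str


def parse_vtt(vtt: str) -> List[Cue]:
    """Split a WebVTT document into cues; blocks without a timing line are skipped."""
    cues = []
    for block in re.split(r"\n\s*\n", vtt.strip()):
        lines = block.splitlines()
        for i, line in enumerate(lines):
            timing = TIMING.match(line)
            if not timing:
                continue  # header line or optional cue identifier
            payload = "\n".join(lines[i + 1:])
            voice = VOICE.search(payload)
            speaker = voice.group(1).strip() if voice else None
            text = voice.group(2).strip() if voice else payload.strip()
            cues.append(Cue(timing.group(1), timing.group(2), speaker, text))
            break
    return cues


# Hypothetical sample: one named speaker, one unrecognised speaker.
SAMPLE = """WEBVTT

00:00:01.000 --> 00:00:04.500
<v Dana Lee>Welcome to the onboarding session.</v>

00:00:04.500 --> 00:00:07.000
<v Speaker 1>Today we cover the intake protocol.</v>
"""

for cue in parse_vtt(SAMPLE):
    print(cue.start, cue.speaker, cue.text)
```

A cue attributed to "Speaker 1" rather than a name is what a transcript looks like when speaker recognition was not enabled for that participant.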

The transcript VTT is the starting point for captioning compliance. Its problems:

Replacing the transcript VTT on a Stream recording with a corrected, glossary-biased VTT is the captioning compliance fix. The Microsoft Stream captions reference covers the upload-replacement workflow on the Stream side. The transcript VTT that GlossCap produces from a Teams recording is a clean, properly timed, glossary-corrected replacement for the auto-generated VTT.
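As a rough illustration of what a glossary does to a transcript, the sketch below applies post-hoc substitution to cue text while leaving the header and timing lines untouched. This is a simpler stand-in for glossary biasing inside the recogniser, not a description of how GlossCap or Teams works; the glossary entries and function names are hypothetical.

```python
import re
from typing import Dict

# Hypothetical glossary: observed mis-transcription -> canonical term.
GLOSSARY = {
    "tie-zer-pa-tide": "tirzepatide",
    "gloss cap": "GlossCap",
}


def build_pattern(glossary: Dict[str, str]) -> "re.Pattern[str]":
    # Longest keys first so overlapping variants prefer the longer match.
    keys = sorted(glossary, key=len, reverse=True)
    alternation = "|".join(re.escape(k) for k in keys)
    # Lookarounds keep matches from starting or ending inside a longer word.
    return re.compile(r"(?<!\w)(" + alternation + r")(?!\w)", re.IGNORECASE)


def correct_vtt(vtt: str, glossary: Dict[str, str]) -> str:
    """Replace glossary variants in cue text; header and timing lines pass through."""
    pattern = build_pattern(glossary)
    lookup = {k.lower(): v for k, v in glossary.items()}
    out = []
    for line in vtt.splitlines():
        if "-->" in line or line.startswith("WEBVTT"):
            out.append(line)
        else:
            out.append(pattern.sub(lambda m: lookup[m.group(1).lower()], line))
    return "\n".join(out)


raw = (
    "WEBVTT\n\n"
    "00:00:01.000 --> 00:00:03.000\n"
    "<v Speaker 1>Start Tie-zer-pa-tide at the lowest dose.</v>"
)
print(correct_vtt(raw, GLOSSARY))
```

Because timings are never rewritten, the corrected file stays aligned with the recording and can replace the auto-generated track directly.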

Surface 3 — Teams Live Events

Teams Live Events (now migrating to Teams Webinars and Teams Town Hall as the product family) are broadcast-style events with a presenter-to-audience model, used for all-hands meetings, corporate town halls, large training webinars, and executive communications. The captioning surface differs from regular Teams meetings:

Teams Town Hall (the successor product) follows a similar pattern: live CART integration for the live event, auto-transcript on the recording, and a replacement workflow for the recording caption track. For all-hands or town-hall recordings distributed as training content (a common pattern at corporate L&D teams), the same caption compliance obligations apply to the recording as to any other training video.

Surface 4 — Copilot meeting notes and AI recap

Microsoft Teams Copilot (requires M365 Copilot licence or Teams Premium) generates AI-powered meeting summaries, action items, chapter markers, speaker contribution summaries, and follow-up recommendations. Every one of these outputs derives from the meeting transcript. Every proper-noun error in the transcript propagates into the Copilot output:

Copilot output is increasingly used as the working record of a meeting — the summary that gets shared to Teams channels, Outlook inboxes, and SharePoint pages as the authoritative post-meeting document. A Copilot summary built on a mangled transcript propagates the mangling into the institutional record. For healthcare training sessions where the Copilot summary might reference drug protocols or procedure names, this is a substantive accuracy failure in the institutional record.

The fix is at the transcript layer: a corrected, glossary-biased transcript fed back to the recording before Copilot re-processes it produces clean Copilot output. This is one of the strongest multiplier arguments for fixing Teams meeting transcripts — the benefit extends beyond the caption track on the video to every AI-powered downstream artifact Copilot generates.

Surface 5 — Viva Learning

Microsoft Viva Learning is the M365 learning-and-development hub that surfaces training content — from SharePoint, Stream, and external LMS integrations — directly in the Teams interface. Teams recordings promoted to Viva Learning courses inherit the caption state of their Stream-on-SharePoint source recording:

Viva Learning also integrates with external LMS platforms — Cornerstone OnDemand, SAP SuccessFactors, TalentLMS, Degreed, and others — surfacing external-LMS courses inside Teams. The caption state of those courses depends on the originating LMS, not on Stream. For a Viva Learning tenant that surfaces courses from both Stream-hosted Teams recordings and external LMS sources, the captioning compliance inventory must cover both source systems.

For ADA Title II-bound public-university tenants running Viva Learning for employee training, the training-video captions accessible through Viva Learning carry the same substantive-accuracy obligation as training videos in Canvas, Brightspace, or any other distribution system.

Compliance regimes — Microsoft Teams across M365 tenant types

Proper-noun failure modes in Teams meeting recording content

Teams meeting recordings span the full range of institutional content types. The failure modes by tenant segment:

The Teams recording retrofit pattern

For an M365 tenant sitting on a Stream-on-SharePoint library of Teams meeting recordings — whether from all-hands sessions, product trainings, engineering talks, or clinical-education meetings — the retrofit runs in five phases:

  1. Inventory. Use the Microsoft Graph API to enumerate video files in SharePoint document libraries and OneDrive folders. Teams meeting recordings are identifiable by MIME type (video/mp4) and metadata (recording source). Most tenants discover that 15–35% of their Teams recordings have been promoted to training context — linked from a Viva Learning course, embedded in a SharePoint training page, or assigned via a channel tab in a learning-focused Team.
  2. Triage. Rank by instructional exposure: recordings embedded in active Viva Learning courses first, recordings assigned in Teams channel tabs second, and recordings linked from employee-facing SharePoint training portals third. Recordings from meetings that no one has accessed in six months can be archived. The triage cut typically removes 40–60% of the raw catalogue from retrofit scope.
  3. Caption production. For each triage-selected recording, produce a glossary-biased WebVTT. The institutional glossary is built once — SDK names, drug formulary, regulatory citations, product names — and applies to every recording in the catalogue. The M365 EU Data Boundary / sensitivity-label posture (detailed in Stream captions) must be respected in how recordings are shared with the captioning service.
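  4. Upload. Upload the corrected VTT to the recording in Stream-on-SharePoint via the Stream caption-upload UI. If you script this step through Microsoft Graph, note that createUploadSession is a POST action (POST /drives/{drive-id}/items/{item-id}/createUploadSession) that uploads file content; check the current Graph documentation to confirm that the call you use attaches the VTT as the recording's caption track rather than storing it as a separate file. The corrected VTT replaces the auto-generated transcript VTT as the displayed caption track.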
  5. Log. Asset register: SharePoint file ID, video title, Teams meeting ID, transcript VTT version, caption source, upload date, downstream Viva Learning course IDs, reviewer name and date. The register is the compliance-audit artefact.
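The inventory filter, triage order, and asset register above can be sketched in a few lines of Python. The sketch works on already-fetched item dictionaries rather than calling Microsoft Graph. The `file.mimeType` field follows the Graph driveItem shape; the exposure flags (`in_viva_course`, `in_channel_tab`, `on_training_portal`), the register column names, and the sample values are assumptions made for illustration, and the six-month access cut is left out.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Dict, List

# Hypothetical exposure flags collected during inventory, highest priority first.
PRIORITY = ["in_viva_course", "in_channel_tab", "on_training_portal"]


def is_recording(item: Dict) -> bool:
    # Graph driveItem resources expose the MIME type under item["file"]["mimeType"].
    return item.get("file", {}).get("mimeType") == "video/mp4"


def triage_rank(item: Dict) -> int:
    """Return 0 for the highest-exposure flag, len(PRIORITY) when none is set."""
    for rank, flag in enumerate(PRIORITY):
        if item.get(flag):
            return rank
    return len(PRIORITY)


def triage(items: List[Dict]) -> List[Dict]:
    """Keep recordings with some training exposure, highest exposure first."""
    scoped = [i for i in items if is_recording(i) and triage_rank(i) < len(PRIORITY)]
    return sorted(scoped, key=triage_rank)


@dataclass
class RegisterRow:
    sharepoint_file_id: str
    video_title: str
    teams_meeting_id: str
    vtt_version: str
    caption_source: str
    upload_date: str
    viva_course_ids: str
    reviewer: str
    review_date: str


def register_csv(rows: List[RegisterRow]) -> str:
    """Serialise register rows to CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(RegisterRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()


items = [
    {"id": "a", "file": {"mimeType": "video/mp4"}, "on_training_portal": True},
    {"id": "b", "file": {"mimeType": "text/plain"}, "in_viva_course": True},
    {"id": "c", "file": {"mimeType": "video/mp4"}, "in_viva_course": True},
    {"id": "d", "file": {"mimeType": "video/mp4"}},
]
print([i["id"] for i in triage(items)])
```

Keeping the register as a flat CSV means it can be handed to an auditor without any dependency on the tooling that produced it.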


FAQ — Microsoft Teams captions

How is this page different from the Microsoft Stream captions page?

The Stream captions reference covers Stream-on-SharePoint as the video repository — where Teams recordings land, how the Stream player exposes captions, how to upload a replacement caption track in Stream, and the M365 tenant-policy considerations (EU Data Boundary, sensitivity labels, DLP, external sharing) that govern how recordings flow through the tenant. This page covers the Teams meeting layer upstream: the live-caption surface during the meeting, the meeting transcript VTT that Teams produces, the Live Events captioning pathway, Copilot AI notes (which derive from the transcript), and Viva Learning (which distributes the recordings as training content). Together, the two pages cover the full M365 video captioning stack from meeting recording to learning distribution.

Does the auto-generated Teams meeting transcript clear WCAG SC 1.2.2 under ADA Title II / Section 508?

No, not for technical content. The auto-generated VTT from a Teams meeting transcript comes from Azure Cognitive Services general-purpose STT, which sits in the same 80–90% accuracy band on conversational audio as other generic STT systems and performs materially worse on technical proper nouns. WCAG SC 1.2.2's "accurately convey the audio" standard means that captions fail if product names, drug names, regulatory citations, and SDK terms are systematically mangled. The auto-generated transcript is a draft, not a compliance artefact.
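One hedged way to put a number on proper-noun accuracy is to compare the auto-generated text against a human-verified reference and count how many glossary-term occurrences survive. The sketch below is an illustrative metric, not a compliance test defined by any standard; the function names and sample sentences are hypothetical.

```python
import re
from collections import Counter
from typing import List


def term_counts(text: str, terms: List[str]) -> Counter:
    """Count whole-term, case-insensitive occurrences of each glossary term."""
    counts = Counter()
    for term in terms:
        pattern = re.compile(r"(?<!\w)" + re.escape(term) + r"(?!\w)", re.IGNORECASE)
        counts[term] = len(pattern.findall(text))
    return counts


def term_recall(reference: str, auto: str, terms: List[str]) -> float:
    """Fraction of glossary-term occurrences in the reference that the auto text kept."""
    ref = term_counts(reference, terms)
    hyp = term_counts(auto, terms)
    expected = sum(ref.values())
    if expected == 0:
        return 1.0  # nothing to get wrong
    # Credit each term at most as many times as the reference contains it.
    found = sum(min(ref[t], hyp[t]) for t in terms)
    return found / expected


reference = "Start tirzepatide today. Tirzepatide replaces semaglutide here."
auto = "Start tie-zer-pa-tide today. Tirzepatide replaces semaglutide here."
print(term_recall(reference, auto, ["tirzepatide", "semaglutide"]))
```

A transcript can score well on overall word accuracy and still score poorly here, which is the gap between conversational accuracy and accuracy on technical terms.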

What is Teams Premium and does it improve captioning?

Teams Premium adds speaker recognition, intelligent meeting recap, advanced Copilot features, and live translation in addition to the standard Teams meeting features. Speaker recognition improves the speaker-attribution accuracy of the transcript (instead of "Speaker 1", you get the actual attendee name on each cue) — which is valuable for multi-speaker training recordings where knowing who said what matters. But Teams Premium does not improve the substantive accuracy of the STT engine on technical proper nouns. A Teams Premium transcript has better speaker attribution and is better organised for Copilot recap, but it still mangles the same SDK names, drug names, and regulatory citations that standard Teams transcripts mangle.

Does Copilot AI recap fix the transcript?

No — Copilot summarises the transcript, it does not correct it. If the transcript says "tie-zer-pa-tide" instead of "tirzepatide," Copilot's meeting summary will include "tie-zer-pa-tide" in the drug-related discussion section. Copilot AI output quality is bounded by the transcript quality it's working from. The correct order of operations: fix the transcript first (via glossary-biased captioning), then let Copilot process the corrected transcript for summaries, chapters, and action items.

My organisation uses GCC / GCC High. Which captioning vendors are eligible?

For GCC, captioning vendors must at minimum hold a current DPA (GDPR Article 28 equivalent under GCC terms) and SOC 2 Type II. Many enterprise captioning vendors can meet GCC requirements. For GCC High, the eligible vendor set is much smaller — the vendor must hold FedRAMP High authorisation and operate within the GCC High data boundary. ITAR-controlled content (common at defense-contractor tenants) must not leave the ITAR-authorised data environment; most commercial captioning vendors are not ITAR-capable. The vendor-selection cycle for a GCC High Teams captioning engagement typically runs 3–6 months.

How do Teams recordings get into Viva Learning?

Viva Learning can surface content from SharePoint — and since Teams recordings land in SharePoint document libraries, a SharePoint site configured as a Viva Learning content source will surface the video files in the Viva Learning feed. Additionally, Viva Learning supports course-creation workflows where L&D teams promote specific recordings to formal courses. The caption track on each recording in SharePoint is what Viva Learning displays — which is why uploading corrected captions to the Stream recording fixes the caption quality in Viva Learning as well.

Further reading