Platform reference · Microsoft Teams · M365 · Live Events · Copilot · Viva Learning · GCC

Microsoft Teams captions: meetings, recordings, Live Events, Copilot — M365 tenant training compliance

Microsoft Teams is the collaboration hub at the centre of every Microsoft 365 tenant — the tool where most employee meeting time is spent, where the recordings that become training content are produced, and where Copilot AI meeting summaries are generated. The existing Microsoft Stream captions reference covers Stream-on-SharePoint as the video repository where Teams meeting recordings land and are distributed. This page covers the Teams meeting and events layer upstream of that repository: the Teams meeting transcript, the Live Events recording and caption pipeline, Teams Phone call recording, Copilot meeting notes, and Viva Learning courseware — the surfaces where captions are produced before the video reaches Stream-on-SharePoint. For most M365 tenants, Teams is where the captioning problem originates and Stream is where the solution lives; this page covers the origination.

TL;DR

A Microsoft Teams captioning workflow spans five surfaces:

(1) Teams meeting live captions: real-time speech-to-text during the meeting, ephemeral by default, speaker-attributed when speaker recognition is enabled.
(2) Teams meeting recording + transcript: the meeting recording deposits in Stream-on-SharePoint; the transcript deposits alongside it as a VTT file, inheriting the same STT accuracy.
(3) Teams Live Events: broadcast-style events with a live CART captioning pathway and an on-demand recording with captions.
(4) Copilot meeting notes and AI recap: Copilot AI summaries, chapters, and action items all derive from the same meeting transcript, inheriting its proper-noun errors.
(5) Viva Learning: courses assembled from Teams recordings and assigned to learners inherit the caption state of the source recordings.

The proper-noun failure mode propagates across all five: what gets mangled in the live caption appears verbatim in the transcript, the Stream recording, the Copilot notes, and the Viva Learning course.

Teams vs Microsoft Stream — the distinction this page draws

Teams and Stream are different layers of the M365 video stack, and the Stream captions reference addresses the repository and distribution layer in depth. The distinction:

Microsoft Teams is the collaboration and meeting application. When a Teams meeting is recorded, Teams captures the video and generates the transcript. After the meeting ends, Teams deposits the recording and the transcript in the meeting organiser's designated SharePoint site or OneDrive, where Microsoft Stream (on SharePoint) picks them up as the playback and distribution system.
Stream is the video repository and player. Once the recording is in Stream, users view it via the Stream player on SharePoint, receive a link in the Teams meeting chat, or access it through a channel tab. The Stream page has a caption-upload workflow for adding or replacing the VTT caption track on the recording.

The captioning fix lives in Stream (upload the corrected VTT to the recording in Stream, or use the Stream admin API). But the captioning problem originates in Teams (the STT engine that produced the mangled transcript). This page covers the problem-origination layer — Teams meeting surfaces, Live Events, Copilot, and Viva Learning — so that operators understand which surfaces generate transcripts and what those transcripts feed into.

M365 tenant types and their Teams captioning posture

Teams captioning behaviour and compliance posture vary by M365 plan:

M365 Business (Basic / Standard / Premium). SMB tenants. Recordings go to OneDrive. Transcript on by default if recording is on. Captioning vendor accesses recordings via standard GDPR Article 28 DPA and SOC 2 posture.
M365 Enterprise (E1 / E3 / E5). Large corporate tenants. Recordings go to SharePoint. Transcript and speaker attribution are available once an admin enables them. Teams Premium adds additional transcript features. Captioning vendor accesses recordings via enterprise DPA and, for sensitive content, an InfoSec questionnaire.
M365 Education (A1 / A3 / A5). K-12 districts and universities. FERPA-regulated. Recordings and transcripts that include student speech are education records. Captioning vendor must be FERPA-appropriate.
M365 Government Community Cloud (GCC). US state/local government and federal-contractor tenants on FedRAMP-Moderate-aligned M365 stack. Teams GCC is separated from commercial M365. Captioning vendors must hold appropriate posture; the eligible vendor set is smaller than commercial M365.
M365 GCC High. US federal civilian and defense-contractor tenants on FedRAMP High-aligned M365 stack. ITAR-capable tenant. External captioning vendors must hold FedRAMP High authorisation; very few do.
M365 DoD. US Department of Defense tenants on DoD IL4/IL5-aligned M365 stack. Only DoD-authorised vendors can process recordings.

The compliance posture tightens substantially from commercial M365 to GCC to GCC High to DoD. A captioning workflow that is straightforward on commercial M365 may require a 3–6-month vendor-selection cycle on GCC High, the same gating constraint that applies to Webex for Government.

Surface 1 — Teams meeting live captions

Microsoft Teams generates real-time captions during active meetings using Azure Cognitive Services speech-to-text. The captions display in the meeting UI for each participant who has live captions enabled. Key characteristics:

Speaker attribution. When speaker recognition is enabled (requires Teams Premium or specific admin policy), live captions are attributed to the named speaker. Speaker recognition requires voice profile enrollment by participants and admin enablement.
Ephemeral by default. Live captions during the meeting are not stored as a caption file on the recording by default. The transcript feature (Surface 2) produces the stored, time-coded text.
Language support. Teams live captions support dozens of languages. Multi-language live translation is available in Teams Premium, displaying live-translated captions in a selected target language.
Substantive accuracy on technical content. Same as all major generic STT surfaces: 80–90% on conversational audio, materially lower on dense technical proper nouns. Azure Cognitive Services STT is a general-purpose model; it does not adapt to institutional vocabulary out of the box.
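
The gap between overall accuracy and proper-noun accuracy can be made measurable. The sketch below is a minimal illustration, assuming a human-verified reference transcript exists for a sample segment. It computes word error rate (word-level edit distance divided by reference length) and a separate glossary-term recall. The sample sentences, the glossary, and the function names are invented for illustration and are not output from Teams.

```python
from typing import List


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between ref[:i-1] and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def term_recall(glossary: List[str], hypothesis: str) -> float:
    """Fraction of glossary terms that appear verbatim in the hypothesis."""
    text = hypothesis.lower()
    hits = sum(1 for term in glossary if term.lower() in text)
    return hits / len(glossary)


# Illustrative data only (hypothetical sentences, not real Teams output).
reference = "deploy the helm chart to kubernetes before the terraform run"
hypothesis = "deploy the helm chart to cooper netties before the terror form run"
glossary = ["Helm", "Kubernetes", "Terraform"]

print(f"WER: {word_error_rate(reference, hypothesis):.2f}")
print(f"Glossary recall: {term_recall(glossary, hypothesis):.2f}")
```

Tracking glossary recall separately matters because a transcript can score well on overall word accuracy while still missing most of the terms a learner needs.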

Live captions are the meeting-time accommodation surface. For learners with documented hearing-related accommodation needs in a Teams meeting (a training session, an all-hands, a virtual classroom), the live-caption accuracy on proper nouns is the real-time failure mode they experience. CART captioner integration provides the defensible accommodation path: a designated captioner participant types the captions in real time.

Surface 2 — Teams meeting recording + transcript

When a Teams meeting is recorded, two assets are deposited in SharePoint or OneDrive after the meeting ends:

The video recording (MP4), accessible via the Stream player.
The meeting transcript (VTT file), attached to the recording in Stream-on-SharePoint as the default caption track.

The transcript is generated from the same Azure Cognitive Services STT engine that powered the live captions. The VTT file is the stored, time-coded output — every cue in the VTT corresponds to a segment of the recording, with a timestamp and the attributed speaker name (if speaker recognition was enabled).

The transcript VTT is the starting point for captioning compliance. Its problems:

Proper-noun mangling throughout. Every technical product name, SDK, drug name, regulatory citation, government agency acronym, or competitor name that was spoken in the meeting appears in its mangled form. The mangled form is time-coded and speaker-attributed — it looks official.
Speaker-attribution gaps. Speakers who did not enroll voice profiles appear as "Speaker 1," "Speaker 2," etc., or the attribution defaults to the last recognized speaker. In high-participation training sessions, speaker attribution is often partially or completely wrong.
Meeting-chat artefacts. Teams transcript sometimes includes chat messages or reaction events as transcript entries; these appear as out-of-context cues in the VTT.
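
To audit speaker-attribution gaps before correcting a transcript, the VTT can be parsed into cues. The following is a minimal sketch, assuming the transcript marks speakers with WebVTT voice tags (`<v Name>...</v>`) and that unenrolled speakers appear as "Speaker N". The sample cues and the speaker name are hypothetical, and a real export may need a fuller WebVTT parser.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# WebVTT timestamps may omit the hours field, so it is optional here.
TIMESTAMP = r"(?:\d{2,}:)?\d{2}:\d{2}\.\d{3}"
CUE_TIMING = re.compile(rf"({TIMESTAMP})\s+-->\s+({TIMESTAMP})")
VOICE_TAG = re.compile(r"<v ([^>]+)>(.*?)(?:</v>|$)", re.DOTALL)
PLACEHOLDER = re.compile(r"^Speaker \d+$")


@dataclass
class Cue:
    start: str
    end: str
    speaker: Optional[str]
    text: str


def parse_vtt(vtt: str) -> List[Cue]:
    """Split a VTT document on blank lines and extract timing, speaker, text."""
    cues = []
    for block in re.split(r"\n\s*\n", vtt.strip()):
        lines = block.splitlines()
        for idx, line in enumerate(lines):
            timing = CUE_TIMING.match(line.strip())
            if not timing:
                continue  # header line or cue identifier
            payload = "\n".join(lines[idx + 1:]).strip()
            voice = VOICE_TAG.match(payload)
            if voice:
                speaker, text = voice.group(1).strip(), voice.group(2).strip()
            else:
                speaker, text = None, payload
            cues.append(Cue(timing.group(1), timing.group(2), speaker, text))
            break
    return cues


def unattributed(cues: List[Cue]) -> List[Cue]:
    """Cues with no voice tag or with a 'Speaker N' placeholder name."""
    return [c for c in cues
            if c.speaker is None or PLACEHOLDER.match(c.speaker)]


# Hypothetical sample transcript for illustration.
SAMPLE = """WEBVTT

00:00:01.000 --> 00:00:04.000
<v Dana Okafor>Welcome to the Kubernetes onboarding.</v>

00:00:04.500 --> 00:00:07.000
<v Speaker 1>Can you share the Helm chart?</v>

00:00:07.200 --> 00:00:09.000
Sure, one moment.
"""

cues = parse_vtt(SAMPLE)
print(f"{len(unattributed(cues))} of {len(cues)} cues lack a named speaker")
```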

Replacing the transcript VTT on a Stream recording with a corrected, glossary-biased VTT is the captioning compliance fix. The Microsoft Stream captions reference covers the upload-replacement workflow on the Stream side. The transcript VTT that GlossCap produces from a Teams recording is a clean, properly timed, glossary-corrected replacement for the auto-generated VTT.
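
As a rough illustration of what a glossary correction pass does to a VTT, the sketch below applies a table of known mis-transcriptions to cue text while leaving timing lines untouched. This is plain post-hoc find-and-replace, a deliberately simpler technique than glossary biasing inside the recognition step, and it is not a description of how GlossCap works. Only the "tie-zer-pa-tide" variant comes from this page; the other glossary entries and the function names are hypothetical.

```python
import re
from typing import Dict

# Mis-transcribed variant -> canonical term. Entries other than the
# tirzepatide example are invented for illustration.
GLOSSARY_FIXES: Dict[str, str] = {
    "tie-zer-pa-tide": "tirzepatide",
    "cooper netties": "Kubernetes",
    "terror form": "Terraform",
}


def build_pattern(fixes: Dict[str, str]) -> "re.Pattern[str]":
    """Compile one case-insensitive alternation, longest variants first."""
    variants = sorted(fixes, key=len, reverse=True)
    alternation = "|".join(re.escape(v) for v in variants)
    # Lookarounds keep matches from starting or ending inside a longer word.
    return re.compile(rf"(?<!\w)({alternation})(?!\w)", re.IGNORECASE)


def correct_vtt(vtt: str, fixes: Dict[str, str]) -> str:
    """Replace known variants in cue text; timing lines pass through as-is."""
    pattern = build_pattern(fixes)
    lookup = {variant.lower(): term for variant, term in fixes.items()}
    corrected = []
    for line in vtt.splitlines():
        if "-->" in line:
            corrected.append(line)  # cue timing line, never edited
        else:
            corrected.append(
                pattern.sub(lambda m: lookup[m.group(1).lower()], line)
            )
    return "\n".join(corrected)


# Hypothetical cue for illustration.
raw = (
    "WEBVTT\n\n"
    "00:00:01.000 --> 00:00:03.000\n"
    "<v Speaker 1>Start Tie-zer-pa-tide at the lower dose.</v>"
)
print(correct_vtt(raw, GLOSSARY_FIXES))
```

A substitution table only catches variants someone has already seen, so a human review pass on the corrected file is still needed before upload.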

Surface 3 — Teams Live Events

Teams Live Events (now migrating to Teams Webinars and Teams Town Hall as the product family) are broadcast-style events with a presenter-to-audience model, used for all-hands meetings, corporate town halls, large training webinars, and executive communications. The captioning surface differs from regular Teams meetings:

Live captions during the event. Teams Live Events support both the auto-caption path (same STT engine) and an external CART captioning path where a designated CART captioner provides real-time captions that are displayed to attendees. The external CART path is the defensible accommodation pathway for regulated events.
Recording and on-demand access. Teams Live Event recordings are available on-demand after the event, typically via a SharePoint or Teams channel link. The recording carries the auto-generated captions from the live event.
Caption file download and replacement. The Live Event organiser can download the auto-generated VTT from the recording and upload a corrected version. The corrected VTT replaces the auto-generated track on the on-demand recording.
Continuing-education credit obligations. Live Events used for CME, CLE, CPE, or other accredited continuing education must provide accessible content; many accreditation bodies require substantively accurate captions, not just any caption track.

Teams Town Hall (the successor product) follows a similar pattern: live CART integration for the live event, auto-transcript on the recording, and a replacement workflow for the recording caption track. For all-hands or town-hall recordings distributed as training content (a common pattern at corporate L&D teams), the same caption compliance obligations apply to the recording as to any other training video.

Surface 4 — Copilot meeting notes and AI recap

Microsoft Teams Copilot (requires M365 Copilot licence or Teams Premium) generates AI-powered meeting summaries, action items, chapter markers, speaker contribution summaries, and follow-up recommendations. Every one of these outputs derives from the meeting transcript. Every proper-noun error in the transcript propagates into the Copilot output:

A drug name mangled in the transcript becomes a mangled drug name in the Copilot meeting summary.
An engineering SDK name mangled in the transcript becomes a mangled SDK name in the action items.
A regulatory citation mangled in the transcript becomes a mangled citation in the meeting recap shared to the Teams channel.

Copilot output is increasingly used as the working record of a meeting — the summary that gets shared to Teams channels, Outlook inboxes, and SharePoint pages as the authoritative post-meeting document. A Copilot summary built on a mangled transcript propagates the mangling into the institutional record. For healthcare training sessions where the Copilot summary might reference drug protocols or procedure names, this is a substantive accuracy failure in the institutional record.

The fix is at the transcript layer: a corrected, glossary-biased transcript fed back to the recording before Copilot re-processes it produces clean Copilot output. This is one of the strongest multiplier arguments for fixing Teams meeting transcripts — the benefit extends beyond the caption track on the video to every AI-powered downstream artifact Copilot generates.

Surface 5 — Viva Learning

Microsoft Viva Learning is the M365 learning-and-development hub that surfaces training content — from SharePoint, Stream, and external LMS integrations — directly in the Teams interface. Teams recordings promoted to Viva Learning courses inherit the caption state of their Stream-on-SharePoint source recording:

A recording with no corrected caption track displays in Viva Learning with the auto-generated VTT, or with no captions if the auto-VTT was removed.
A recording with a corrected caption track uploaded to Stream displays in Viva Learning with the corrected track.

Viva Learning also integrates with external LMS platforms — Cornerstone OnDemand, SAP SuccessFactors, TalentLMS, Degreed, and others — surfacing external-LMS courses inside Teams. The caption state of those courses depends on the originating LMS, not on Stream. For a Viva Learning tenant that surfaces courses from both Stream-hosted Teams recordings and external LMS sources, the captioning compliance inventory must cover both source systems.

For ADA Title II-bound public-university tenants running Viva Learning for employee training, the training-video captions accessible through Viva Learning carry the same substantive-accuracy obligation as training videos in Canvas, Brightspace, or any other distribution system.

Compliance regimes — Microsoft Teams across M365 tenant types

ADA Title II. Public-university and state/local-government M365 tenants. Teams meeting recordings used as training or instructional content must have substantively accurate captions. OCR investigates training-video captions in LMS and video-host systems; Stream-on-SharePoint is in scope as a distribution system when recordings are shared with learners.
Section 508. Federal-agency M365 tenants (GCC / GCC High / DoD) and federal contractors. 36 CFR Part 1194 / WCAG 2.0 AA captioning on federal-program-related video.
Section 504. Federal-fund-recipient institutions — universities, hospitals, school districts, non-profits. Programmatic accessibility on instructional, program-relevant, and training video. Individual accommodation needs (IEP or 504 plan) trigger immediate captioning obligations for the student's or employee's access to Teams-hosted training content.
HIPAA. Healthcare M365 tenants. The HIPAA training captions reference covers the workforce-training mandate. Healthcare content in Teams recordings (patient-case references in training meetings, clinical-protocol discussions recorded for reference) may be PHI-adjacent; a captioning-vendor BAA is appropriate.
EAA. EU M365 tenants. Training video distributed to employees or customers via Teams and Stream is subject to EN 301 549 clause 7.1.1–7.1.5 captioning requirements when the service falls within EAA product/service scope.
AODA. Ontario tenants. IASR § 14 captioning obligation on training and instructional video distributed to employees. Three-year compliance reporting cycle; next major large-organisation filing window 2026.

Proper-noun failure modes in Teams meeting recording content

Teams meeting recordings span the full range of institutional content types. The failure modes by tenant segment:

SaaS / engineering tenants. SDK and framework names (React, Next.js, TypeScript, Kubernetes, Helm, Terraform, Pulumi, Argo, Flux, gRPC, Protobuf, GraphQL, DataDog, PagerDuty, Sentry, Snowflake, dbt, Fivetran), product and service names, competitor names. Azure DevOps, Azure Kubernetes Service, and Azure Cognitive Services all generate platform-specific proper nouns that the general-purpose STT fails on (Cisco Webex running in Teams is a small irony here).
Healthcare tenants. Same drug formulary and procedure vocabulary as detailed in medical training captions. Additionally, healthcare-tenant Teams recordings frequently include regulatory discussions where OCR, CMS, OPM, and § citations matter verbatim.
Financial-services tenants. FINRA / SEC / OCC / FDIC / OFAC / CFPB regulatory citations, product names, account and fund names, risk-framework terminology (Basel III, DORA, MiFID II, EMIR, LIBOR/SOFR transition vocabulary).
Education tenants. For K-12, see the Schoology captions reference for K-12 curriculum vocabulary. For higher-ed, discipline-specific vocabulary including the categories catalogued in Canvas LMS captions.
Government tenants (GCC / GCC High). Same federal-program acronym and regulatory citation register as detailed in Webex captions. For GCC High tenants, the ITAR-controlled content exclusion from external captioning applies here as well — such content must not be sent to commercial captioning vendors.

The Teams recording retrofit pattern

For an M365 tenant sitting on a Stream-on-SharePoint library of Teams meeting recordings — whether from all-hands sessions, product trainings, engineering talks, or clinical-education meetings — the retrofit runs in five phases:

Inventory. Use the Microsoft Graph API to enumerate video files in SharePoint document libraries and OneDrive folders. Teams meeting recordings are identifiable by MIME type (video/mp4) and metadata (recording source). Most tenants discover that 15–35% of their Teams recordings have been promoted to training context — linked from a Viva Learning course, embedded in a SharePoint training page, or assigned via a channel tab in a learning-focused Team.
Triage. Rank by instructional exposure: recordings embedded in active Viva Learning courses first, recordings assigned in Teams channel tabs second, recordings linked from employee-facing SharePoint training portals third. Recordings from meetings that no one has accessed in six months can be archived. The triage cut typically removes 40–60% of the raw catalogue from retrofit scope.
Caption production. For each triage-selected recording, produce a glossary-biased WebVTT. The institutional glossary is built once — SDK names, drug formulary, regulatory citations, product names — and applies to every recording in the catalogue. The M365 EU Data Boundary / sensitivity-label posture (detailed in Stream captions) must be respected in how recordings are shared with the captioning service.
Upload. Upload the corrected VTT to the recording in Stream-on-SharePoint via the Stream caption-upload UI or the Microsoft Graph API. Graph file uploads go through an upload session: the session is created with POST /drives/{drive-id}/items/{item-id}/createUploadSession, and the file bytes are then sent with PUT to the upload URL that the session returns. The corrected VTT replaces the auto-generated transcript VTT as the displayed caption track.
Log. Asset register: SharePoint file ID, video title, Teams meeting ID, transcript VTT version, caption source, upload date, downstream Viva Learning course IDs, reviewer name and date. The register is the compliance-audit artefact.
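
The inventory, triage, and log phases above can be sketched offline against already-fetched data. This is a minimal sketch under stated assumptions: `items` stands for driveItem dictionaries as returned by Microsoft Graph (which expose a `file` facet with a `mimeType`), while the `usage` mapping, the exposure labels, the 183-day staleness cutoff, and the register column names are hypothetical stand-ins for whatever the tenant's own reporting provides. No network calls are made.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields
from typing import Dict, List, Optional, Tuple

# Lower rank = higher retrofit priority (labels are hypothetical).
EXPOSURE_RANK = {"viva_course": 0, "channel_tab": 1, "training_portal": 2}


def is_recording(item: dict) -> bool:
    """Inventory filter: keep driveItems whose file facet is video/mp4."""
    return item.get("file", {}).get("mimeType") == "video/mp4"


def triage(items: List[dict],
           usage: Dict[str, Tuple[Optional[str], int]],
           stale_days: int = 183) -> List[dict]:
    """Drop stale recordings, then order the rest by instructional exposure.

    usage maps item id -> (exposure label or None, days since last access).
    Items missing from usage are treated as stale.
    """
    ranked = []
    for item in items:
        if not is_recording(item):
            continue
        exposure, idle_days = usage.get(item["id"], (None, stale_days + 1))
        if idle_days > stale_days:
            continue  # archive candidate, out of retrofit scope
        rank = EXPOSURE_RANK.get(exposure, len(EXPOSURE_RANK))
        ranked.append((rank, item["name"], item))
    ranked.sort(key=lambda entry: entry[:2])  # by rank, then by name
    return [entry[2] for entry in ranked]


@dataclass
class RegisterRow:
    sharepoint_file_id: str
    video_title: str
    teams_meeting_id: str
    transcript_vtt_version: str
    caption_source: str
    upload_date: str
    viva_course_ids: str
    reviewer: str
    review_date: str


def register_csv(rows: List[RegisterRow]) -> str:
    """Serialise the asset register as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=[f.name for f in fields(RegisterRow)]
    )
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buffer.getvalue()


# Hypothetical sample data for illustration.
items = [
    {"id": "a1", "name": "All-hands Q3.mp4", "file": {"mimeType": "video/mp4"}},
    {"id": "b2", "name": "SDK onboarding.mp4", "file": {"mimeType": "video/mp4"}},
    {"id": "c3", "name": "Agenda.docx", "file": {"mimeType": "application/msword"}},
    {"id": "d4", "name": "Old standup.mp4", "file": {"mimeType": "video/mp4"}},
]
usage = {"a1": ("channel_tab", 12), "b2": ("viva_course", 40), "d4": (None, 400)}

for recording in triage(items, usage):
    print(recording["name"])
```

Keeping the register as a flat CSV is one simple choice; any store works as long as each row ties a SharePoint file ID to a caption version and a named reviewer.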

FAQ — Microsoft Teams captions

How is this page different from the Microsoft Stream captions page?

The Stream captions reference covers Stream-on-SharePoint as the video repository — where Teams recordings land, how the Stream player exposes captions, how to upload a replacement caption track in Stream, and the M365 tenant-policy considerations (EU Data Boundary, sensitivity labels, DLP, external sharing) that govern how recordings flow through the tenant. This page covers the Teams meeting layer upstream: the live-caption surface during the meeting, the meeting transcript VTT that Teams produces, the Live Events captioning pathway, Copilot AI notes (which derive from the transcript), and Viva Learning (which distributes the recordings as training content). Together, the two pages cover the full M365 video captioning stack from meeting recording to learning distribution.

Does the auto-generated Teams meeting transcript clear WCAG SC 1.2.2 under ADA Title II / Section 508?

No, not for technical content. The auto-generated VTT from a Teams meeting transcript is built from Azure Cognitive Services general-purpose STT, which produces the same 80–90% accuracy band on conversational audio and materially lower accuracy on technical proper nouns as all other generic STT systems. SC 1.2.2's "accurately convey the audio" standard means that product names, drug names, regulatory citations, and SDK terms cannot be systematically mangled without the caption failing the standard. The auto-generated transcript is a draft, not a compliance artefact.

What is Teams Premium and does it improve captioning?

Teams Premium adds speaker recognition, intelligent meeting recap, advanced Copilot features, and live translation in addition to the standard Teams meeting features. Speaker recognition improves the speaker-attribution accuracy of the transcript (instead of "Speaker 1", you get the actual attendee name on each cue) — which is valuable for multi-speaker training recordings where knowing who said what matters. But Teams Premium does not improve the substantive accuracy of the STT engine on technical proper nouns. A Teams Premium transcript has better speaker attribution and is better organised for Copilot recap, but it still mangles the same SDK names, drug names, and regulatory citations that standard Teams transcripts mangle.

Does Copilot AI recap fix the transcript?

No — Copilot summarises the transcript, it does not correct it. If the transcript says "tie-zer-pa-tide" instead of "tirzepatide," Copilot's meeting summary will include "tie-zer-pa-tide" in the drug-related discussion section. Copilot AI output quality is bounded by the transcript quality it's working from. The correct order of operations: fix the transcript first (via glossary-biased captioning), then let Copilot process the corrected transcript for summaries, chapters, and action items.

My organisation uses GCC / GCC High. Which captioning vendors are eligible?

For GCC, captioning vendors must at minimum hold a current DPA (GDPR Article 28 equivalent under GCC terms) and SOC 2 Type II. Many enterprise captioning vendors can meet GCC requirements. For GCC High, the eligible vendor set is much smaller — the vendor must hold FedRAMP High authorisation and operate within the GCC High data boundary. ITAR-controlled content (common at defense-contractor tenants) must not leave the ITAR-authorised data environment; most commercial captioning vendors are not ITAR-capable. The vendor-selection cycle for a GCC High Teams captioning engagement typically runs 3–6 months.

How do Teams recordings get into Viva Learning?

Viva Learning can surface content from SharePoint — and since Teams recordings land in SharePoint document libraries, a SharePoint site configured as a Viva Learning content source will surface the video files in the Viva Learning feed. Additionally, Viva Learning supports course-creation workflows where L&D teams promote specific recordings to formal courses. The caption track on each recording in SharePoint is what Viva Learning displays — which is why uploading corrected captions to the Stream recording fixes the caption quality in Viva Learning as well.