Microsoft Stream captions: Stream-on-SharePoint, Teams meeting recordings, M365 tenant captioning

Microsoft Stream is the native video surface in Microsoft 365: the place where Teams meeting recordings land, where OneDrive videos live, where SharePoint video pages source their content, and where the Stream-classic legacy footprint still hosts a long tail of pre-2021 organisational video. Microsoft re-architected Stream as Stream-on-SharePoint in 2021-2022 and retired Stream-classic with extended migration windows. Modern M365 tenants now host video as files in OneDrive (personal video) and SharePoint (organisational video), with the Stream-on-SharePoint player surfacing the content. Where Vimeo and Wistia are the SMB-focused and B2B-SaaS-focused video hosts respectively, Stream is the surface every organisation running Microsoft 365 already has, which makes it the default for internal training video in the 50-to-50,000-employee tenants where M365 is the corporate productivity stack. The captioning surface is conventional (a WebVTT sidecar caption track per video, transcript edit-and-replace inside the Stream player, integration with the Teams meeting auto-transcript), but it carries M365-specific data-residency, retention-policy, and tenant-policy concerns. Glossary-biased upstream captioning is what produces caption tracks clean enough to withstand the audits that come with mandatory training deployed on M365.

TL;DR

Microsoft Stream-on-SharePoint stores video as MP4 files in SharePoint document libraries (organisational video) and OneDrive (personal video). The captioning model has three parts: (1) a WebVTT sidecar caption track per video file, attached via the video file's caption-and-transcript-management surface; (2) an auto-generated transcript from speech-to-text run by the M365 tenant on upload (subject to admin policy and licensing tier); (3) editing the transcript and re-attaching it as the canonical caption track. Teams meeting recordings land automatically in OneDrive (non-channel meetings, including 1:1 and small meetings) or SharePoint (channel meetings) and inherit the same caption-and-transcript model. The auto-transcript has the same generic-ASR limitation as every other auto-transcript: it mangles SDK names, drug names, regulatory citations, internal acronyms, and customer, supplier, and internal-product names. Replacing the auto-transcript with a clean glossary-biased WebVTT track is the path to caption tracks that hold up at audit. The M365-specific concerns are data residency (M365 Multi-Geo, EU Data Boundary), retention policy (tenant Information Protection labelling), tenant policy (Stream-allowed sites, sharing controls, scenarios where external sharing is disabled), and the legacy of the Stream-classic-to-Stream-on-SharePoint migration. Loom and Stream are the two async-video defaults at 50-500-employee SaaS companies: Loom by adoption, Stream by tenant default.
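
To make the sidecar format concrete, here is a minimal sketch of serialising timed caption cues into a WebVTT document. The `Cue` type and the function names are illustrative assumptions, not part of Stream or of any captioning product; only the `WEBVTT` header and the `hh:mm:ss.mmm --> hh:mm:ss.mmm` cue-timing line come from the WebVTT format itself.

```python
from dataclasses import dataclass


@dataclass
class Cue:
    """One caption cue (hypothetical type for this sketch)."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def format_timestamp(seconds: float) -> str:
    """Render seconds as a WebVTT timestamp (hh:mm:ss.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues: list[Cue]) -> str:
    """Serialise cues into a WebVTT document, one blank line between cues."""
    lines = ["WEBVTT", ""]
    for cue in cues:
        if cue.end <= cue.start:
            raise ValueError(f"cue must end after it starts: {cue!r}")
        lines.append(f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}")
        lines.append(cue.text)
        lines.append("")
    return "\n".join(lines)
```

The resulting text is what would be saved as a `.vtt` file and attached to the video as its caption track. The sketch does not escape cue text or handle multi-line cues.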

What Microsoft Stream is, and where in the workflow captioning lands

Microsoft Stream's current architecture (Stream-on-SharePoint) treats video as a first-class file type stored in OneDrive and SharePoint, with the Stream player surfacing playback, transcription, captions, chapters, and forms-and-comments behaviour. The captioning-relevant characteristics:

Captioning lands at four points: (1) MP4 files in OneDrive / SharePoint, via the video file's caption-and-transcript surface; (2) Teams meeting recordings, which are MP4s in OneDrive / SharePoint once the meeting ends; (3) embedded video on SharePoint pages, which inherits the underlying file's caption track; (4) Stream-classic legacy content, for tenants with an un-migrated catalogue.

The Stream caption-upload mechanic

The vocabulary surface in M365-tenant video

M365-tenant video carries the customer's full operational vocabulary because it's the inside-the-tenant communication surface. Common patterns:

Privacy, data residency, and tenant-policy concerns

M365-tenant video sits inside the customer's tenant boundary; the captioning workflow must respect that boundary. Common concerns:

The Stream-specific failure modes

The five caption-related findings most likely to surface during an OFCCP audit, an EAA inspection, an OCR HIPAA workforce-training file review, or an internal accessibility-self-audit on an M365-Stream-hosted catalogue:

  1. Auto-transcript left as the caption track on regulatory-citation-dense content. The auto-transcript mangles regulatory citations, drug names, SDK terms, and internal acronyms. Auditors who check the caption track against the audio and the on-screen content will catch the mangling. Fix: replace the auto-transcript with a clean glossary-biased WebVTT track.
  2. Teams meeting recording captions inherited from the auto-transcript. Teams meeting recordings auto-save with the auto-generated transcript as the caption track. Promoting a recording to a training catalogue without correction means the auto-transcript ships to learners as the caption track. Fix: a catalogue audit step that flags recordings promoted from Teams without caption-track replacement.
  3. Stream-classic legacy content un-migrated or broken in migration. Stream-classic content with caption tracks loses its caption-track linkage on migration to Stream-on-SharePoint unless the tracks are explicitly re-attached. Some tenants still carry an un-migrated Stream-classic catalogue; others have migrated content whose caption tracks broke in the move. Fix: a migration audit, then re-upload the caption tracks per video.
  4. Multi-language caption tracks missing on multinational deployments. A multinational M365 tenant's training catalogue often retains a single English caption track on video that is surfaced to non-English regions; multi-language caption-track support exists but is rarely populated. WCAG SC 1.2.2 does not require multiple languages, but EU member-state regulators sometimes do. Fix: per-language caption-track delivery as part of the regional deployment.
  5. SharePoint video page caption-track misalignment. SharePoint pages embed video by reference to the source MP4. Updating the caption track on the source MP4 propagates to the page; replacing the source MP4 sometimes orphans the caption track without notice. Fix: a per-page verification step in the catalogue audit; document the source-MP4-to-page binding in the captioning-provenance log.
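
A catalogue audit for the second and fourth findings can be sketched as a simple rule check over an inventory of videos. This is an illustration under stated assumptions: the `CatalogueVideo` record, its field names, and the string values for `origin` and `caption_source` are hypothetical and would come from whatever inventory export the tenant actually maintains.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogueVideo:
    """Hypothetical inventory row for one training video."""
    title: str
    origin: str            # assumed values: "teams_recording" or "upload"
    caption_source: str    # assumed values: "auto_transcript" or "glossary_webvtt"
    caption_languages: set[str] = field(default_factory=set)
    audience_languages: set[str] = field(default_factory=set)


def audit(video: CatalogueVideo) -> list[str]:
    """Return the caption-related findings for one video (empty if clean)."""
    findings = []
    if video.caption_source == "auto_transcript":
        if video.origin == "teams_recording":
            findings.append("Teams recording promoted with auto-transcript captions")
        else:
            findings.append("auto-transcript left as caption track")
    # Languages the audience needs but no caption track covers.
    missing = sorted(video.audience_languages - video.caption_languages)
    if missing:
        findings.append("missing caption languages: " + ", ".join(missing))
    return findings
```

Running `audit` over every row and keeping only the non-empty results gives a remediation list; the migration and page-embed findings need checks against the tenant itself and are not covered here.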

The glossary-biased workflow for M365-tenant video

  1. Pull the customer's controlled vocabulary. Internal product / programme / SKU registers, SDK / engineering vocabulary, customer / partner / supplier registers (carefully — privacy-bound), regulatory-citation registers, healthcare / financial-services vocabulary as applicable. The customer's controlled vocabulary is the highest-leverage glossary input.
  2. Operate inside the tenant boundary. For Highly-Confidential or controlled-information video, the captioning workflow runs inside the customer's tenant, via Microsoft Graph API integration or vendor-supplied tooling that respects sensitivity labels and DLP policy. For lower-sensitivity video, the workflow can run with vendor-external processing under a DPA-controlled data flow.
  3. Replace the auto-transcript with glossary-biased WebVTT. Process the audio through the glossary-biased captioning pipeline; deliver clean WebVTT; upload to the video file's caption-and-transcript surface, replacing the auto-transcript-derived caption track.
  4. Multi-language pass. For multinational deployments, a per-language WebVTT track per video. Per-language glossaries handle regional vocabulary differences.
  5. SME / clinical / engineering reviewer pass. Domain-expert review of every glossary-applied term in context. The amber-highlight UI shows source-line provenance.
  6. Per-video verification. Open the video in the Stream player; verify the caption track renders, the language tag is correct, and the closed-caption toggle is exposed. For SharePoint-page-embedded video, verify on the page as well.
  7. Document captioning provenance per video. Record eight fields per video: caption source, glossary version, reviewer, review date, glossary term count, video file location (OneDrive vs SharePoint vs SharePoint-embed), data-residency confirmation, and sensitivity-label compatibility. The record lives in the SharePoint document library's column metadata for SharePoint-hosted video, in OneDrive's metadata for OneDrive-hosted video, or in a separate captioning-provenance list in SharePoint.
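
The eight-field provenance record from step 7 can be sketched as a typed structure with a small completeness check. The field names and validation rules below are assumptions chosen for illustration; the source only fixes which eight pieces of information are recorded, not how they are named or stored.

```python
from dataclasses import dataclass
from datetime import date

# The three locations named in step 7.
ALLOWED_LOCATIONS = {"OneDrive", "SharePoint", "SharePoint-embed"}


@dataclass(frozen=True)
class CaptionProvenance:
    """One provenance record per video (hypothetical field names)."""
    caption_source: str
    glossary_version: str
    reviewer: str
    review_date: date
    glossary_term_count: int
    video_location: str
    data_residency_confirmed: bool
    sensitivity_label_compatible: bool


def validate(record: CaptionProvenance) -> list[str]:
    """Return a list of problems; an empty list means the record is complete."""
    problems = []
    if record.video_location not in ALLOWED_LOCATIONS:
        problems.append(f"unknown video location: {record.video_location}")
    if record.glossary_term_count < 0:
        problems.append("glossary term count cannot be negative")
    if not record.reviewer.strip():
        problems.append("reviewer is missing")
    if not record.data_residency_confirmed:
        problems.append("data residency not confirmed")
    if not record.sensitivity_label_compatible:
        problems.append("sensitivity-label compatibility not confirmed")
    return problems
```

A record like this maps naturally onto one column per field in a SharePoint list or document library, which keeps the audit evidence next to the video it describes.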

Stream-specific captioning RFP questions

Procurement teams running a captioning RFP for an M365-Stream-hosted training catalogue will want to ask several Stream-specific questions. From our captioning RFP template:

How M365 Stream captions intersect Section 508, ADA Title II, EAA, and OCR HIPAA

M365-Stream-hosted training catalogues face several accessibility regimes:

The technical caption requirement at WCAG SC 1.2.2 is consistent across regimes, and M365 Stream's caption-track support is feature-complete. The failure mode is operational (auto-transcript left in place, multi-language tracks missing, Stream-classic legacy content un-migrated), not a gap in platform capability. The per-video captioning-provenance log is the shape the audit evidence takes; data-residency confirmation is the M365-specific addition.

Related questions

Stream-on-SharePoint vs Stream-classic — which one is the catalogue on today?

Following Microsoft's retirement of Stream-classic, new video in modern M365 tenants lands on Stream-on-SharePoint. Some tenants retain Stream-classic legacy URLs for un-migrated content; these need explicit migration. The Stream admin centre (or the equivalent Microsoft 365 admin centre view) shows the migration status. The captioning concerns differ slightly between the two: Stream-classic had its own caption-upload mechanism with separate admin policies, whereas Stream-on-SharePoint inherits the SharePoint document library's metadata and tenant policies.

Does the Microsoft 365 auto-transcript respect industry-specific vocabulary?

Microsoft's speech-to-text has improved meaningfully, and some surfaces support custom vocabulary (Azure Speech Studio, for tenants that build their own integration), but the in-Stream auto-transcript runs against the generic model and shares the generic-ASR limitation of every other auto-transcript. It consistently mangles regulatory citations, internal-product vocabulary, drug names, and SDK terms. The path that holds up at audit is upstream glossary-biased captioning, with the clean WebVTT replacing the auto-transcript-derived caption track.
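
One way to see where an auto-transcript has mangled known vocabulary is to flag words that are close to, but not equal to, a glossary term. The sketch below uses fuzzy string matching from Python's standard `difflib`; it is a review aid swapped in for illustration, not the glossary-biased captioning pipeline itself, and it only handles single-word terms. The function name and the `cutoff` default are assumptions.

```python
import difflib
import re


def flag_near_misses(transcript: str, glossary: list[str], cutoff: float = 0.8):
    """Return (transcript_word, glossary_term) pairs that look like mangled terms."""
    by_lower = {term.lower(): term for term in glossary}
    candidates = list(by_lower)
    flagged = []
    for word in re.findall(r"[A-Za-z0-9][A-Za-z0-9\-]*", transcript):
        lower = word.lower()
        if lower in by_lower:
            continue  # exact glossary hit, nothing to flag
        # Closest glossary term at or above the similarity cutoff, if any.
        match = difflib.get_close_matches(lower, candidates, n=1, cutoff=cutoff)
        if match:
            flagged.append((word, by_lower[match[0]]))
    return flagged
```

For example, with a glossary containing `metformin`, a transcript that reads "metforman" is flagged for a reviewer, while a correctly transcribed "metformin" is not. A lower cutoff catches more mangled terms at the cost of more false positives.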

Can the captioning workflow be triggered automatically from Teams meeting recordings?

The Microsoft Graph API supports change-notification subscriptions on OneDrive and SharePoint drives. A captioning workflow can subscribe to these notifications, check what changed when one fires, and process new MP4 recordings as they arrive. The integration is tenant-specific: an admin must consent to the Graph API permissions, and the data flow has to respect tenant policy (sensitivity labels, DLP, EU Data Boundary). For high-volume Teams recording catalogues, this automation is the operational answer.

What about Microsoft Stream Live Events?

Stream Live Events (and the Teams Live Events successor) provide live captioning during the event. Live captioning is out of scope for GlossCap v1; GlossCap focuses on prerecorded captioning. The recording of a Live Event becomes a prerecorded MP4 in OneDrive / SharePoint after the event ends, at which point the standard prerecorded-captioning workflow applies. The live-caption transcript can serve as a starting point, with the same correction discipline as Teams meeting auto-transcripts.

How does data-residency work for the captioning pipeline?

M365 tenants covered by the EU Data Boundary operate under a commitment to keep customer data in the EU. Captioning vendors that process audio in US-only or globally distributed infrastructure break that commitment. For EU Data Boundary compatibility, the captioning pipeline runs in EU-resident infrastructure (or inside the customer's tenant via Graph API integration), and the vendor's data-flow documentation needs to demonstrate region-bound processing. For Multi-Geo deployments (M365 tenants with multiple data regions), the captioning pipeline routes each video to its originating region.
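
The per-video routing rule can be sketched as a lookup that fails closed: if there is no processing region for a video's originating geography, or the region would violate an EU-only constraint, the video is not processed. The geography codes, region names, and function name below are illustrative assumptions, not a real vendor configuration.

```python
# Hypothetical mapping from a video's originating M365 geography
# to the captioning pipeline's processing region.
PROCESSING_REGIONS = {
    "EUR": "eu-west",
    "NAM": "us-east",
}


class ResidencyError(Exception):
    """Raised when a video cannot be processed without leaving its region."""


def route_video(video_geo: str, eu_data_boundary: bool) -> str:
    """Pick the processing region for one video, refusing rather than guessing."""
    region = PROCESSING_REGIONS.get(video_geo)
    if region is None:
        raise ResidencyError(f"no processing region configured for geo {video_geo!r}")
    if eu_data_boundary and not region.startswith("eu-"):
        raise ResidencyError(f"{region!r} is outside the EU for an EU Data Boundary tenant")
    return region
```

Failing closed matters here: a silent fallback to a default region is exactly the data flow the residency commitment rules out.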

What about retention-policy interaction with the captioning workflow?

Tenants with short retention on Teams meeting recordings (30 days, 90 days) need the captioning workflow to fit inside the retention window. Processing on arrival is the usual answer: caption tracks are added to the recording shortly after it lands, before any retention deletion fires. For long-retention content (training catalogues with no retention deletion), the workflow can run on demand.
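
The retention arithmetic is simple enough to sketch: a recording's captioning deadline is its creation time plus the retention period, less a safety margin, and a job fits if its expected turnaround ends before that deadline. The function names and the two-day default margin are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone


def captioning_deadline(recorded_at: datetime, retention_days: int,
                        safety_margin: timedelta = timedelta(days=2)) -> datetime:
    """Latest time captions should be attached before retention deletion."""
    return recorded_at + timedelta(days=retention_days) - safety_margin


def fits_window(recorded_at: datetime, retention_days: int, now: datetime,
                turnaround: timedelta,
                safety_margin: timedelta = timedelta(days=2)) -> bool:
    """True if a job started now would finish before the deadline."""
    deadline = captioning_deadline(recorded_at, retention_days, safety_margin)
    return now + turnaround <= deadline
```

With a 30-day retention policy and the default margin, a recording made on 1 January has a deadline of 29 January, so a one-day job started on 27 January fits and the same job started on 29 January does not.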

Further reading