Microsoft Stream captions: Stream-on-SharePoint, Teams meeting recordings, M365 tenant captioning

Microsoft Stream is the Microsoft 365 native video surface — the place where Teams meeting recordings land, where OneDrive videos live, where SharePoint video pages source their content, and where the Stream-classic legacy footprint still hosts a long tail of pre-2021 organisational video. Microsoft re-architected Stream as Stream-on-SharePoint in 2021-2022, retiring Stream-classic with extended migration windows; modern M365 tenants now host video as files in OneDrive (personal video) and SharePoint (organisational video), with the Stream-on-SharePoint player surfacing the content. Where Vimeo and Wistia are the SMB and B2B-SaaS-focused video hosts respectively, Stream is the surface every Microsoft-365-running organisation already has — which makes it the dominant default for internal training video at the 50-to-50,000-employee enterprise tenants where M365 is the corporate productivity stack. The captioning surface is conventional (WebVTT sidecar caption track per video, transcript edit-and-replace inside the Stream player, integration with Teams meeting auto-transcript) but with M365-tenant-specific data-residency, retention-policy, and tenant-policy concerns. Glossary-biased upstream captioning is what produces caption tracks clean enough to satisfy the audit lens that comes with M365-deployed mandatory training.

TL;DR

Microsoft Stream-on-SharePoint stores video as MP4 files in SharePoint document libraries (organisational video) and OneDrive (personal video). The captioning model is: (1) WebVTT sidecar caption track per video file, attached via the video file's caption-and-transcript-management surface; (2) auto-generated transcript from speech-to-text run by the M365 tenant on upload (subject to admin policy and licensing tier); (3) edit-the-transcript and re-attach as the canonical caption track. Teams meeting recordings auto-land in OneDrive (non-channel meetings) or SharePoint (channel meetings), inheriting the same caption-and-transcript model. The auto-transcript has the same generic-ASR limitation as every other auto-transcript: it mangles SDK names, drug names, regulatory citations, internal acronyms, customer names, supplier names, internal-product names. Replacing the auto-transcript with a clean glossary-biased WebVTT track is the path to caption tracks that hold up at audit. The M365-tenant-specific concerns are data residency (M365 multi-geo / EU Data Boundary), retention and labelling policy (retention windows, Information Protection sensitivity labels), tenant-policy (Stream-allowed sites, sharing-controls, external-sharing-disabled scenarios), and the Stream-classic-to-Stream-on-SharePoint migration legacy. Loom and Stream are the two async-video defaults at 50-500-employee SaaS: Loom by adoption, Stream by tenant-default.

What Microsoft Stream is, and where in the workflow captioning lands

Microsoft Stream's current architecture (Stream-on-SharePoint) treats video as a first-class file type stored in OneDrive and SharePoint, with the Stream player surfacing playback, transcription, captions, chapters, and forms-and-comments behaviour. The captioning-relevant characteristics:

Stream-on-SharePoint vs Stream-classic. Stream-classic was a separate Azure-hosted video service with its own URL space and admin policies. Microsoft retired Stream-classic in stages, and new video in modern tenants lives entirely on Stream-on-SharePoint; only un-migrated legacy content still sits behind Stream-classic URLs. The captioning surface differs slightly between the two; this page covers Stream-on-SharePoint primarily, with Stream-classic notes where relevant.
Video-as-file storage model. Video is an MP4 file in OneDrive (personal) or SharePoint (organisational). Caption tracks are sidecar WebVTT files associated with the video file's metadata.
Stream player web component. Surfaces the video plus the caption track plus the auto-transcript plus chapters plus comments-and-forms. Caption-toggle is exposed to the viewer.
Auto-transcript via Microsoft Speech-to-Text. Tenant policy controls whether transcript generation runs on upload. The auto-transcript can be promoted to the caption track or kept separate. Same generic-ASR limitation as everywhere else.
Teams meeting recording integration. Teams meeting recordings auto-save by meeting type: channel meetings to the channel's SharePoint site, scheduled and ad-hoc non-channel meetings to OneDrive. Either way, the recording inherits the Stream-on-SharePoint caption-and-transcript model.
SharePoint video pages. Organisational training pages embed the Stream player; the caption-track behaviour is the same as the underlying video file.
Tenant-policy controls. M365 admins control Stream-allowed sites, external-sharing of video, transcript-generation policy, retention-policy on video files, sensitivity labels, and DLP policies. Captioning workflows have to coexist with tenant policy.
Stream-classic legacy. Some tenants retain Stream-classic URLs for un-migrated content. Caption tracks on Stream-classic videos use the original Stream-classic caption-upload mechanism and don't survive the migration to Stream-on-SharePoint without explicit re-attachment.

Captioning lands at four points: (1) MP4 files in OneDrive / SharePoint via the video file's caption-and-transcript surface; (2) Teams meeting recordings (which are MP4s in OneDrive / SharePoint after the meeting); (3) embedded video on SharePoint pages (which inherits the underlying file's caption track); (4) Stream-classic legacy content for tenants with un-migrated catalogue.
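
As a rough illustration of how a catalogue inventory might sort videos into these storage locations, the sketch below classifies a video URL by its host and path. The URL shapes are assumptions based on common commercial-cloud patterns (tenant-my.sharepoint.com/personal/... for OneDrive, tenant.sharepoint.com for SharePoint, microsoftstream.com for Stream-classic); sovereign clouds and vanity domains will differ, and classify_video_location is a hypothetical helper, not a Microsoft API.

```python
from urllib.parse import urlparse


def classify_video_location(url: str) -> str:
    """Return 'onedrive', 'sharepoint', 'stream-classic' or 'unknown'.

    Assumption: commercial-cloud URL shapes only. Adjust the host
    suffixes for sovereign clouds or custom domains.
    """
    parsed = urlparse(url)
    host = (parsed.hostname or "").lower()
    path = parsed.path.lower()

    # Stream-classic legacy links (un-migrated catalogue).
    if host == "microsoftstream.com" or host.endswith(".microsoftstream.com"):
        return "stream-classic"
    # OneDrive for work or school: personal sites on the "-my" host.
    if host.endswith("-my.sharepoint.com") and path.startswith("/personal/"):
        return "onedrive"
    # Everything else on a SharePoint host is treated as organisational video.
    if host.endswith(".sharepoint.com"):
        return "sharepoint"
    return "unknown"
```

The result can feed the video-file-location field of a per-video provenance record; anything classified as stream-classic is a candidate for the migration audit.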

The Stream caption-upload mechanic

Open the video file in Stream-on-SharePoint. From OneDrive or SharePoint, open the MP4. The Stream player surfaces a transcript pane and caption controls.
Auto-transcript generation. If tenant policy permits, the Stream player offers Generate Transcript. Microsoft's speech-to-text runs server-side; the transcript appears in the transcript pane within minutes for a typical video. The auto-transcript can be promoted to the caption track via the transcript pane's settings.
Edit the transcript. The transcript pane supports per-line edit; the edited transcript can be re-promoted to the caption track. This is the in-product correction surface — useful for small fixes, slow for catalogue retrofits.
WebVTT sidecar upload. The video file's caption-and-transcript management surface accepts WebVTT upload; the uploaded VTT replaces or supplements the auto-transcript-derived caption track. This is the path the glossary-biased workflow uses — clean WebVTT delivered upstream, uploaded to the video file.
Multi-language caption tracks. Microsoft supports multiple caption tracks per video file with language tagging. Multi-language deployments add per-language WebVTT files to the video.
Teams meeting recording transcripts. The Teams meeting auto-transcript writes through to the Stream-on-SharePoint transcript pane on the saved recording. The same edit-or-replace path applies.
SharePoint Syntex and Premium video features. Higher-tier M365 SKUs (E5, plus add-ons) expose additional video-intelligence features (chapters, smart-search, key-moments). The caption-track surface is the same; the intelligence features benefit from clean caption-track input.
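
For readers who have not handled the sidecar format directly, the following is a minimal sketch of what "clean WebVTT delivered upstream" looks like when produced from timed cues. It only renders and sanity-checks a file; the upload itself still happens through the video file's caption-and-transcript surface. The Cue type and function names are illustrative, not part of any Microsoft tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Cue:
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def format_timestamp(seconds: float) -> str:
    """Format seconds as a WebVTT timestamp, hh:mm:ss.ttt."""
    ms_total = round(seconds * 1000)
    hours, rem = divmod(ms_total, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def render_webvtt(cues: List[Cue]) -> str:
    """Render cues as a WebVTT document, rejecting badly ordered timings."""
    lines = ["WEBVTT", ""]
    previous_start = 0.0
    for cue in cues:
        if cue.end <= cue.start:
            raise ValueError(f"cue ends before it starts: {cue.text!r}")
        if cue.start < previous_start:
            raise ValueError(f"cue starts before the previous cue: {cue.text!r}")
        if "-->" in cue.text:
            raise ValueError("cue text must not contain the '-->' separator")
        previous_start = cue.start
        lines.append(f"{format_timestamp(cue.start)} --> {format_timestamp(cue.end)}")
        lines.append(cue.text.strip())
        lines.append("")  # a blank line terminates each cue
    return "\n".join(lines)
```

For a multi-language deployment, one such file would be rendered per language and attached with the matching language tag.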

The vocabulary surface in M365-tenant video

M365-tenant video carries the customer's full operational vocabulary because it's the inside-the-tenant communication surface. Common patterns:

Internal product, programme, and SKU names. The customer's full internal product vocabulary appears in product training, customer-success calls, partner enablement. Generic ASR mangles every internal product name.
SDK and engineering vocabulary. For engineering organisations, internal-tooling names, library names, command-line invocations, infrastructure-component names, deployment-environment names. See engineering onboarding captions.
Customer / partner / supplier names. Sales-pipeline-review meetings, customer-success kick-off recordings, partner-channel webinars all contain customer / partner / supplier names. The privacy posture for customer-name-bearing audio is a material concern (see Privacy and data-residency below).
Internal-acronym register. Programme names (OKR registers, division codes, system names), role abbreviations, internal-policy citation registers, M365 sensitivity-label names.
Regulatory-citation surface. Compliance training that lands in Stream carries the customer's regulatory-citation register. See compliance training captions, safety training captions, HIPAA training captions.
Healthcare / EHR vocabulary. Healthcare M365 tenants store training video on M365 Stream; the EHR / drug-name / procedure-code vocabulary surface applies. See medical training captions.
Financial-services vocabulary. Banking, insurance, asset-management M365 tenants carry the FINRA / SEC / OCC / FDIC / EU MiFID II vocabulary surface.
Multi-language video. Multinational M365 tenants carry video in multiple languages; per-language caption tracks plus per-language glossaries for the customer's regional vocabulary.
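
To make the vocabulary problem concrete, here is a deliberately simple sketch of a glossary applied to auto-transcript text. It is a post-hoc substitution pass, not the glossary-biased recognition this page describes, and the glossary entries and mis-hearings are invented examples. The point is the data shape: a canonical term, its known mis-transcriptions, and a record of which terms were applied so a reviewer can check each in context.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical glossary: canonical term -> mis-hearings seen in auto-transcripts.
GLOSSARY: Dict[str, List[str]] = {
    "kubectl": ["cube control", "cube cuddle"],
    "HIPAA": ["hippa", "hipper"],
    "MiFID II": ["mifid two", "miffed two"],
}


def apply_glossary(text: str, glossary: Dict[str, List[str]]) -> Tuple[str, List[str]]:
    """Replace known mis-hearings with canonical terms.

    Returns the corrected text and the canonical terms that were applied,
    in glossary order, for the reviewer pass.
    """
    applied: List[str] = []
    for canonical, variants in glossary.items():
        for variant in variants:
            pattern = re.compile(r"\b" + re.escape(variant) + r"\b", re.IGNORECASE)
            # A function replacement avoids backslash handling in the canonical term.
            text, count = pattern.subn(lambda _match, c=canonical: c, text)
            if count and canonical not in applied:
                applied.append(canonical)
    return text, applied
```

Customer, partner, and supplier names would sit in a separate, access-controlled glossary given the privacy posture discussed in the next section.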

Privacy, data residency, and tenant-policy concerns

M365-tenant video sits inside the customer's tenant boundary; the captioning workflow must respect that boundary. Common concerns:

EU Data Boundary and Multi-Geo. EU-data-resident M365 tenants run on the EU Data Boundary commitment. Video, transcripts, and caption tracks must remain in the EU. Captioning vendors that route audio outside the customer's data-region break the commitment.
Tenant-policy on transcript generation. Some tenants disable the auto-transcript feature (regulated industries, controlled-information tenants). The captioning workflow has to operate without the auto-transcript starting point — clean WebVTT delivered upstream is the entry point.
Sensitivity labels and DLP. Microsoft Information Protection sensitivity labels (Confidential, Highly Confidential, customer-defined) inherit to the video file and govern external-sharing behaviour. The captioning vendor's data-flow has to fit the sensitivity-label constraint — for Highly-Confidential video, the captioning workflow runs inside the customer's tenant, not in a vendor-external SaaS.
External-sharing-disabled tenants. M365 tenants that disable external-sharing entirely (federal-contractor, financial-services, defence) cannot share video to a vendor. The captioning workflow either runs inside the tenant via Microsoft Graph API integration with vendor-supplied tooling, or the customer extracts audio for offline processing under the tenant's own DPA-controlled flow.
Retention policy. Some tenants apply short retention to Teams meeting recordings (30 or 90 days, for example). The captioning workflow has to finish inside the retention window; long-running correction cycles aren't feasible on short-retention content.
Customer / supplier names in audio. Sales-pipeline meetings and customer-success kick-offs contain customer names — sometimes high-sensitivity customer names. The captioning vendor's privacy posture (DPA, sub-processor list, deletion timeline) is part of the procurement decision.
BAA scope. For healthcare M365 tenants, the captioning vendor needs to be in BAA scope where PHI-relevant content lands in caption tracks (training is generally not PHI in normal operation, but care must be taken with patient-facing demonstration video).
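
The concerns above amount to a routing decision per video: can the audio leave the tenant, or must captioning run inside it? The sketch below encodes that decision under stated assumptions. The field names, label strings, and the two route names are hypothetical; a real deployment would read labels and sharing settings from the tenant rather than from a hand-built record.

```python
from dataclasses import dataclass
from typing import FrozenSet, Tuple


@dataclass(frozen=True)
class VideoPolicy:
    sensitivity_label: str            # e.g. "General", "Confidential", "Highly Confidential"
    external_sharing_allowed: bool    # tenant or site-level sharing setting
    tenant_region: str                # e.g. "EU"
    vendor_regions: FrozenSet[str]    # regions where the vendor can process audio
    contains_phi: bool = False
    vendor_has_baa: bool = False


def choose_processing_route(policy: VideoPolicy) -> Tuple[str, str]:
    """Return (route, reason); route is 'in-tenant' or 'vendor-external'."""
    if policy.sensitivity_label == "Highly Confidential":
        return "in-tenant", "sensitivity label requires in-tenant processing"
    if not policy.external_sharing_allowed:
        return "in-tenant", "external sharing is disabled"
    if policy.tenant_region not in policy.vendor_regions:
        return "in-tenant", "vendor cannot process inside the tenant's data region"
    if policy.contains_phi and not policy.vendor_has_baa:
        return "in-tenant", "PHI-relevant content and no BAA with the vendor"
    return "vendor-external", "no tenant-boundary constraint applies"
```

The checks run from strictest to loosest, so the first constraint that applies decides the route and supplies the reason recorded for audit.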

The Stream-specific failure modes

The five caption-related findings most likely to surface during an OFCCP audit, an EAA inspection, an OCR HIPAA workforce-training file review, or an internal accessibility-self-audit on an M365-Stream-hosted catalogue:

Auto-transcript left as the caption track on regulatory-citation-dense content. The auto-transcript mangles regulatory citations, drug names, SDK terms, internal acronyms. Auditors comparing the caption track against the audio and the on-screen text will catch the mangling. Fix: replace the auto-transcript with a clean glossary-biased WebVTT.
Teams meeting recording captions inherited from auto-transcript. Teams meeting recordings auto-save with the auto-generated transcript as the caption track. Promoting the recording to a training catalogue without correction means the auto-transcript ships as the caption track to learners. Fix: catalogue audit step that flags recordings promoted from Teams without caption-track replacement.
Stream-classic legacy un-migrated. Stream-classic content with caption tracks loses caption-track linkage on migration to Stream-on-SharePoint without explicit re-attachment. Some tenants have un-migrated Stream-classic catalogue with broken caption tracks post-migration. Fix: migration audit, re-upload caption tracks per video.
Multi-language caption tracks missing on multinational deployments. A multinational M365 tenant's training catalogue often retains a single English caption track on video that's surfaced to non-English regions; the multi-language caption-track support exists but is rarely populated. WCAG SC 1.2.2 doesn't require multi-language but EU member-state regulators sometimes do. Fix: per-language caption-track delivery as part of the regional deployment.
SharePoint video page caption-track misalignment. SharePoint pages embed video by reference to the source MP4. Updating the caption track on the source MP4 propagates to the page; replacing the source MP4 sometimes orphans the caption track without notice. Fix: per-page verification step in the catalogue audit; document the source-MP4-to-page binding in the captioning-provenance log.
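
The five findings above lend themselves to a mechanical catalogue audit. The sketch below checks one inventory record against each failure mode; the CatalogueEntry fields and the string values for origin and caption_source are assumptions about how a tenant might record its catalogue, not a Stream or SharePoint schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class CatalogueEntry:
    title: str
    origin: str                    # "upload", "teams-recording" or "stream-classic-migration"
    caption_source: Optional[str]  # "auto-transcript", "glossary-webvtt" or None
    citation_dense: bool = False
    caption_languages: Set[str] = field(default_factory=set)
    audience_languages: Set[str] = field(default_factory=set)
    embedded_on_page: bool = False
    embed_verified: bool = False


def audit_entry(entry: CatalogueEntry) -> List[str]:
    """Return the caption-related findings for one catalogue entry."""
    findings: List[str] = []
    auto = entry.caption_source == "auto-transcript"
    if auto and entry.citation_dense:
        findings.append("auto-transcript on citation-dense content")
    if auto and entry.origin == "teams-recording":
        findings.append("Teams recording promoted without caption replacement")
    if entry.origin == "stream-classic-migration" and entry.caption_source is None:
        findings.append("caption track not re-attached after migration")
    missing = sorted(entry.audience_languages - entry.caption_languages)
    if missing:
        findings.append("missing caption languages: " + ", ".join(missing))
    if entry.embedded_on_page and not entry.embed_verified:
        findings.append("page embed not verified")
    return findings
```

An empty list means the entry passed all five checks; anything else goes onto the remediation queue with the finding text as the reason.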

The glossary-biased workflow for M365-tenant video

Pull the customer's controlled vocabulary. Internal product / programme / SKU registers, SDK / engineering vocabulary, customer / partner / supplier registers (carefully — privacy-bound), regulatory-citation registers, healthcare / financial-services vocabulary as applicable. The customer's controlled vocabulary is the highest-leverage glossary input.
Operate inside the tenant boundary. For Highly-Confidential or controlled-information video, the captioning workflow runs inside the customer's tenant — Microsoft Graph API integration, vendor-supplied tooling that respects sensitivity labels and DLP policy. For lower-sensitivity video, the workflow can run with vendor-external processing under DPA-controlled data flow.
Replace the auto-transcript with glossary-biased WebVTT. Process the audio through the glossary-biased captioning pipeline; deliver clean WebVTT; upload to the video file's caption-and-transcript surface, replacing the auto-transcript-derived caption track.
Multi-language pass. For multinational deployments, a per-language WebVTT track per video. Per-language glossaries handle regional vocabulary differences.
SME / clinical / engineering reviewer pass. Domain-expert review of every glossary-applied term in context. In the review UI, amber highlights mark each glossary-applied term and show its source-line provenance.
Per-video verification. Open the video in the Stream player; verify the caption track renders, the language tag is correct, and the closed-caption toggle is exposed. For SharePoint-page-embedded video, verify on the page as well.
Document captioning provenance per video. Caption source, glossary version, reviewer, review date, glossary term count, video file location (OneDrive vs SharePoint vs SharePoint-embed), data-residency confirmation, sensitivity-label compatibility — eight fields per video. Lives in the SharePoint document library's column metadata for SharePoint-hosted video, in OneDrive's metadata for OneDrive-hosted video, or in a separate captioning-provenance list in SharePoint.
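
The eight provenance fields can be modelled as a small record, which keeps the log consistent whether it ends up in library columns or a separate SharePoint list. This is a sketch only: the field names are one possible spelling of the fields listed above, the sample values in the usage note are invented, and writing the row into SharePoint is left out.

```python
from dataclasses import asdict, dataclass, fields
from datetime import date
from typing import Any, Dict


@dataclass(frozen=True)
class CaptionProvenance:
    caption_source: str                 # e.g. "glossary-webvtt"
    glossary_version: str
    reviewer: str
    review_date: date
    glossary_term_count: int
    video_location: str                 # "onedrive", "sharepoint" or "sharepoint-embed"
    data_residency_confirmed: bool
    sensitivity_label_compatible: bool


def to_list_item(record: CaptionProvenance) -> Dict[str, Any]:
    """Flatten the record into plain column values, with an ISO 8601 date."""
    row = asdict(record)
    row["review_date"] = record.review_date.isoformat()
    return row


# Guard against the record drifting away from the eight documented fields.
assert len(fields(CaptionProvenance)) == 8
```

A record such as CaptionProvenance("glossary-webvtt", "v12", "J. Reviewer", date(2025, 1, 15), 48, "sharepoint", True, True) flattens to a dictionary whose review_date is the string "2025-01-15", ready to map onto list columns.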

Stream-specific captioning RFP questions

Procurement teams running a captioning RFP for an M365-Stream-hosted training catalogue will want to ask several Stream-specific questions. From our captioning RFP template:

WebVTT compatibility with Stream-on-SharePoint caption upload. The vendor's caption-file output should upload cleanly into the Stream-on-SharePoint caption-and-transcript surface as WebVTT.
EU Data Boundary / Multi-Geo compatibility. Does the vendor's captioning pipeline keep audio and caption tracks within the customer's data-region? Vendors that route through US-only infrastructure are not compatible with EU Data Boundary tenants.
Sensitivity-label and DLP-policy compatibility. Does the vendor's data-flow respect Microsoft Information Protection sensitivity labels and DLP policies? For Highly-Confidential video, can the workflow run inside the customer's tenant via Graph API rather than via vendor-external processing?
Teams meeting recording integration. Does the vendor's pipeline integrate with Teams meeting recordings as a source — replacing the auto-transcript on a recording catalogue at scale?
Stream-classic-to-Stream-on-SharePoint migration support. For tenants with un-migrated Stream-classic legacy, does the vendor support re-attaching caption tracks to migrated MP4 files at scale?
Multi-language caption-track delivery. Does the vendor support per-language WebVTT delivery for multinational deployments?
BAA / DPA / sub-processor list. Standard data-handling-posture questions, with M365-tenant-specific framings on data-residency.

How M365 Stream captions intersect Section 508, ADA Title II, EAA, and OCR HIPAA

M365-Stream-hosted training catalogues face several accessibility regimes:

Section 508 — federal-contractor M365 tenants face the WCAG 2.0 AA technical bar. The Stream player and caption-track behaviour satisfy the technical requirement; the procurement evidence is the captioning-provenance log per video.
Section 504 — federal-financial-assistance-recipient M365 tenants face the functional-access standard.
ADA Title II — state and local government M365 tenants (state-employee training, county-government training, public-university HR training) carry the 2026-04-24 WCAG 2.1 AA bar.
ADA Title III — private-sector M365 tenants face the indirect technical bar through case-law evolution.
European Accessibility Act — EU-operating M365 tenants in scope (B2C surfaces) face EN 301 549 / WCAG 2.1 AA. EU Data Boundary compatibility is the data-residency overlay. See our EAA Q3 2026 enforcement landscape post.
AODA — Ontario-operating M365 tenants face IASR § 14 WCAG 2.0 AA.
OCR HIPAA workforce-training file review — see HIPAA training captions.
OSHA / MSHA / safety-training — see safety training captions.
Joint Commission triennial survey — for healthcare M365 tenants. See the Joint Commission survey-prep playbook and Healthstream captions for the parallel LMS-side workflow.

The technical caption requirement at WCAG SC 1.2.2 is consistent across regimes; M365 Stream's caption-track support is feature-complete — the failure mode is operational (auto-transcript left in place, multi-language tracks missing, Stream-classic legacy un-migrated), not platform-capability. The captioning-provenance log per video is the audit-evidence shape; data-residency confirmation is the M365-specific add.